Efficient Visual Prompt Engineering for Creative Story Writing

Authors

  • Felix Deng, Redmond High School

DOI:

https://doi.org/10.47611/jsrhs.v13i4.7705

Keywords:

Large Language Models, Visual Prompt Engineering, Story Generation, Artificial Intelligence, Diffusion Models, Computer Science

Abstract

Large Language Models (LLMs) are widely used to generate stories, showcasing their ability to handle complex creative tasks. Story generation begins with an initial textual prompt, which is iteratively refined to minimize the discrepancy between the user’s expectations and the story the prompt produces. Each iteration is time-consuming: the user must read and analyze the generated story before refining the prompt. Cognitive research is often cited as suggesting that humans process visual information up to 60,000 times faster than text. This paper proposes visual prompt engineering for story generation, in which the textual prompt is first rendered as an image by a diffusion model and refined based on the discrepancy between the user’s expectations and the generated image. The refined prompt is then used to generate a story, and the entire process repeats until the user is satisfied with the story. The method leverages the relative speed of image inspection to improve the quality of the generated text per iteration. Experiments show that, for the same number of iterations, stories generated with visual prompt engineering outperform those generated with text-based prompt refinement in terms of story quality.
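The abstract describes an iterative pipeline: render the prompt as an image, let the user judge the image, refine the prompt, and only then generate the story. The Python sketch below illustrates that loop under stated assumptions; every callable in it (generate_image, generate_story, user_approves, refine_prompt) is a hypothetical placeholder rather than an API from the paper, and would be backed by a real diffusion model, an LLM, and human-in-the-loop interaction.

```python
# A minimal sketch of the visual prompt-engineering loop described in the
# abstract. All callables here are hypothetical placeholders: generate_image
# stands in for a diffusion model, generate_story for an LLM, and
# user_approves / refine_prompt for the human-in-the-loop steps.

def generate_image(prompt: str):
    """Render `prompt` with a text-to-image diffusion model."""
    raise NotImplementedError  # e.g., call a diffusion-model API here

def generate_story(prompt: str) -> str:
    """Generate a story from `prompt` with an LLM."""
    raise NotImplementedError  # e.g., call a text-generation API here

def user_approves(artifact) -> bool:
    """Ask the user whether the image or story matches their expectations."""
    raise NotImplementedError

def refine_prompt(prompt: str) -> str:
    """Let the user edit the prompt to close the observed gap."""
    raise NotImplementedError

def visual_prompt_engineering(prompt: str,
                              max_rounds: int = 5,
                              max_visual_steps: int = 5) -> str:
    """Refine the prompt against images, then generate a story;
    repeat until the user accepts the story."""
    story = ""
    for _ in range(max_rounds):
        # Inner loop: iterate on the prompt against generated images,
        # which are much faster for the user to evaluate than full stories.
        for _ in range(max_visual_steps):
            image = generate_image(prompt)
            if user_approves(image):
                break
            prompt = refine_prompt(prompt)
        story = generate_story(prompt)  # one story per outer round
        if user_approves(story):
            break
        prompt = refine_prompt(prompt)
    return story
```

The design point the sketch captures is that the expensive user step (reading a full story) runs once per outer round, while the cheap step (glancing at an image) absorbs most of the refinement iterations.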

References or Bibliography

Babcock, L. (2024, August 6). How long proofreading takes (a complete guide with tables). Om Proofreading. https://omproofreading.com/how-long-proofreading-takes/

Carver, R. P. (1990). Reading rate: A review of research and theory. Academic Press.

Hosni, Y. (2024, February 20). Prompt engineering best practices: Iterative prompt development. Medium. https://pub.towardsai.net/prompt-engineering-best-practices-iterative-prompt-development-22759b309919

Kincaid, J. P., & Delionbach, L. J. (1973). Validation of the Automated Readability Index: A follow-up. Human Factors, 15(1), 17-20. https://doi.org/10.1177/001872087301500103

Levi, E., Brosh, E., & Friedmann, M. (2024, February 5). Intent-based prompt calibration: Enhancing prompt optimization with synthetic boundary cases. arXiv. https://doi.org/10.48550/arXiv.2402.03099

Pamplona, F. (2023, September 22). Column: The power of visuals. MedTech Intelligence. https://medtechintelligence.com/column/the-power-of-visuals/

Singh, C., Morris, J. X., Aneja, J., Rush, A. M., & Gao, J. (2023, January 26). Explaining patterns in data with language models via interpretable autoprompting. arXiv. https://doi.org/10.48550/arXiv.2210.01848

Tuscher, M., & Schmidt, J. (2022, October 16). Processing speed and comprehensibility of visualizations. VRVis Forschungs-GmbH. https://www.vrvis.at/publications/pdfs/PB-VRVis-2022-016.pdf

Wang, B., Deng, X., & Sun, H. (2022, October 23). Iteratively prompt pre-trained language models for chain of thought. arXiv. https://doi.org/10.48550/arXiv.2203.08383

Published

11-30-2024

How to Cite

Deng, F. (2024). Efficient Visual Prompt Engineering for Creative Story Writing. Journal of Student Research, 13(4). https://doi.org/10.47611/jsrhs.v13i4.7705

Issue

Vol. 13 No. 4 (2024)

Section

HS Research Projects