Efficient Visual Prompt Engineering for Creative Story Writing
DOI: https://doi.org/10.47611/jsrhs.v13i4.7705
Keywords: Large Language Models, Visual Prompt Engineering, Story Generation, Artificial Intelligence, Diffusion Models, Computer Science
Abstract
Large Language Models (LLMs) are extensively used to generate stories, showcasing their ability to handle complex, creative tasks. Story generation begins with an initial textual prompt, which is iteratively refined to minimize the discrepancy between the user’s expectations and the story generated from the prompt. Each iteration is time-consuming: the user must read and analyze the generated story before the prompt can be refined. A key insight from cognitive research suggests that analyzing visual data is far faster than analyzing text, by as much as 60,000 times. This paper proposes visual prompt engineering for story generation, in which the textual prompt is first transformed into an image by a diffusion model and then refined based on the discrepancy between the user’s expectations and the generated image. The refined prompt is used to generate a story, and the entire process repeats until the user is satisfied with the result. The method leverages the relative speed of image processing to improve the quality of the generated text per iteration. Experiments show that, for the same number of iterations, stories produced with visual prompt engineering outperform those produced with text-based prompt refinement in terms of quality.
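The refinement loop described in the abstract can be summarized in a few lines of code. The sketch below is illustrative only: the callables render_image, revise_prompt, write_story, and accept_story are hypothetical placeholders standing in for a diffusion model, the human in the loop, and an LLM, not the specific systems used in the paper.

    # Minimal sketch of the iterative visual prompt-engineering loop.
    # All callables are hypothetical placeholders: render_image could wrap any
    # text-to-image diffusion model, write_story any LLM, and revise_prompt /
    # accept_story stand in for the human user's judgments.
    from typing import Any, Callable

    def visual_prompt_engineering(
        initial_prompt: str,
        render_image: Callable[[str], Any],        # prompt -> preview image
        revise_prompt: Callable[[str, Any], str],  # user edits prompt after viewing the image
        write_story: Callable[[str], str],         # refined prompt -> story
        accept_story: Callable[[str], bool],       # user's stopping criterion
        max_iterations: int = 5,
    ) -> str:
        prompt = initial_prompt
        story = ""
        for _ in range(max_iterations):
            image = render_image(prompt)           # fast-to-judge visual preview of the prompt
            prompt = revise_prompt(prompt, image)  # close the gap between expectation and image
            story = write_story(prompt)            # generate the story from the refined prompt
            if accept_story(story):                # stop once the user is satisfied
                break
        return story

In practice, render_image might be any off-the-shelf diffusion pipeline and write_story an LLM API call; the design point illustrated here is that each inner refinement step judges an image rather than a full generated story.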
Copyright (c) 2024 Felix Deng

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.


