Kumar, Satyajit, and Ehsan Adeli. “A Picture Is Worth a Thousand Words: Using Cross-Modal Transformers and Variational AutoEncoders to Generate Images from Text.” Journal of Student Research 10, no. 4 (November 30, 2021). Accessed May 1, 2024. https://www.jsr.org/hs/index.php/path/article/view/2106.