Supervised Fusion Music Composition Through Long Short-Term Memory and Stochastic Modelling


Matthew Lee, Riverdale Country School



Keywords: RNN, LSTM, Fusion Music, Music Composition, AI, Machine Learning


Music composition has advanced significantly with the introduction of artificial intelligence, particularly Long Short-Term Memory (LSTM) networks. However, most existing algorithms give composers little control over the genre fusion process, which can sideline their creative preferences. This study introduces a novel two-phase algorithm for personalized fusion music generation that reflects the composer's individual preferences. In the first phase, melodies are generated for each genre by Recurrent Neural Networks (RNNs) built with Sequential models, Dense layers, and one-hot encoding. These melodies serve as input to the second phase, in which an LSTM network fuses them into a coherent composition. The algorithm incorporates weights set by the composer for each genre, so the fused output can be personalized. A stochastic approach is used in both phases to introduce creative variance while preserving structural coherence. We demonstrate this balance through several metrics, offering a more tailored fusion music generation experience enriched by stochastic modeling.
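The abstract does not specify how the composer's genre weights and the stochastic step are combined, so the following is only a minimal sketch of one plausible reading: each genre model is assumed to produce a probability distribution over a shared note vocabulary, the distributions are blended by the normalized composer weights, and the next note is drawn at random from the blend. The function name `fuse_and_sample`, the `temperature` parameter, and the toy three-note vocabulary are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_and_sample(genre_probs, genre_weights, temperature=1.0, rng=None):
    """Blend per-genre next-note distributions and sample one note index.

    Illustrative assumption, not the paper's method:
    genre_probs   maps genre name -> probabilities over a shared note vocabulary
    genre_weights maps genre name -> non-negative weight chosen by the composer
    temperature   > 1 flattens the blend (more variance), < 1 sharpens it
    """
    if rng is None:
        rng = np.random.default_rng()
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    genres = sorted(genre_probs)
    weights = np.array([genre_weights[g] for g in genres], dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Rows are genres, columns are notes; the weighted sum is the fused distribution.
    stacked = np.stack([np.asarray(genre_probs[g], dtype=float) for g in genres])
    mixed = weights @ stacked

    # Temperature scaling in log space, then renormalize to a valid distribution.
    logits = np.log(mixed + 1e-12) / temperature
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()

    # Stochastic step: draw the next note instead of always taking the argmax.
    note = int(rng.choice(len(probs), p=probs))
    return note, probs


# Toy example: two genres, three possible notes, composer favors jazz 3:1.
toy_probs = {"jazz": [1.0, 0.0, 0.0], "classical": [0.0, 0.0, 1.0]}
toy_weights = {"jazz": 3.0, "classical": 1.0}
toy_note, toy_dist = fuse_and_sample(
    toy_probs, toy_weights, rng=np.random.default_rng(0)
)
```

In this sketch, raising a genre's weight shifts probability mass toward the notes that genre's model prefers, while the random draw keeps repeated runs from producing identical melodies.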






How to Cite

Lee, M. (2023). Supervised Fusion Music Composition Through Long Short-Term Memory and Stochastic Modelling. Journal of Student Research, 12(4).



HS Research Articles