Supervised Fusion Music Composition Through Long Short-Term Memory and Stochastic Modelling

Authors

  • Matthew Lee, Riverdale Country School

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5919

Keywords:

RNN, LSTM, Fusion Music, Music Composition, AI, Machine Learning

Abstract

Music composition has witnessed significant advances with the infusion of artificial intelligence, particularly through Long Short-Term Memory (LSTM) networks. However, most existing algorithms offer composers minimal control over the genre-fusion process, potentially undermining their creative preferences. This study introduces a novel two-phase algorithm for personalized fusion-music generation that reflects the composer's individual preferences. In the first phase, melodies are generated for individual genres using Recurrent Neural Networks (RNNs) built as Sequential models with Dense output layers over one-hot encoded note representations. These generated melodies serve as input to the second phase, where an LSTM network fuses them into a coherent composition. Notably, the algorithm incorporates weights set by the composer for each genre, allowing for a personalized composition. A stochastic approach is employed in both phases to introduce creative variance while preserving structural coherence. We demonstrate this balance through several metrics, offering a more tailored fusion-music generation experience enriched by stochastic modeling.
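To make the two-phase pipeline concrete, the sketch below shows one plausible Keras-style realization: a per-genre Sequential model with an LSTM layer and a Dense softmax head over one-hot encoded notes (phase one), temperature-based sampling for the stochastic element, and a fusion step that blends per-genre next-note distributions according to composer-set weights (phase two). Every name here (build_genre_model, sample_with_temperature, fuse_step, the vocabulary size, the weighting scheme) is an illustrative assumption, not the paper's published implementation; in particular, the paper's second phase trains an LSTM to fuse the melodies, which the simple weighted mixture below only approximates.

```python
# Illustrative sketch only -- names, shapes, and hyperparameters are
# assumptions; they are not taken from the paper's implementation.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

VOCAB_SIZE = 128   # assumed one-hot vocabulary: 128 MIDI pitches
SEQ_LEN = 32       # assumed context window of 32 notes

def build_genre_model():
    """Phase 1: per-genre melody model (Sequential -> LSTM -> Dense softmax)."""
    model = Sequential([
        LSTM(256, input_shape=(SEQ_LEN, VOCAB_SIZE)),
        Dense(VOCAB_SIZE, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model

def sample_with_temperature(probs, temperature=1.0):
    """Stochastic note choice: higher temperature raises creative variance,
    lower temperature favors structural coherence."""
    logits = np.log(probs + 1e-9) / temperature
    p = np.exp(logits)
    p /= p.sum()
    return np.random.choice(len(p), p=p)

def fuse_step(genre_probs, genre_weights, temperature=0.8):
    """Phase 2 (single step, simplified): blend per-genre next-note
    distributions with composer-set weights, then sample stochastically.
    The paper fuses with a trained LSTM; this mixture is a stand-in."""
    w = np.asarray(genre_weights, dtype=float)
    w /= w.sum()                                   # normalize composer weights
    mixed = np.tensordot(w, np.asarray(genre_probs), axes=1)
    return sample_with_temperature(mixed, temperature)

# Example: a composer weighting jazz 0.7 and classical 0.3.
jazz_probs = np.full(VOCAB_SIZE, 1.0 / VOCAB_SIZE)        # placeholder outputs
classical_probs = np.full(VOCAB_SIZE, 1.0 / VOCAB_SIZE)
next_note = fuse_step([jazz_probs, classical_probs], [0.7, 0.3])
```

The temperature parameter makes the coherence-variance trade-off explicit: values near zero approach greedy, highly structured output, while values above one inject more randomness, mirroring the balance the abstract reports demonstrating through its metrics.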

Published

11-30-2023

How to Cite

Lee, M. (2023). Supervised Fusion Music Composition Through Long Short-Term Memory and Stochastic Modelling. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5919

Issue

Vol. 12 No. 4 (2023)

Section

HS Research Articles