MUsE: Model of User-friendliness and Efficiency
DOI:
https://doi.org/10.47611/jsrhs.v13i4.8071

Keywords:
Large Language Models (LLMs), Transformer architecture, Machine Learning

Abstract
The sizes of large language models (LLMs) continue to grow as their numbers of trainable parameters increase. However, larger models impose higher usage costs on users while remaining insufficiently user-friendly. As a result, for socially vulnerable populations such as children and older adults, these models can be less accessible and their responses harder to understand. We therefore created a new LLM, termed the Model of User-friendliness and Efficiency (MUsE), and hypothesized that MUsE would surpass existing closed-source LLMs in user-friendliness and efficiency by tailoring conversations between the LLM and the user based on the user type. MUsE leverages a lightweight Transformer architecture with low-rank adaptation (LoRA), which improves efficiency by updating weights through low-rank (lower-dimensional) matrices. MUsE then undergoes user-friendly instruction tuning on a custom dataset created with a closed-source LLM. Through this process, it is trained efficiently and can outperform other LLMs in understandability, geniality, and applicability, achieving better user-friendliness despite its small size. Code is available at https://github.com/seohyunqkr/MUsE.
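As an illustration of the low-rank update described above, the following is a minimal PyTorch sketch of a LoRA-style linear layer. The rank, scaling factor, and initialization are illustrative assumptions and are not taken from the MUsE codebase; the sketch only shows how training low-rank factors A and B in place of the full weight matrix reduces the number of trainable parameters.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight W and a trainable
    low-rank update B @ A, where the rank r << min(d_in, d_out)."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad = False              # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))         # trainable up-projection, zero-initialized
        self.scale = alpha / r                               # LoRA scaling factor (assumed value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + scale * (B @ A); only A and B receive gradients.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# With d_in = d_out = 768 and r = 8, the trainable parameters number
# 2 * 8 * 768 = 12,288, versus 768 * 768 = 589,824 for full fine-tuning.
layer = LoRALinear(768, 768, r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))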
Copyright (c) 2024 Seohyun Park; Byung-Kwan Lee

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.


