MUsE: Model of User-friendliness and Efficiency
DOI:
https://doi.org/10.47611/jsrhs.v13i4.8071

Keywords:
Large Language Models (LLMs), Transformer architecture, Machine Learning

Abstract
The sizes of large language models (LLMs) continue to grow as their numbers of trainable parameters increase. However, larger models impose higher usage costs on users while remaining insufficiently user-friendly. As a result, for socially vulnerable populations such as children and older adults, these models can be less accessible and their responses harder to understand. We therefore created a new LLM, termed the Model of User-friendliness and Efficiency (MUsE), and hypothesized that MUsE would surpass existing closed-source LLMs in user-friendliness and efficiency by tailoring conversations between the LLM and the user based on the user type. MUsE leverages a lightweight Transformer architecture with low-rank adaptation (LoRA), which improves efficiency by updating weights through low-rank (lower-dimensional) matrices. MUsE then undergoes user-friendly instruction tuning on a custom dataset created with a closed-source LLM. Through this process, it is trained efficiently and can outperform other LLMs in understandability, geniality, and applicability, achieving better user-friendliness despite its small size. Code is available at https://github.com/seohyunqkr/MUsE.
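As an illustration of the low-rank update described above, the following is a minimal PyTorch sketch of a LoRA-style linear layer. The rank, scaling factor, and initialization are illustrative assumptions and are not taken from the MUsE codebase; the sketch only shows how training low-rank factors A and B in place of the full weight matrix reduces the number of trainable parameters.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight W and a trainable
    low-rank update B @ A, where the rank r << min(d_in, d_out)."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad = False              # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))         # trainable up-projection, zero-initialized
        self.scale = alpha / r                               # LoRA scaling factor (assumed value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + scale * (B @ A); only A and B receive gradients.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# With d_in = d_out = 768 and r = 8, the trainable parameters number
# 2 * 8 * 768 = 12,288, versus 768 * 768 = 589,824 for full fine-tuning.
layer = LoRALinear(768, 768, r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))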
Copyright (c) 2024 Seohyun Park; Byung-Kwan Lee

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.


