A Systematic Survey on Large Language Models for Code Generation
DOI:
https://doi.org/10.14500/aro.12159
Keywords:
Benchmarking, Code Generation, Evaluation Metrics, Large Language Models
Abstract
The rapid development of large language models (LLMs) has transformed code generation, offering powerful tools for automating software development tasks. However, evaluating the quality, security, and effectiveness of generated code remains a significant challenge. This systematic survey comprehensively analyses studies published between 2021 and 2024 on the use of LLMs in the code generation process. It addresses ten research questions, covering the most commonly used programming languages, the metrics employed to evaluate code quality, the scenarios in which developers apply LLMs during software development, the extent to which prompt engineering influences code generation, security concerns, and the benchmarks, models, and code analysis tools used in the studies. The findings indicate that the most frequently used evaluation metrics for code generation are Pass@k and Bilingual Evaluation Understudy (BLEU), and that Python, Java, and C++ are the most widely used languages. Furthermore, identifying security vulnerabilities and establishing robust evaluation metrics remain open challenges. The survey summarises current practices, identifies gaps, and suggests future research directions to enhance the reliability and security of code generated by LLMs in real-world applications.
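For readers unfamiliar with the metric named above, Pass@k is commonly computed with the unbiased estimator introduced by Chen et al. (2021): n candidate programs are sampled per problem, c of them pass the unit tests, and the estimator gives the probability that at least one of k randomly chosen samples is correct. The sketch below, in Python, illustrates the calculation; the function name and example numbers are illustrative and not taken from the survey.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator (Chen et al., 2021): probability that
    at least one of k samples, drawn without replacement from n generated
    candidates of which c are functionally correct, passes the unit tests."""
    if n - c < k:  # every possible draw contains at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 200 generations per problem, 37 pass the unit tests.
print(pass_at_k(n=200, c=37, k=1))   # ≈ 0.185
print(pass_at_k(n=200, c=37, k=10))  # ≈ 0.877
```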
License
Copyright (c) 2025 Sardar K. Jabrw, Qusay I. Alsarhan

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who choose to publish their work with Aro agree to the following terms:
- Authors retain the copyright to their work and grant the journal the right of first publication. The work is simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0), which allows others to share the work with acknowledgement of its authorship and initial publication in this journal.
- Authors may enter into separate agreements for the non-exclusive distribution of the journal's published version of the work, such as posting it to an institutional repository or publishing it in a book, provided proper acknowledgement is given to its initial publication in this journal.
- Authors are encouraged to share and post their work online, including in institutional repositories or on their personal websites, both prior to and during the submission process, as this can lead to productive exchanges and increase the visibility and citation of the published work.
By agreeing to these terms, authors acknowledge the importance of open access and the benefits it brings to the scholarly community.
Accepted 2025-07-14
Published 2025-08-06