Legal Case LLM: An Open-Source Fine-Tuned Model for Indonesian Human Trafficking Jurisprudence

Muhammad Hariz Faizul Anwar; Nizam Avif Anhari; Galih Wasis Wicaksono; Nur Putri Hidayah

doi:10.30595/juita.v14i1.28345

Authors

Muhammad Hariz Faizul Anwar, Universitas Muhammadiyah Malang
Nizam Avif Anhari, Universitas Muhammadiyah Malang
Galih Wasis Wicaksono, Universitas Muhammadiyah Malang
Nur Putri Hidayah, Universitas Muhammadiyah Malang

Keywords:

Indonesian human trafficking; legal LLM; jurisprudence; legal AI; transformers

Abstract

This paper presents Legal-Case LLM, an open-source, fine-tuned language model tailored for Indonesian human-trafficking jurisprudence. General-purpose large language models are highly fluent but, when applied to legal texts, risk factual hallucination and show limited jurisprudential fidelity. The objective is to develop a reproducible model that improves factual recall, use of legal terminology, and jurisprudential alignment for Indonesian trafficking cases. Methods: We assembled a curated corpus of 400 court decisions from the Direktori Putusan Mahkamah Agung, extracted structured metadata and summaries, and generated question–answer pairs with large language models, followed by multi-stage cleaning and expert validation. We fine-tuned open models from the LLaMA family using a parameter-efficient technique (LoRA) and evaluated them with automatic metrics (ROUGE, BLEU, BERTScore, BARTScore) and a focused qualitative audit. Results: Compared with zero-shot baselines, the fine-tuned model shows marked improvements in content recall and semantic alignment, produces more jurisprudentially aligned phrasing (accurate use of terms such as amar putusan, Majelis Hakim, and percobaan), and is less prone to hallucination in statute-related outputs. Conclusion and impact: Legal-Case LLM offers a reproducible, transparent tool to assist legal practitioners and researchers in Indonesia, while emphasising human-in-the-loop verification and citation matching to ensure legal reliability and ethical deployment.
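
To illustrate what an overlap metric such as ROUGE measures when the abstract speaks of content recall, the following is a minimal sketch of unigram ROUGE-1 in plain Python. It is not the authors' evaluation code: the function name `rouge1`, the whitespace tokenization, and the two example sentences are assumptions made for illustration, and the sentences are invented rather than drawn from the paper's dataset.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap ROUGE-1 with lowercase whitespace tokens (no stemming)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts only as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example sentences (hypothetical, not from the paper's corpus).
reference = "majelis hakim menjatuhkan pidana penjara selama tiga tahun"
candidate = "majelis hakim menjatuhkan pidana penjara lima tahun"

scores = rouge1(reference, candidate)
print(scores)
```

In this invented pair the candidate states a different sentence length ("lima" instead of "tiga") yet still recovers six of the eight reference tokens, so a high overlap score does not by itself establish legal correctness. That is consistent with the abstract pairing automatic metrics with a qualitative audit and human-in-the-loop verification.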

Author Biographies

Muhammad Hariz Faizul Anwar, Universitas Muhammadiyah Malang

Informatics

Nizam Avif Anhari, Universitas Muhammadiyah Malang

Informatics

Galih Wasis Wicaksono, Universitas Muhammadiyah Malang

Informatics

Nur Putri Hidayah, Universitas Muhammadiyah Malang

Law
