KoBALT-700

Korean Benchmark for Advanced Linguistic Tasks

A linguistics-based benchmark for evaluating large language models (LLMs) on Korean


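KoBALT is a linguistics-based benchmark for evaluating large language models (LLMs) on Korean. It consists of 700 expert-written multiple-choice questions covering 24 fine-grained linguistic phenomena across five core linguistic domains.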

Items per domain: Syntax 300, Semantics 215, Pragmatics 81, Phonetics 62, Morphology 42

Minimizing Data Contamination

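KoBALT shows no more than 8.6% bigram overlap and 0.7% trigram overlap with standard Korean corpora. This low overlap minimizes training-data contamination, making KoBALT a robust tool for assessing genuine language understanding rather than recall of previously seen text.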
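This section does not specify how the overlap figures were computed (tokenizer, reference corpus, or whether n-grams are counted by type or by occurrence). The sketch below shows one plausible way to measure such overlap, assuming whitespace tokenization and defining overlap as the share of distinct benchmark n-grams that also occur in the reference corpus. The function names are hypothetical, and this is not necessarily the procedure used for KoBALT.

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of distinct n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(benchmark_texts: list[str], corpus_texts: list[str], n: int) -> float:
    """Fraction of distinct benchmark n-grams that also appear in the corpus.

    Assumption: texts are tokenized on whitespace (Korean eojeol units);
    a morpheme-level tokenizer would give different numbers.
    """
    bench: set[tuple[str, ...]] = set()
    for text in benchmark_texts:
        bench |= ngrams(text.split(), n)

    corpus: set[tuple[str, ...]] = set()
    for text in corpus_texts:
        corpus |= ngrams(text.split(), n)

    if not bench:
        return 0.0
    return len(bench & corpus) / len(bench)


benchmark = ["철수가 밥을 먹었다"]
reference = ["영희가 밥을 먹었다"]
print(ngram_overlap(benchmark, reference, 2))  # 0.5: ("밥을", "먹었다") is shared
print(ngram_overlap(benchmark, reference, 3))  # 0.0: no trigram is shared
```

Counting distinct n-grams keeps a single frequent phrase from dominating the ratio; counting every occurrence instead is an equally reasonable choice and would generally yield different percentages.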

Dataset Composition

Domain	Phenomenon	# Items	Description
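Syntax	Agreement	104	Subject-verb, honorific, tense, polarity, and passive/causative agreement
	Argument Structure & Valency	96	Predicate-argument relations, case realization
	Embedded Clauses	86	Comprehension of complex clauses
	Ellipsis	11	Grammatical ellipsis patterns
	Scrambling	3	Word-order flexibility
Semantics	Semantic Compatibility	60	Predicate-argument compatibility
	Rhetorical Expressions	28	Metaphor, irony, idioms
	Word Relationships	28	Synonyms, antonyms, semantic frames
	Ambiguity	27	Lexical, structural, and scope ambiguity
	Numeral Classifiers	27	Quantified nouns and classifier morphemes
	Conjunctions	24	Causal, temporal, and entailment-based conjunctions
	Inter-sentence Relations	21	Semantic coherence across sentences
Pragmatics	Speech Acts	22	Assertives, questions, directives, commissives, expressives
	Implicature	22	Implied meaning beyond literal content
	Discourse Principles	17	Conversational maxims and discourse strategies
	Deixis & Reference	17	Personal, spatial, and temporal deixis
	Social Relationship Marking	3	Honorifics, speech style, terms of address
Phonetics/Phonology	Phonological Alternation	34	Substitution, deletion, assimilation, etc.
	Phonological Constraints	14	Permissible phonological patterns
	Articulatory Phonetics	7	Articulation of speech sounds
	Suprasegmental Features	7	Intonation, prosody, interrogative intonation
Morphology	Word Formation	22	Derivation, compounding
	Verbal Conjugation	12	Inflection of verbs and adjectives
	POS & Morphemes	8	Part-of-speech tagging, morphological analysis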
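The phenomenon-level counts in the table roll up to the domain totals and to the 700-item total. A short sketch of that roll-up, with the rows copied from the table above; the names `PHENOMENA` and `domain_totals` are illustrative only.

```python
from collections import defaultdict

# (domain, phenomenon, number of items), copied from the dataset composition table.
PHENOMENA = [
    ("Syntax", "Agreement", 104),
    ("Syntax", "Argument Structure & Valency", 96),
    ("Syntax", "Embedded Clauses", 86),
    ("Syntax", "Ellipsis", 11),
    ("Syntax", "Scrambling", 3),
    ("Semantics", "Semantic Compatibility", 60),
    ("Semantics", "Rhetorical Expressions", 28),
    ("Semantics", "Word Relationships", 28),
    ("Semantics", "Ambiguity", 27),
    ("Semantics", "Numeral Classifiers", 27),
    ("Semantics", "Conjunctions", 24),
    ("Semantics", "Inter-sentence Relations", 21),
    ("Pragmatics", "Speech Acts", 22),
    ("Pragmatics", "Implicature", 22),
    ("Pragmatics", "Discourse Principles", 17),
    ("Pragmatics", "Deixis & Reference", 17),
    ("Pragmatics", "Social Relationship Marking", 3),
    ("Phonetics/Phonology", "Phonological Alternation", 34),
    ("Phonetics/Phonology", "Phonological Constraints", 14),
    ("Phonetics/Phonology", "Articulatory Phonetics", 7),
    ("Phonetics/Phonology", "Suprasegmental Features", 7),
    ("Morphology", "Word Formation", 22),
    ("Morphology", "Verbal Conjugation", 12),
    ("Morphology", "POS & Morphemes", 8),
]


def domain_totals(rows: list[tuple[str, str, int]]) -> dict[str, int]:
    """Sum item counts per domain, preserving the table's domain order."""
    totals: defaultdict[str, int] = defaultdict(int)
    for domain, _phenomenon, items in rows:
        totals[domain] += items
    return dict(totals)


totals = domain_totals(PHENOMENA)
print(totals)
# {'Syntax': 300, 'Semantics': 215, 'Pragmatics': 81, 'Phonetics/Phonology': 62, 'Morphology': 42}
print(len(PHENOMENA), sum(totals.values()))  # 24 700
```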

Baseline Performance (Accuracy by Domain)

Model	Avg	Syntax	Semantics	Pragmatics	Morphology	Phonetics
Claude-3-7-sonnet	61%	66%	66%	64%	36%	31%
Claude-3-5-sonnet	52%	52%	65%	51%	36%	24%
DeepSeek-V3-XL	47%	49%	56%	42%	24%	29%
GPT-4o	44%	45%	55%	40%	17%	26%
DeepSeek-V3	43%	41%	57%	42%	26%	23%
Qwen2.5-72B	37%	33%	51%	37%	24%	18%
C4ai-command-a-03	36%	30%	52%	36%	24%	18%
Gemma-3-27b	35%	30%	53%	27%	24%	11%
Mistral-Small-24B	32%	27%	49%	30%	21%	11%
Llama-3.3-70B	32%	25%	50%	35%	17%	15%
Qwen2.5-32B	30%	23%	49%	28%	21%	11%
Aya-expanse-32b	25%	21%	40%	12%	10%	16%
Gemma-2-9b	21%	17%	34%	15%	12%	11%
Qwen2.5-7B	19%	14%	33%	11%	19%	6%
Aya-expanse-8b	19%	15%	33%	11%	12%	6%
Llama-3.1-8B	17%	13%	26%	12%	10%	11%
Ministral-8B	17%	11%	29%	15%	10%	11%
Mistral-7B-v0.3	12%	11%	16%	11%	14%	6%
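The Avg column appears to be an item-weighted average of the domain accuracies (each domain weighted by its number of items: Syntax 300, Semantics 215, Pragmatics 81, Morphology 42, Phonetics 62), which is equivalent to overall accuracy across all 700 items, rather than a plain mean of the five columns. The sketch below checks this for two rows. Because the published domain scores are rounded to whole percentages, a recomputed average could in principle differ from the table by a point. The helper names are illustrative.

```python
# Item counts per domain, summed from the dataset composition table.
# "Phonetics" here covers the Phonetics/Phonology domain.
DOMAIN_ITEMS = {
    "Syntax": 300,
    "Semantics": 215,
    "Pragmatics": 81,
    "Morphology": 42,
    "Phonetics": 62,
}


def weighted_average(domain_acc: dict[str, float]) -> float:
    """Item-weighted mean of per-domain accuracies (in percent)."""
    if set(domain_acc) != set(DOMAIN_ITEMS):
        raise ValueError("expected an accuracy for every domain")
    total_items = sum(DOMAIN_ITEMS.values())
    weighted = sum(DOMAIN_ITEMS[d] * acc for d, acc in domain_acc.items())
    return weighted / total_items


def unweighted_average(domain_acc: dict[str, float]) -> float:
    """Plain mean of the domain columns, for comparison."""
    return sum(domain_acc.values()) / len(domain_acc)


claude_37 = {"Syntax": 66, "Semantics": 66, "Pragmatics": 64, "Morphology": 36, "Phonetics": 31}
gpt_4o = {"Syntax": 45, "Semantics": 55, "Pragmatics": 40, "Morphology": 17, "Phonetics": 26}

print(round(weighted_average(claude_37)))    # 61, matches the Avg column
print(round(unweighted_average(claude_37)))  # 53, does not match
print(round(weighted_average(gpt_4o)))       # 44, matches the Avg column
```

Under this weighting, Syntax and Semantics together account for 515 of the 700 items, so the low Morphology and Phonetics scores pull the average down less than a plain mean would suggest.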

Contributors

Researchers

CL_NLP Lab, Seoul National University

• Dongjun Jang
• Wooseok Song
• Jaeyoon Kim
• Chaeyoung Oh
• Hyemi Jo
• Youngchae Ahn
• Sihyun Oh
• Hyohyeong Jang

Advisors

Seoul National University

• Prof. Hyopil Shin
• Prof. Sangah Lee

LG AI Research

• Jinsik Lee
• Sunkyoung Kim

Sponsor

LG AI Research

License

KoBALT is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.


Citation

If you use this resource, please cite it as follows:

@misc{shin2025kobaltkoreanbenchmarkadvanced,
  title={KoBALT: Korean Benchmark For 
         Advanced Linguistic Tasks}, 
  author={Hyopil Shin and Sangah Lee and 
          Dongjun Jang and Wooseok Song and 
          others},
  year={2025},
  eprint={2505.16125},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2505.16125}
}