knlp

We focus on developing algorithms that process text and make the information it contains accessible to Natural Language Processing-based applications. We also specialize in Korean Language Processing and maintain a number of Korean Language Processing tools and resources. If you are interested, please contact us!

We are developing Korean Large Language Models that reflect the characteristics of the Korean language. To do so, we create new tokenizers and other tools, and we fine-tune, combine, and integrate layers of existing open models in various ways so that they can be applied to a range of fields.

Try out our LLM DaG and KATALOG!

We build and release various Korean pre-trained models based on Transformer language models such as BERT and GPT.

Try out our Korean pre-trained BERT (KR-BERT), KR-KOSAC-BERT, and many other models.

We are working on a Korean version of the Temporal Awareness and Reasoning Systems for Question Interpretation (TARSQI), following the TARSQI work at Brandeis University. Currently, we are developing the Korean TimeML (Markup Language for Temporal and Event Expressions).

TimeML is a robust specification language for events and temporal expressions in natural language. It is designed to address four problems in event and temporal expression markup:

1. Time stamping of events (identifying an event and anchoring it in time)

2. Ordering events with respect to one another (lexical versus discourse properties of ordering)

3. Reasoning with contextually underspecified temporal expressions (temporal functions such as 'last week' and 'two weeks before')

4. Reasoning about the persistence of events (how long does an event or the outcome of an event last)
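
As a rough illustration of the third problem, the sketch below resolves the two example expressions against a reference date. It is not part of TimeML or of our tools: the function names `resolve_last_week` and `resolve_weeks_before` are hypothetical, and treating "last week" as the Monday-to-Sunday calendar week before the anchor is an assumption made for this example.

```python
from datetime import date, timedelta
from typing import Tuple


def resolve_last_week(anchor: date) -> Tuple[date, date]:
    """Return (Monday, Sunday) of the calendar week before the anchor's week.

    Assumption for this sketch: weeks run Monday to Sunday.
    """
    this_monday = anchor - timedelta(days=anchor.weekday())
    start = this_monday - timedelta(weeks=1)
    return start, start + timedelta(days=6)


def resolve_weeks_before(anchor: date, weeks: int) -> date:
    """Return the date that lies `weeks` weeks before the anchor."""
    return anchor - timedelta(weeks=weeks)


# The anchor plays the role of the document creation time or another
# reference time that the underspecified expression depends on.
anchor = date(2024, 3, 14)
last_week = resolve_last_week(anchor)
two_weeks_before = resolve_weeks_before(anchor, 2)
print(last_week, two_weeks_before)
```

The point of the sketch is that the same expression yields a different interval for every anchor, which is why such expressions have to be annotated as functions of a reference time and not as fixed dates.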

We are developing Korean lexical resources for various NLP tasks.

KOLON (KOrean Lexicon mapped onto ONtology): we map Korean nouns and predicates (verbs and adjectives) from the Sejong Electronic Dictionary onto the Mikrokosmos Ontology developed at New Mexico State University. KOLON differs from other wordnets for Korean in that it separates concepts from lexical items and maps the lexical items onto the concepts. This combines ontological relations with lexical constraints and yields lexical hierarchies as a byproduct. Lexical items now carry lexical relations such as hypernymy and homonymy, syntactic information such as subcategorization, and semantic information such as conceptual structures (semantic classifications). The resource browser will be available soon.
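
The separation of concepts from lexical items can be pictured with a small sketch. Everything in it is a toy assumption: the concept names, the four Korean entries, and the helper names `ancestors` and `lexical_hypernyms` are invented for illustration and are not taken from KOLON, the Sejong Electronic Dictionary, or the Mikrokosmos Ontology.

```python
from collections import defaultdict

# Toy ontology (hypothetical): concept -> parent concept; None marks the root.
CONCEPT_PARENT = {
    "OBJECT": None,
    "ANIMAL": "OBJECT",
    "MAMMAL": "ANIMAL",
    "DOG": "MAMMAL",
}

# Toy lexicon (hypothetical): Korean lexical item -> concept it is mapped onto.
LEXICON = {
    "동물": "ANIMAL",    # animal
    "포유류": "MAMMAL",  # mammal
    "개": "DOG",         # dog
    "강아지": "DOG",     # puppy
}


def ancestors(concept):
    """Return the chain of concepts above `concept`, nearest first."""
    chain = []
    parent = CONCEPT_PARENT[concept]
    while parent is not None:
        chain.append(parent)
        parent = CONCEPT_PARENT[parent]
    return chain


def lexical_hypernyms(word):
    """Derive lexical hypernyms of `word` from the concept hierarchy."""
    by_concept = defaultdict(list)
    for item, concept in LEXICON.items():
        by_concept[concept].append(item)
    result = []
    for concept in ancestors(LEXICON[word]):
        result.extend(by_concept[concept])
    return result


hypernyms_of_dog = lexical_hypernyms("개")
print(hypernyms_of_dog)
```

No hypernym link is stored between words here. The lexical hierarchy falls out of the concept hierarchy plus the word-to-concept mapping, which is the byproduct described above.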

We are also working on methods for automatically clustering similar words from the web. Measuring word similarity for words not listed in a dictionary is important for NLP work. Our similarity measure for Korean helps us enrich our lexical resources with newly created or unlisted words.
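
To give a feel for how an unlisted word can be compared with a listed one, here is a minimal sketch that uses cosine similarity over context-word counts. This is a generic distributional technique swapped in for illustration, not our actual similarity measure; the three-sentence corpus, the window size, and the function names are all assumptions.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented, pre-tokenized toy corpus. "치맥" stands in for a newly coined word.
corpus = [
    ["오늘", "치맥", "먹자"],
    ["오늘", "치킨", "먹자"],
    ["내일", "회의", "하자"],
]

sim_close = cosine(context_vector("치맥", corpus), context_vector("치킨", corpus))
sim_far = cosine(context_vector("치맥", corpus), context_vector("회의", corpus))
print(sim_close, sim_far)
```

In this toy corpus the new word shares all of its contexts with "치킨" and none with "회의", so it would be grouped with the former. A real system would need web-scale data and morphological analysis before counting.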

Fields in which we are interested in relation to Korean Language:

Analysis of spoken Korean. We are exploring methods for chunking and partial analysis of spoken language.
Construction of a system of semantic categories applied to the Korean language.

As part of the work on constructing the 21st Century Sejong Electronic Dictionary, we have been in charge of its "special words": abbreviations frequently found in texts, newly coined words, proper nouns, and foreign words. In short, these are words that are not listed in dictionaries but are essential for research on Korean language processing.

We have also been working on mapping basic Korean verbs and nouns onto the Mikrokosmos Ontology, which is fundamental to Korean language processing.

Ontology research in connection with the natural language processing of meaning is currently an active area. Ontologies, as structures of concepts, form part of the knowledge base needed for lexical bases, lexical networks, semantic networks and meta-NLP. In this field, our lab has been doing the following:

Construction of an ontology by structuring various concepts and then classifying the Korean lexicon against it; the result is used to establish semantic relations and to construct lexicons for specialized fields.
Research on applying an ontology in a working system, based on experience with the development of an actual ontology, the Mikrokosmos Ontology at the CRL of New Mexico State University.
Research on resolving the suitability of Korean words using ontology-based language resources, as well as research on ontology integration.

Research and use of XML, the widely used eXtensible Markup Language, for computational linguistics and NLP.
Research on a large-scale (multilingual) language database.
Participation in the construction of a multilingual database, "Interface for syntax/semantics of natural languages".
Development of XML-based tools that help theoretical linguists develop grammars.

Research on information retrieval based on natural language.
Research on a highly efficient ontology-based system.

Using collocations, morphology, and grammatical properties, we have created a database. We are now working on improving the performance of the lexical information retrieval system built on existing theoretical lexical information, and on improving the precision of the statistical model for document classification. We apply linguistic information (part of speech, meaning) to reduce the dimensionality of the vector space and thereby capture the character of a text, so that documents can be analyzed for automatic question answering and automatic essay grading.
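
The effect of part-of-speech information on the vector space can be sketched as follows. The example is only an illustration under stated assumptions: the two documents are hand-tagged toy data, the tag labels are Sejong-style labels chosen for the example, and `CONTENT_TAGS`, `vocabulary`, and `bag_of_words` are hypothetical names that do not come from our system.

```python
# Assumed set of content-word tags: common noun, proper noun, verb, adjective.
CONTENT_TAGS = {"NNG", "NNP", "VV", "VA"}


def vocabulary(tagged_docs, keep_tags=None):
    """Collect distinct (morpheme, tag) pairs, optionally filtered by tag."""
    vocab = set()
    for doc in tagged_docs:
        for morph, tag in doc:
            if keep_tags is None or tag in keep_tags:
                vocab.add((morph, tag))
    return sorted(vocab)


def bag_of_words(doc, vocab):
    """Map one tagged document onto a count vector over `vocab`."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = [0] * len(vocab)
    for term in doc:
        if term in index:
            vec[index[term]] += 1
    return vec


# Hand-tagged toy documents (invented): "the student read a book" and
# "the student wrote a text", split into morphemes.
docs = [
    [("학생", "NNG"), ("이", "JKS"), ("책", "NNG"), ("을", "JKO"),
     ("읽", "VV"), ("었", "EP"), ("다", "EF")],
    [("학생", "NNG"), ("이", "JKS"), ("글", "NNG"), ("을", "JKO"),
     ("쓰", "VV"), ("었", "EP"), ("다", "EF")],
]

full_vocab = vocabulary(docs)
reduced_vocab = vocabulary(docs, keep_tags=CONTENT_TAGS)
print(len(full_vocab), len(reduced_vocab))
print(bag_of_words(docs[0], reduced_vocab))
```

Dropping particles and endings removes dimensions that every document shares, so the remaining dimensions are the ones that actually distinguish one text from another.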