About ILM

What is ILM?

The Insight Language Machine is a diachronic lexical knowledge graph for Malayalam — the language of Kerala, spoken by 38 million people.

Built on 67,875 lemmas and 117,151 senses spanning classical Malayalam to contemporary usage, ILM provides glosses in 17 languages and is designed for AI training, machine translation, and computational linguistics.

Architecture

ILM is structured as a concept-centric knowledge graph comparable to WordNet and ConceptNet:

◦ Lemma layer: orthographic forms, POS, frequency, script variants
◦ Sense layer: definitions (ML + EN), domain, register, examples
◦ Gloss layer: multilingual translations across 17 languages
◦ Etymology layer: Gundert 1872, DED Dravidian roots, loanword tracking
◦ Corpus layer: frequency data from 13 Malayalam text corpora
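
As a rough illustration of how these layers might fit together, the Python sketch below models a single entry. The class, field, and function names (Lemma, Sense, Etymology, corpus_counts, glosses_for) are assumptions made for this example and are not ILM's published schema. The sample values, including the corpus names and counts, are placeholders rather than data taken from ILM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sense:
    """Sense layer: one meaning of a lemma, with its glosses attached."""
    definition_ml: str
    definition_en: str
    domain: str = ""
    register: str = ""
    examples: list = field(default_factory=list)
    # Gloss layer: language code -> translation (keying scheme is assumed).
    glosses: dict = field(default_factory=dict)


@dataclass
class Etymology:
    """Etymology layer: where the word comes from and who documents it."""
    source: str                 # e.g. "Gundert 1872" or "DED"
    root: str = ""
    is_loanword: bool = False
    donor_language: str = ""


@dataclass
class Lemma:
    """Lemma layer: the headword, plus links to the other layers."""
    form: str
    pos: str
    script_variants: list = field(default_factory=list)
    senses: list = field(default_factory=list)
    etymology: Optional[Etymology] = None
    # Corpus layer: corpus name -> occurrence count (placeholder names).
    corpus_counts: dict = field(default_factory=dict)

    def total_frequency(self) -> int:
        """Sum the occurrence counts over all corpora."""
        return sum(self.corpus_counts.values())


def glosses_for(lemma: Lemma, lang: str) -> list:
    """Collect the glosses in one language across all senses of a lemma."""
    return [s.glosses[lang] for s in lemma.senses if lang in s.glosses]


# Illustrative entry only; not copied from the dataset.
entry = Lemma(
    form="മരം",
    pos="noun",
    senses=[
        Sense(
            definition_ml="വൃക്ഷം",
            definition_en="tree",
            glosses={"en": "tree", "ta": "மரம்", "hi": "पेड़"},
        )
    ],
    etymology=Etymology(source="DED", root="maram"),
    corpus_counts={"corpus_a": 12, "corpus_b": 30},
)

print(glosses_for(entry, "ta"))
print(entry.total_frequency())
```

Nesting glosses under senses rather than lemmas reflects the concept-centric design described above: a translation belongs to one meaning of a word, not to its spelling.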

Sources

Malayalam Wiktionary dump

Gundert Dictionary (1872)

Malayalam Lexicon

Shabdatharavali

Aithihyamala citations

Dravidian Etymological Dictionary (DED)

Kittel (Kannada)

Monier-Williams (Sanskrit)

13 Malayalam PDF corpora

Kerala Sahitya Charithram

Publisher

Insight Publica

Kozhikode, Kerala, India

License & Citation

CC BY-SA 4.0

@dataset{ilm2026,
  title = {Insight Language Machine: A Diachronic Malayalam Lexical Knowledge Graph},
  author = {Insight Publica},
  year = {2026},
  url = {https://huggingface.co/datasets/insightpublica/ilm},
}