About ILM
What is ILM?
The Insight Language Machine is a diachronic lexical knowledge graph for Malayalam — the language of Kerala, spoken by 38 million people.
ILM contains 136,629 lemmas and 122,230 senses, spanning classical Malayalam to contemporary usage, with glosses in 17 languages. It is designed for AI training, machine translation, and computational linguistics.
Architecture
ILM is structured as a concept-centric knowledge graph, comparable to WordNet and ConceptNet, and is organized into five layers:
- Lemma layer: orthographic forms, POS, frequency, script variants
- Sense layer: definitions (ML + EN), domain, register, examples
- Gloss layer: multilingual translations across 17 languages
- Etymology layer: Gundert 1872, DED Dravidian roots, loanword tracking
- Corpus layer: frequency data from 13 Malayalam text corpora
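To make the layering concrete, here is a minimal Python sketch of how one entry could be modelled in memory. It is not ILM's actual schema: every class name, field name, and language code below is an assumption chosen to mirror the layer descriptions above, and the sample entry is illustrative rather than taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sense:
    """Sense layer: one meaning of a lemma (hypothetical field names)."""
    definition_ml: str
    definition_en: str
    domain: Optional[str] = None
    register: Optional[str] = None
    examples: List[str] = field(default_factory=list)
    # Gloss layer: language code -> translation (key format is an assumption).
    glosses: Dict[str, str] = field(default_factory=dict)


@dataclass
class Etymology:
    """Etymology layer: source reference plus loanword information."""
    source: str  # e.g. "Gundert 1872" or "DED"
    root: Optional[str] = None
    is_loanword: bool = False
    donor_language: Optional[str] = None


@dataclass
class Lemma:
    """Lemma layer, linking to the other layers for one headword."""
    form: str
    pos: str
    frequency: int = 0
    script_variants: List[str] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)
    etymology: Optional[Etymology] = None
    # Corpus layer: corpus name -> occurrence count (hypothetical).
    corpus_counts: Dict[str, int] = field(default_factory=dict)


def glosses_for(lemma: Lemma, lang: str) -> List[str]:
    """Collect the glosses of every sense that has one in `lang`."""
    return [s.glosses[lang] for s in lemma.senses if lang in s.glosses]


# Illustrative entry only; the values are not copied from ILM.
example = Lemma(
    form="വെള്ളം",
    pos="noun",
    senses=[
        Sense(
            definition_ml="ജലം",
            definition_en="water",
            glosses={"en": "water", "hi": "पानी"},
        )
    ],
)

print(glosses_for(example, "en"))
```

Keeping glosses on the sense rather than on the lemma reflects the concept-centric design: a translation belongs to a meaning, not to a spelling.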
Sources
Team
Sumesh
Director, Insight Publica — Project Lead
Likhitha
QC Lead
Aakash
Data Pipeline
Ismail
Data Pipeline
Nihal
Data Pipeline
License & Citation
CC BY-SA 4.0
@dataset{ilm2026,
  title  = {Insight Language Machine: A Diachronic Malayalam Lexical Knowledge Graph},
  author = {{Insight Publica}},
  year   = {2026},
  url    = {https://huggingface.co/datasets/insightpublica/ilm}
}