Dataset · v0.1 · Early Access
ILM Dataset
The Insight Language Machine (ILM) is a multilingual lexical knowledge graph licensed under CC BY-SA 4.0. It is available on HuggingFace as a gated dataset for verified researchers and AI teams.
Files
File | Description | Count | Status
lemmas_hf_final.jsonl | All lemmas with core fields | 136,629 | ✅
senses_hf_final.jsonl | All senses with definitions | 122,230 | ✅
ilm_dataset.jsonl | Multilingual glosses (17 lang) | 136,629 | 🔄 35%
senses_multilingual.jsonl | Multilingual sense glosses (15 lang) | 122,230 | 🔄 22%
ilm_dataset_clean.jsonl | Deduplicated clean subset | 25,713 | ✅
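Since every file listed above uses the .jsonl extension, a downloaded copy can presumably be read one JSON record per line. The following is a minimal sketch, not an official loader. It assumes each non-empty line holds a single JSON object and makes no assumption about field names, which are not documented here. The helper names iter_jsonl and count_records are illustrative only.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: str | Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file.

    Assumption: each line is a standalone JSON object. The ILM field
    names are not documented here, so records are returned as plain dicts.
    """
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a truncated download is easy to spot.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc


def count_records(path: str | Path) -> int:
    """Count records by streaming the file instead of loading it into memory."""
    return sum(1 for _ in iter_jsonl(path))


if __name__ == "__main__":
    # Hypothetical local path; the file must already have been downloaded.
    local_file = Path("ilm_dataset_clean.jsonl")
    if local_file.exists():
        print(f"{local_file.name}: {count_records(local_file):,} records")
```

Comparing the printed count with the figure in the file list is a quick way to check that a download is complete. Streaming line by line keeps memory use flat even for the larger files.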
Access
ILM is hosted on HuggingFace as a gated private dataset. Request access for AI training, NLP research, or academic use. Commercial licensing inquiries are welcome.