On July 18, 2025, the Indian Institute of Technology (IIT) Roorkee unveiled MoScNet, the world’s first AI model designed to transliterate the historic Modi script into Devanagari, a significant step toward preserving India’s medieval manuscripts. The project, titled “Historic Scripts to Modern Vision” and led by Prof. Sparsh Mittal, also introduces MoDeTrans, a dataset of over 2,000 Modi script manuscript images that lets researchers access and study India’s cultural heritage. With over 40 million Modi script documents across India, the tool addresses a critical gap in digitizing and preserving historical records.
Key Points:
- MoScNet is the first AI model to transliterate Modi script into Devanagari.
- MoDeTrans dataset includes 2,000+ images from Shivakalin, Peshwekalin, and Anglakalin eras.
- Supports national initiatives like Digital India, BharatGPT, and Bhashini.
What is Modi Script?
The Modi script, a semi-cursive writing system used primarily for Marathi, was prevalent in Maharashtra and parts of Central and Western India from the 17th century until the early 20th century. Employed during Chhatrapati Shivaji’s era, the Peshwa administration, and the British colonial period, it was used to write land records, Ayurveda texts, and medieval scientific works. Its cursive nature and diverse styles, compounded by angular strokes and blurred or faded text, make manual transliteration difficult, and only a handful of experts remain.
Key Points:
- Used for Marathi records during Shivaji’s reign, Peshwa era, and British period.
- Over 40 million documents, including land deeds and scientific texts, have yet to be transliterated.
- Cursive script and fading manuscripts pose challenges for human experts.
MoScNet: The AI-Powered Solution
MoScNet, built on a Vision-Language Model (VLM) architecture, is reported to outperform traditional Optical Character Recognition (OCR) tools at converting handwritten Modi script into readable Devanagari text. It was developed with contributions from students Harshal and Tanvi (COEP Technological University) and Onkar (Vishwakarma Institute of Information Technology). The model is lightweight and scalable, which makes it suitable for low-resource environments and accessible for widespread use.
Key Points:
- Uses VLM architecture for superior transliteration accuracy.
- Lightweight design suits low-resource settings.
- Developed by Prof. Sparsh Mittal and student collaborators.
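The article does not say which metric was used to compare MoScNet against OCR tools. One common way to score transliteration output is the character error rate (CER): the edit distance between the model output and the reference text, divided by the reference length. The sketch below is an illustration under that assumption, not the project's evaluation code. The function names are hypothetical, and the distance is counted over Unicode code points after NFC normalization, which treats each Devanagari vowel sign as its own character.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, after NFC normalization."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only: one wrong vowel sign in a five-code-point word.
    print(character_error_rate("मराठी", "मराठि"))
```

A lower CER means the output is closer to the expert transliteration. Counting by grapheme cluster instead of code point would give different numbers, so comparisons are only meaningful when both systems are scored the same way.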
MoDeTrans: A Pioneering Dataset
The MoDeTrans dataset, the first of its kind, features over 2,000 images of authentic Modi script manuscripts from three historical periods—Shivakalin (17th century), Peshwekalin (18th century), and Anglakalin (British era)—paired with expert-verified Devanagari transliterations. Open-sourced on Hugging Face, it empowers global researchers to advance studies in Indian history, linguistics, and AI development.
Key Points:
- Includes 2,000+ manuscript images with verified Devanagari transliterations.
- Covers Shivaji’s era to British colonial records.
- Open-sourced on Hugging Face for global access.
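As described above, each dataset entry pairs a manuscript image with a verified Devanagari transliteration and belongs to one of three eras. A minimal sketch of how such a record might be represented and tallied follows. The field names (`image_path`, `devanagari`, `era`) and the helper `count_by_era` are assumptions made for illustration; the actual schema of the published dataset may differ and should be checked on its Hugging Face page.

```python
from collections import Counter
from dataclasses import dataclass

# The three historical periods named in the article.
ERAS = ("Shivakalin", "Peshwekalin", "Anglakalin")


@dataclass(frozen=True)
class ManuscriptSample:
    image_path: str  # hypothetical field: location of the scanned manuscript image
    devanagari: str  # hypothetical field: expert-verified Devanagari transliteration
    era: str         # hypothetical field: one of the names in ERAS


def count_by_era(samples: list[ManuscriptSample]) -> dict[str, int]:
    """Count samples per era, rejecting era labels outside ERAS."""
    counts = Counter(sample.era for sample in samples)
    unknown = set(counts) - set(ERAS)
    if unknown:
        raise ValueError(f"unknown era labels: {sorted(unknown)}")
    return {era: counts.get(era, 0) for era in ERAS}


if __name__ == "__main__":
    # Placeholder records, not real dataset entries.
    demo = [
        ManuscriptSample("scan_001.png", "श्री", "Shivakalin"),
        ManuscriptSample("scan_002.png", "राजा", "Peshwekalin"),
        ManuscriptSample("scan_003.png", "पत्र", "Peshwekalin"),
    ]
    print(count_by_era(demo))
```

Checking the per-era balance like this is a simple first step before training or evaluation, since handwriting style varies across the three periods.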
Aligning with National and Global Goals
MoScNet and MoDeTrans align with India’s Digital India, Bhashini, and BharatGPT initiatives, enhancing multilingual AI capabilities and heritage digitization. The project also supports the United Nations Sustainable Development Goal (SDG) 11.4, which emphasizes protecting cultural heritage. Its framework could be adapted for other endangered scripts like Sharda, Mahajani, or Grantha, offering a global model for historical preservation.
Key Points:
- Supports Digital India, Bhashini, and BharatGPT for multilingual AI.
- Aligns with UN SDG 11.4 for cultural heritage preservation.
- Could be adapted to other endangered scripts such as Sharda, Mahajani, or Grantha.
Impact on Research and Preservation
With over 40 million Modi script documents deteriorating and a dwindling number of experts, MoScNet fills a critical gap in academic and archival research. It enables historians, linguists, and archivists to access records on land deeds, Ayurveda, and medieval science, unlocking insights into India’s rich history. Institutions in Maharashtra and cultural ministries are already exploring its use for heritage recovery projects.
Key Points:
- Helps make 40 million+ Modi script documents accessible for research.
- Enables study of medieval history, science, and culture.
- Attracts interest from Maharashtra institutions for digitization.
Open-Source for Global Innovation
In a commitment to ethical AI, IIT Roorkee has open-sourced both MoScNet and MoDeTrans on Hugging Face, fostering community-driven innovation. Prof. Sparsh Mittal emphasized, “We aim to democratize access to India’s ancient knowledge using open-source, scalable, and ethically trained AI tools.” This move ensures researchers, students, and enthusiasts worldwide can build on this work, potentially extending it to voice-based systems or other scripts.
Key Points:
- MoScNet and MoDeTrans freely available on Hugging Face.
- Encourages global collaboration and innovation.
- Possible extensions include voice-based systems and support for other scripts.
How to Engage with the Initiative
Explore the MoScNet model and MoDeTrans dataset on Hugging Face to contribute to or utilize this technology. Follow IIT Roorkee (@iitroorkee on X) or visit iitr.ac.in for updates on this and other innovative projects. Researchers and enthusiasts can also collaborate with cultural ministries or archives to digitize Modi script records.
Key Points:
- Access MoScNet and MoDeTrans on Hugging Face.
- Follow @iitroorkee for project updates.
- Collaborate with archives for heritage digitization projects.






