Indian Address Parser

Parse unstructured Indian addresses into structured components using mBERT-CRF (Multilingual BERT with Conditional Random Field).
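
As a rough illustration of what "structured components" means for a token-tagging model like this one, the sketch below groups token-level BIO tags into per-entity strings. The label names (HOUSE_NUMBER, GALI, AREA, PINCODE), the tokenization, and the helper group_bio_tags are assumptions made for this example; the tags are written by hand rather than produced by the model, and the project's actual output format may differ.

```python
def group_bio_tags(tokens, tags):
    """Merge token-level BIO tags into {entity_type: [text, ...]}.

    Hypothetical helper for illustration; not the project's own API.
    """
    entities = {}
    current_type = None
    current_tokens = []
    # A trailing "O" sentinel flushes the last open span.
    for token, tag in zip(list(tokens) + [""], list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current_type
        if not continues:
            if current_type is not None:
                text = " ".join(current_tokens)
                entities.setdefault(current_type, []).append(text)
            # "B-X" starts a span; a stray "I-X" is treated leniently as a start.
            if prefix in ("B", "I"):
                current_type, current_tokens = label, []
            else:
                current_type, current_tokens = None, []
        if current_type is not None:
            current_tokens.append(token)
    return entities


# Hand-written example tags (assumed labels, not real model output).
tokens = ["H.No.", "12", "Gali", "No.", "3", "Laxmi", "Nagar", "Delhi", "110092"]
tags = [
    "B-HOUSE_NUMBER", "I-HOUSE_NUMBER",
    "B-GALI", "I-GALI", "I-GALI",
    "B-AREA", "I-AREA",
    "O",
    "B-PINCODE",
]
structured = group_bio_tags(tokens, tags)
print(structured)
```

Returning a list per entity type keeps the sketch safe when an address mentions the same kind of component twice.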

Features

  • Supports Hindi and English (Devanagari and Latin scripts)
  • 15 entity types, including House Number, Floor, Block, Gali, Colony, Area, Khasra, and Pincode
  • Delhi-specific locality gazetteer for improved accuracy
  • Inference time under 30 ms
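
One way a locality gazetteer can feed into parsing is a longest-match lookup over the address tokens, sketched below. This is a minimal illustration under stated assumptions: the three gazetteer entries, the normalization rule, and the helper find_gazetteer_matches are all hypothetical, and the project may use its gazetteer differently (for example as model features or in post-processing).

```python
def find_gazetteer_matches(tokens, gazetteer, max_len=3):
    """Return (start, end, name) spans whose normalized text is in the gazetteer.

    Hypothetical helper: greedy, longest match first, left to right.
    """
    normalized = [t.lower().strip(",.") for t in tokens]
    matches = []
    i = 0
    while i < len(normalized):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(normalized) - i), 0, -1):
            candidate = " ".join(normalized[i:i + n])
            if candidate in gazetteer:
                matches.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no window starting here matched
    return matches


# Illustrative entries only; the real gazetteer contents are not shown here.
gazetteer = {"karol bagh", "laxmi nagar", "chandni chowk"}
tokens = ["Flat", "4", "Karol", "Bagh,", "New", "Delhi"]
matches = find_gazetteer_matches(tokens, gazetteer)
print(matches)
```

Trying longer windows first means a multi-word locality wins over a shorter entry that happens to be its prefix.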

Model: ai4bharat/IndicBERTv2-SS + CRF layer | Training Data: 600+ annotated Delhi addresses | GitHub: indian-address-parser