Tanglish

NLP experiments on Tanglish

Links

What the hell is Tanglish?

Complexities

  • Tanglish has its own sort of grammar. It conforms strictly to neither classical Tamil grammar nor English grammar.
  • Moreover, even spoken Tamil is vastly different from written Tamil sources. This presents a problem when we want to train language models. English, in addition to having a wealth of literature, has a large corpus of spoken-style text (Reddit, Twitter, etc.).
  • That large conversational corpus is what allows for very rich NLP and auto-completion in English.

Tasks accomplished

  • Preprocessing
  • N-grams
  • Transliteration
    • We attempted to convert Tanglish to plain Tamil in order to use existing Tamil NLP toolkits
  • POS tagging
  • NER
  • Sentiment analysis using LSTM and BERT
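
The first two tasks above can be illustrated with a minimal sketch. This is not the project's actual code: the function names (`preprocess`, `ngrams`, `ngram_counts`), the rule of squeezing elongated letters down to two, and the choice to keep only Latin-script tokens are all assumptions made for illustration.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase, squeeze elongated letters, and split into word tokens.

    Assumption: Tanglish is written in Latin script, so only a-z tokens
    are kept; digits, punctuation and Tamil-script characters are dropped.
    """
    text = text.lower()
    # Collapse runs of 3+ identical characters to exactly two,
    # e.g. "semmaaaa" -> "semmaa". The threshold is an arbitrary choice.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    return re.findall(r"[a-z]+", text)


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentences, n=2):
    """Count n-grams over an iterable of raw sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(preprocess(sentence), n))
    return counts


if __name__ == "__main__":
    sample = ["padam semma da", "padam semma"]
    for gram, freq in ngram_counts(sample, n=2).most_common():
        print(gram, freq)
```

Counts like these could feed a simple auto-completion baseline (suggest the most frequent continuation of the previous word), although whether the project used them that way is not stated here.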

UI