Build text classification models (CBOW and Skip-gram) with FastText in Python

NLP is an exciting way to interpret textual data, especially given that computers neither speak nor understand any human language. So how do we represent each word of a language as a unique numerical pattern and process it as quickly as possible? One answer is the FastText library.

Tags: Artificial Intelligence, Deep Learning & Artificial Intelligence, Data Science, NLP, Machine Learning

Scheduled on Friday 10:30 in room cubus

Speaker

Kajal Puri (Agirlhasnofame)

Kajal Puri works as a Data Scientist at Fractal Analytics. Before this, she dabbled with numbers and statistical models through personal projects and industry internships (all thanks to startups!). She has trained models to understand human language (Natural Language Processing) and to categorise objects (Computer Vision). In her spare time, when she is not reading about the AI apocalypse, she can be found writing poetry at https://www.yourquote.in/kajal-puri-cbi/quotes/

Description

FastText was open-sourced by Facebook in 2016, and since its release it has become one of the fastest and most accurate libraries usable from Python for text classification and word representation. It can be seen as an alternative to the gensim package's word2vec. It implements two important NLP methods for learning word vectors: Continuous Bag of Words (CBOW), which predicts a word from its surrounding context, and Skip-gram, which predicts the surrounding context from a word. FastText performs well in both supervised and unsupervised learning.
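To make the difference between the two methods concrete, here is a minimal pure-Python sketch of the training examples each one would see for a short sentence. The helper names (`context_window`, `cbow_examples`, `skipgram_examples`) are hypothetical and written only for this illustration; they are not part of FastText, and real implementations add details such as random window sizes and subsampling of frequent words.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the words within `window` positions of tokens[i], excluding it."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one example per position, (context words) -> centre word."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one example per (centre word, context word) pair."""
    return [(tokens[i], context_word)
            for i in range(len(tokens))
            for context_word in context_window(tokens, i, window)]


tokens = "the quick brown fox jumps".split()

for context, target in cbow_examples(tokens, window=1):
    print("CBOW     ", context, "->", target)

for centre, context_word in skipgram_examples(tokens, window=1):
    print("Skip-gram", centre, "->", context_word)
```

CBOW produces one example per word, while Skip-gram produces one example per word-context pair, which is one reason Skip-gram tends to be slower to train on the same corpus.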

The tutorial will be divided into the following four segments:

  1. 0-10 minutes: The talk will begin by explaining the differences between the word embeddings generated by word2vec, GloVe and FastText, and how FastText outperforms the other libraries in both accuracy and training time.

  2. 10-25 minutes: The code for both models (CBOW and Skip-gram) will be shown and explained line by line on a standard labeled text dataset, with tips on hyperparameter tuning to get the best possible results.

  3. 25-40 minutes: How to use the pre-trained word embeddings released by FastText for various languages, and where to use them, followed by use cases showing what kinds of problems can be solved using FastText in Python.

  4. 40-45 minutes: Q&A session.
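As a taste of the first segment: one key difference from word2vec and GloVe is that FastText represents a word as a bag of character n-grams, so related or unseen words can share parts of their representation. The sketch below is a simplified pure-Python illustration of that idea, not FastText's actual code. The function name `char_ngrams` is hypothetical, and the default lengths of 3 to 6 are assumed here to mirror FastText's usual defaults for unsupervised training.

```python
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return the character n-grams of `word`, with '<' and '>' marking
    the start and end of the word."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# Trigrams only, to keep the printed output short.
print(char_ngrams("where", min_n=3, max_n=3))

# Two related word forms share some of their n-grams.
shared = set(char_ngrams("playing", 3, 3)) & set(char_ngrams("played", 3, 3))
print(sorted(shared))
```

Because the two word forms share several n-grams, part of what the model learns for one can carry over to the other, and the same mechanism lets it build a vector for a word it never saw during training.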