

How to use ngrams and Markov Chains for Natural Language Processing

Implementation of n-grams:


Presentation notes in markdown:

The Facebook Live link:

One fish, two fish, red fish....

The first President of the United States was...

Horton hears a...

The human mind knows how to process prompts like these and come up with a likely completion. But how do you help a computer do the same thing? Want to take a guess?

The answer is Natural Language Processing, or NLP. NLP is a huge subject, but a key tool for NLP is what I talk about in this demonstration: the n-gram.

What is an N-gram?

An n-gram is a language modeling tool that can be easily processed by computers.

N = 1 : "Unigram" (or, you know, a word), e.g. "The"

N = 2 : "Bigram", e.g. "The cat"

N = 3 : "Trigram", e.g. "The cat sat"

N = 4 and up : "Four-gram", "Five-gram", etc., e.g. "The cat sat on"
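The sliding-window idea above is easy to sketch in code. Here is a minimal JavaScript version (the function and variable names are my own, not from the demo):

```javascript
// Split a sentence into word-level n-grams using a sliding window.
// A minimal sketch — real tokenizers handle punctuation, casing, etc. more carefully.
function ngrams(text, n) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const result = [];
  for (let i = 0; i <= words.length - n; i++) {
    result.push(words.slice(i, i + n).join(' '));
  }
  return result;
}

// Trigrams of the example sentence:
console.log(ngrams('The cat sat on the hat', 3));
// → [ 'the cat sat', 'cat sat on', 'sat on the', 'on the hat' ]
```

Each window of n consecutive words becomes one n-gram, so a sentence of w words yields w − n + 1 of them.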

Here's a formal definition of an n-gram:

In the fields of computational linguistics and probability, an n-gram is:

a contiguous sequence of n items from a given sequence of text or speech.

The items can be phonemes, syllables, letters, words or base pairs according to the application.

The n-grams typically are collected from a text or speech corpus.

Here are some more n-gram examples:

To apply an n-gram, we use the Markov Model of language structure:

Here I provide an example to help you understand Markov Models with everybody's old favorite, Green Eggs and Ham:
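The Markov idea boils down to: record which word follows which, then walk those links. Here is a rough sketch using a couple of lines from the book (the names `buildChain` and `generate` are illustrative, not from the talk):

```javascript
// Build a first-order Markov chain: map each word to the list of words
// that follow it in the corpus. This is, in effect, a bigram table.
function buildChain(corpus) {
  const words = corpus.toLowerCase().replace(/[^a-z\s-]/g, '').split(/\s+/).filter(Boolean);
  const chain = {};
  for (let i = 0; i < words.length - 1; i++) {
    (chain[words[i]] = chain[words[i]] || []).push(words[i + 1]);
  }
  return chain;
}

// Walk the chain: start from a word and repeatedly pick a random successor,
// stopping when a word has no recorded successor or the length cap is hit.
function generate(chain, start, maxWords = 10) {
  const out = [start];
  let current = start;
  while (out.length < maxWords && chain[current]) {
    const next = chain[current][Math.floor(Math.random() * chain[current].length)];
    out.push(next);
    current = next;
  }
  return out.join(' ');
}

const chain = buildChain('I do not like them, Sam-I-Am. I do not like green eggs and ham.');
console.log(generate(chain, 'green'));
// → "green eggs and ham" (each word on this path has exactly one successor)
```

Starting from a word like "i", where successors vary, repeated runs wander through different Seuss-flavored sentences — that randomness is the whole charm of Markov text generation.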

Can I use PSQL for bigrams, trigrams, and other n-grams? Can I use Sequelize for n-grams?

How do I generate an n-gram using Sequelize and PSQL? Here's an example:
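The original example did not survive the page export, so here is a sketch of the general shape: a pure helper that turns text into bigram rows, and (in comments) how those rows might be persisted to Postgres through Sequelize. The table and column names (`bigram`, `first`, `second`) are hypothetical:

```javascript
// Pure helper: turn a sentence into bigram rows ready for bulk insert.
// A sketch — the row shape here is my own, not from the original example.
function toBigramRows(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const rows = [];
  for (let i = 0; i < words.length - 1; i++) {
    rows.push({ first: words[i], second: words[i + 1] });
  }
  return rows;
}

// With Sequelize against Postgres, the rows above could be persisted roughly like so
// (requires the `sequelize` and `pg` packages and a running database):
//
//   const { Sequelize, DataTypes } = require('sequelize');
//   const sequelize = new Sequelize('postgres://localhost:5432/ngrams_demo');
//   const Bigram = sequelize.define('bigram', {
//     first:  { type: DataTypes.STRING, allowNull: false },
//     second: { type: DataTypes.STRING, allowNull: false },
//   });
//   await sequelize.sync();
//   await Bigram.bulkCreate(toBigramRows('one fish two fish red fish blue fish'));

console.log(toBigramRows('one fish two fish'));
// → [ { first: 'one', second: 'fish' }, { first: 'fish', second: 'two' }, { first: 'two', second: 'fish' } ]
```

Once the bigrams live in a table, counting how often each `second` follows a given `first` is a plain GROUP BY query, which is exactly the frequency table a Markov model needs.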

Finally, here is a great list of resources for n-grams:

Project Members: John Backes
