Porter Stemming Algorithm — Basic Intro

A gentle introduction to stemming

Vijini Mallawaarachchi
6 min readOct 24, 2023
Image by author

In linguistics (study of language and its structure), a stem is part of a word, that is common to all of its inflected variants.

  • CONNECT
  • CONNECTED
  • CONNECTION
  • CONNECTING

The above words are inflected variants of CONNECT. Hence, CONNECT is a stem. To this stem, we can add different suffixes to form different words.

The process of reducing such inflected (or sometimes derived) words to their word stem is known as Stemming. For example, CONNECTED, CONNECTION and CONNECTING can be reduced to the stem CONNECT.

The Porter Stemming algorithm (or Porter Stemmer) is used to remove the suffixes from an English word and obtain its stem which becomes very useful in the field of Information Retrieval (IR). This process reduces the number of terms kept by an IR system which will be advantageous both in terms of space and time complexity. This algorithm was developed by a British Computer Scientist named Martin F. Porter. You can visit the official home page of the Porter stemming algorithm for further information.

First, a few terms and expressions will be introduced, which will be helpful for ease of explanation.

--

--

Vijini Mallawaarachchi

Bioinformatician | Computational Genomics 🧬 | Data Science 👩🏻‍💻 | Music 🎵 | Astronomy 🔭 | Travel 🎒 | vijinimallawaarachchi.com