Stemming in NLP

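Stemming is the process of reducing words to a root or base form (the stem) by stripping suffixes with heuristic rules. The stem is not always a dictionary word: the Porter stemmer, for example, reduces "studies" to "studi". Stemming helps normalize text and is useful for tasks like search, indexing, and text mining.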
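To make the idea of suffix stripping concrete, here is a minimal sketch of a toy stemmer. It is not the Porter algorithm or any library's implementation; the `SUFFIXES` list, the `toy_stem` name, and the `min_stem_len` parameter are assumptions made up for illustration.

```python
# Toy suffix stripper: illustrative only, NOT the Porter algorithm.
SUFFIXES = ("ing", "ly", "es", "s")  # hypothetical rule list, checked in order


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["running", "runs", "cats", "easily", "studies"]:
    print(w, "->", toy_stem(w))
```

This toy version turns "running" into "runn" but "runs" into "run", so the two forms are not merged. Real stemmers add further rules, such as undoubling a final consonant, to handle cases like this.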

Why Use Stemming?

Reduces different forms of a word to a common base form.
Improves recall in search and information retrieval, since a query for one form of a word also matches its other forms.
Simplifies text analysis by reducing vocabulary size.
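The points above can be sketched with a small stem-keyed index. This is only a rough illustration under stated assumptions: `crude_stem` is a hypothetical stand-in rather than a real stemming algorithm, and `build_index` is a name invented for this example. Any callable that maps a word to its stem could be passed in its place.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def crude_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: drops a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(docs: List[str], stem: Callable[[str], str]) -> Dict[str, Set[int]]:
    """Map each stem to the set of document positions that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index


docs = ["the cat jumps", "two cats jumping", "a dog barks"]
index = build_index(docs, crude_stem)

# The query is stemmed the same way as the documents, so "jump"
# matches both "jumps" and "jumping".
print(sorted(index.get(crude_stem("jump"), set())))

# Compare the number of distinct raw tokens with the number of distinct stems.
raw_vocab = {token for doc in docs for token in doc.lower().split()}
print(len(raw_vocab), "raw tokens vs", len(index), "stems")
```

Stemming the query with the same function used at indexing time is what lets different word forms meet on a shared key.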

Popular Stemming Algorithms

Porter Stemmer: One of the most commonly used stemmers, with a good balance between accuracy and speed.
Snowball Stemmer: An improved and more flexible version of the Porter Stemmer that supports multiple languages.
Lancaster Stemmer: A more aggressive stemmer that produces shorter stems and sometimes over-stems, merging words that are not actually related.

10 Examples of Stemming

from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

words = ["running", "runs", "runner", "easily", "fairly", "cats", "studies", "studying"]

# Example 1: Using Porter Stemmer
porter = PorterStemmer()
print([porter.stem(word) for word in words])

# Example 2: Using Snowball Stemmer (English)
snowball = SnowballStemmer("english")
print([snowball.stem(word) for word in words])

# Example 3: Using Lancaster Stemmer
lancaster = LancasterStemmer()
print([lancaster.stem(word) for word in words])

# Example 4: Comparing stemmers on a noun
word = "consolidation"
print("Porter:", porter.stem(word))
print("Snowball:", snowball.stem(word))
print("Lancaster:", lancaster.stem(word))

# Example 5: Stemming irregular plural nouns (suffix rules do not recover "goose" or "wolf")
print(porter.stem("geese"))
print(porter.stem("wolves"))

# Example 6: Stemming compound words
print(porter.stem("multiuser"))
print(snowball.stem("multiuser"))

# Example 7: Stemming words with suffixes
print(porter.stem("happiness"))
print(porter.stem("happily"))

# Example 8: Stemming words with prefixes (Porter and Snowball strip suffixes only, so "un" is kept)
print(porter.stem("unhappy"))
print(snowball.stem("unhappy"))

# Example 9: Stemming irregular verbs ("went" is left unchanged because no suffix rule applies)
print(porter.stem("went"))
print(snowball.stem("went"))

# Example 10: Stemming words with hyphens
print(porter.stem("co-operate"))
print(snowball.stem("co-operate"))
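Printing each stemmer's result on a separate line makes the outputs hard to compare at a glance. A small helper along these lines could line them up in columns instead. `compare_stemmers` is a hypothetical name, not part of NLTK, and the `demo` callables are placeholders so the sketch runs without NLTK installed.

```python
from typing import Callable, Dict, Iterable, List


def compare_stemmers(
    words: Iterable[str], stemmers: Dict[str, Callable[[str], str]]
) -> List[str]:
    """Return aligned text rows: one header row, then one row per word."""
    names = list(stemmers)
    table = [["word"] + names]
    for word in words:
        table.append([word] + [stemmers[name](word) for name in names])
    # Pad every column to the widest cell plus two spaces.
    width = max(len(cell) for row in table for cell in row) + 2
    return ["".join(cell.ljust(width) for cell in row).rstrip() for row in table]


# Placeholder callables (assumptions for illustration, not real stemmers).
demo = {
    "identity": lambda w: w,
    "drop_s": lambda w: w[:-1] if w.endswith("s") else w,
}
rows = compare_stemmers(["cats", "went"], demo)
print("\n".join(rows))
```

With the objects created in the listing above, the same helper could be called as `compare_stemmers(words, {"Porter": porter.stem, "Snowball": snowball.stem, "Lancaster": lancaster.stem})`.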