
Count Words In Python

I have a list of strings in Python: list = ['Sentence1. Sentence2...', 'Sentence1. Sentence2...', ...]. I want to remove stop words and count the occurrences of each word across all the different strings.

Solution 1:

If you don't mind installing a new Python library, I suggest using gensim. Its first tutorial does exactly what you're asking for:

# remove common words and tokenize
# `documents` is your list of strings
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

You then need to create a dictionary for your corpus of documents and build the bag-of-words representation.

from gensim import corpora
dictionary = corpora.Dictionary(texts)
dictionary.save('/tmp/deerwester.dict')  # store the dictionary, for future reference
print(dictionary)
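The dictionary maps each distinct token to an integer id, and the bag-of-words representation stores each document as (id, count) pairs. Below is a minimal stdlib-only sketch of that idea, not gensim's implementation (gensim provides `corpora.Dictionary` and its `doc2bow` method for this). The helpers `build_vocabulary` and `to_bow` are hypothetical names, and `texts` here is a small made-up stand-in for the token lists built above.

```python
from collections import Counter


def build_vocabulary(texts):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for tokens in texts:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_bow(tokens, vocab):
    """Return a sorted list of (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


# Hypothetical example input standing in for `texts`
texts = [["human", "computer", "interface"], ["computer", "system", "computer"]]
vocab = build_vocabulary(texts)
corpus = [to_bow(tokens, vocab) for tokens in texts]
print(vocab)   # {'human': 0, 'computer': 1, 'interface': 2, 'system': 3}
print(corpus)  # [[(0, 1), (1, 1), (2, 1)], [(1, 2), (3, 1)]]
```

Ids are assigned in first-seen order in this sketch; gensim may number tokens differently, so don't rely on specific id values across the two.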

You can then weight the results with TF-IDF and run LDA quite easily.
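TF-IDF down-weights words that appear in many documents. Here is a minimal sketch of one common variant, raw term count multiplied by log(N / document frequency), written with the standard library only. It is an illustration under that assumption, not gensim's `TfidfModel`, which may use a different log base and normalize each document vector. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(texts):
    """Weight each token by (count in document) * log(N / documents containing it)."""
    n_docs = len(texts)
    # Document frequency: in how many documents each token appears at least once.
    df = Counter(token for tokens in texts for token in set(tokens))
    weighted = []
    for tokens in texts:
        counts = Counter(tokens)
        weighted.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weighted


# Hypothetical tokenized documents
texts = [["cat", "sat", "mat"], ["cat", "ate"], ["dog", "ate", "ate"]]
for weights in tfidf(texts):
    print(weights)
```

A word that occurs in every document gets weight 0, since log(N / N) = 0, which is why very common words stop dominating the counts.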

Have a look at the first gensim tutorial for the full walkthrough.

Solution 2:

You haven't fully explained what you have in mind, but this may be what you're looking for:

import collections
counts = collections.Counter(' '.join(your_list).split())
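Note that `str.split()` alone keeps punctuation attached ("Sentence1." and "Sentence1" count as different words) and does not remove stop words. A minimal sketch that handles both, assuming a small hand-picked stop list (the `STOP_WORDS` set and `count_words` function are hypothetical; substitute a fuller list if you have one) and treating any run of `\w` characters as a word:

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; replace with the list you need.
STOP_WORDS = {"for", "a", "of", "the", "and", "to", "in", "is"}


def count_words(documents, stop_words=STOP_WORDS):
    """Count every non-stop word across all documents, case-insensitively."""
    counts = Counter()
    for document in documents:
        # \w+ keeps letters, digits and underscores and drops punctuation such as '.'
        words = re.findall(r"\w+", document.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


your_list = ["The cat sat. The cat ate.", "A dog ate the food."]
counts = count_words(your_list)
print(counts.most_common(2))  # [('cat', 2), ('ate', 2)]
```

`most_common` returns the words sorted by count, with ties kept in the order they were first encountered, so it is a convenient way to see the top words after filtering.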
