Count Words In Python
I have a list of strings in Python: list = [ 'Sentence1. Sentence2...', 'Sentence1. Sentence2...',...]. I want to remove stop words and count the occurrences of each word of all differen
Solution 1:
If you don't mind installing a new Python library, I suggest you use gensim. The first tutorial does exactly what you ask:
# remove common words and tokenize
# (documents is your list of strings)
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
You will then need to create the dictionary for your corpus of documents and build the bag-of-words representation.
from gensim import corpora
dictionary = corpora.Dictionary(texts)
dictionary.save('/tmp/deerwester.dict')  # store the dictionary, for future reference
print(dictionary)
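To make the two terms concrete, here is a plain-Python sketch of what a token dictionary and a bag-of-words amount to. It does not use gensim and is not how gensim implements them; `build_dictionary` and `to_bow` are hypothetical helper names, and the sample documents are made up for illustration. In gensim itself, the corresponding call is `dictionary.doc2bow(text)`.

```python
from collections import Counter

def build_dictionary(texts):
    """Assign each distinct token an integer id, in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(text, token2id):
    """Return a sorted list of (token_id, count) pairs for one tokenized document."""
    counts = Counter(token2id[token] for token in text if token in token2id)
    return sorted(counts.items())

# Made-up tokenized documents, standing in for the `texts` list built earlier.
texts = [["human", "machine", "interface"], ["machine", "survey", "machine"]]
token2id = build_dictionary(texts)
bows = [to_bow(text, token2id) for text in texts]
```

Each document ends up as a list of (id, count) pairs, so the word counts the question asks for are already in there, just keyed by id rather than by the word itself.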
You can then weight the result with tf-idf or a similar scheme, and running LDA afterwards is quite easy.
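For a rough idea of what tf-idf weighting does to raw counts, here is a minimal sketch in plain Python. It uses the textbook form tf * log(N / df), which is an assumption on my part; gensim's own tf-idf model applies its own formula and normalization, so its numbers may differ. The `tfidf` function and the sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf(texts):
    """Weight each token in each document by tf * log(N / df)."""
    n_docs = len(texts)
    # Document frequency: the number of documents each token appears in.
    df = Counter(token for text in texts for token in set(text))
    weighted = []
    for text in texts:
        tf = Counter(text)  # raw term frequency within this document
        weighted.append({tok: count * math.log(n_docs / df[tok])
                         for tok, count in tf.items()})
    return weighted

# Made-up tokenized documents for illustration.
sample_texts = [["cat", "sat", "cat"], ["dog", "sat"]]
weights = tfidf(sample_texts)
```

A word that appears in every document (here "sat") gets weight zero, while words specific to one document keep a positive weight.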
Have a look at tutorial 1 in the gensim documentation.
Solution 2:
You've failed to thoroughly explain what you have in mind, but this may be what you're looking for:
import collections
counts = collections.Counter(' '.join(your_list).split())  # your_list is your list of strings
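Note that the one-liner splits on whitespace only, so 'Sentence1.' and 'Sentence1' count as different words, and stop words are kept. A slightly longer sketch that handles both might look like this. The stop list is borrowed from Solution 1, and `count_words` and the sample strings are hypothetical names and data chosen for illustration.

```python
import re
from collections import Counter

# Assumed stop list (the same one as in Solution 1); extend it as needed.
STOPLIST = set('for a of the and to in'.split())

def count_words(strings, stoplist=STOPLIST):
    """Count word occurrences across all strings, ignoring case and stop words."""
    counts = Counter()
    for s in strings:
        # Keep runs of letters, digits and apostrophes; other punctuation is dropped.
        words = re.findall(r"[a-z0-9']+", s.lower())
        counts.update(w for w in words if w not in stoplist)
    return counts

your_list = ["The cat sat. The dog sat.", "A cat and a dog in the sun."]
counts = count_words(your_list)
```

From there, `counts.most_common(3)` gives the three most frequent words, and `counts['cat']` looks up a single word.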