Python - pandas and NLTK: get most common phrases


I'm fairly new to Python, and I'm working with a pandas data frame that has a column full of text. I'm trying to take that column and use NLTK to find common phrases (three or four words).

    dat["text_clean"] = dat["description"].str.replace('[^\w\s]','').str.lower()
    dat["text_clean2"] = dat["text_clean"].apply(word_tokenize)

    finder = BigramCollocationFinder.from_words(dat["text_clean2"])
    finder
    # keep only bigrams that appear 3+ times
    finder.apply_freq_filter(3)
    # return the 10 n-grams with the highest PMI
    print finder.nbest(bigram_measures.pmi, 10)

The initial commands seem to work fine. However, when I attempt to use BigramCollocationFinder, it throws the following error.

    In [437]: finder = BigramCollocationFinder.from_words(dat["text_clean2"])
    finder
    Traceback (most recent call last):

      File "<ipython-input-437-635c3b3afaf4>", line 1, in <module>
        finder = BigramCollocationFinder.from_words(dat["text_clean2"])

      File "/users/abrahammathew/anaconda/lib/python2.7/site-packages/nltk/collocations.py", line 168, in from_words
        wfd[w1] += 1

    TypeError: unhashable type: 'list'

Any idea what this refers to, or is there a workaround?

I get the same error with the following commands as well.

    gg = dat["text_clean2"].tolist()
    finder = BigramCollocationFinder.from_words(gg)
    finder = BigramCollocationFinder.from_words(dat["text_clean2"].values.reshape(-1, ))

The following works, but it reports that there are no common phrases.

    gg = dat["description"].str.replace('[^\w\s]','').str.lower()
    finder = BigramCollocationFinder.from_words(gg)
    finder
    # keep only bigrams that appear 2+ times
    finder.apply_freq_filter(2)
    # return the 10 n-grams with the highest PMI
    print finder.nbest(bigram_measures.pmi, 10)

It would seem that the BigramCollocationFinder class wants a list of words, not a list of lists. Try this:

    finder = BigramCollocationFinder.from_words(dat["text_clean2"].values.reshape(-1, ))
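Since the question reports the same error with the reshaped array, it may be that each element of the column is still a list after reshaping. Below is a minimal sketch of flattening the rows explicitly, written for Python 3 and using only the standard library so it runs without NLTK. The `token_lists` data is a made-up stand-in for `dat["text_clean2"]`, and the hand-rolled bigram count is only there to show that the flat list behaves as a sequence of words; it is not what NLTK does internally.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for dat["text_clean2"]: one token list per row.
token_lists = [
    ["new", "york", "city", "hotel"],
    ["hotel", "in", "new", "york"],
    ["new", "york", "pizza"],
]

# Counting the rows themselves fails, because a list cannot be a dict key.
# This is the same kind of failure as `wfd[w1] += 1` in the traceback.
try:
    Counter(token_lists)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flatten the rows into a single sequence of words.
words = list(chain.from_iterable(token_lists))

# Count adjacent word pairs by hand, only to show that the flat list works.
bigram_counts = Counter(zip(words, words[1:]))
top_bigram = bigram_counts.most_common(1)[0]

print(error_message)
print(top_bigram)

# With NLTK installed, the same flat list could be passed on (untested here):
# finder = BigramCollocationFinder.from_words(words)
```

One caveat with flattening: pairs can now span the boundary between two descriptions, for example the last word of one row followed by the first word of the next. Recent NLTK releases also provide `BigramCollocationFinder.from_documents`, which takes the token lists directly; I have not checked whether the version shown in the traceback includes it.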
