LDA - gensim.interfaces.TransformedCorpus - How to use it?
I'm relatively new to the world of Latent Dirichlet Allocation. I was able to generate an LDA model by following the Wikipedia tutorial, and I'm able to generate an LDA model with my own documents. My next step is to understand how I can use a previously generated model to classify unseen documents. I'm saving my "lda_wiki_model" with:
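```python
import gensim

# Load the id <-> word mapping and the TF-IDF corpus built from the Portuguese Wikipedia dump
id2word = gensim.corpora.Dictionary.load_from_text('ptwiki_wordids.txt.bz2')
mm = gensim.corpora.MmCorpus('ptwiki_tfidf.mm')

# Train a 100-topic LDA model on that corpus and save it to disk
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100,
                                      update_every=1, chunksize=10000, passes=1)
lda.save('lda_wiki_model.lda')
```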
And I'm loading the same model with:
```python
new_lda = gensim.models.ldamodel.LdaModel.load(path + 'lda_wiki_model.lda')  # load the model
```
I have a "new_doc.txt"; I turn my document into an id <-> term dictionary and convert this tokenized document into a "document-term matrix".
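One thing worth checking at this step: a trained LDA model only understands the word ids of the dictionary it was trained with, so a new document is probably best converted with the saved wiki dictionary (`id2word`) rather than with a freshly built id <-> term dictionary, whose ids would point at unrelated words. The pure-Python sketch below mimics the idea behind gensim's `Dictionary.doc2bow`; the helper `to_bow` and the toy vocabulary are hypothetical illustrations, not gensim code.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Map tokens to sorted (word_id, count) pairs using an EXISTING vocabulary.

    Hypothetical helper mimicking the idea of gensim's Dictionary.doc2bow:
    words the vocabulary has never seen are dropped, because the trained
    model has no topic-word weights for them.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Toy vocabulary standing in for the one saved with the wiki model (assumption).
wiki_token2id = {"futebol": 0, "gol": 1, "partido": 2, "voto": 3}

new_doc = "o gol do futebol e outro gol".split()
print(to_bow(new_doc, wiki_token2id))  # [(0, 1), (1, 2)]
```

With gensim itself, the equivalent is roughly `bow = id2word.doc2bow(tokens)`, using the `id2word` dictionary loaded from `ptwiki_wordids.txt.bz2`.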
But when I run `new_topics = new_lda[corpus]`
I receive `<gensim.interfaces.TransformedCorpus object at 0x7f0ecfa69d50>`.
How can I extract the topics from that?
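That message is not an error: applying a model to a whole corpus returns a lazy wrapper that computes each document's topic distribution only when you iterate over it, so printing it shows just the object's repr. The sketch below imitates that behaviour with a hypothetical `LazyTransformed` class and a stand-in `fake_model` function; neither is part of gensim, and the "model" is deliberately trivial.

```python
class LazyTransformed:
    """Hypothetical stand-in for a lazily transformed corpus.

    Nothing is computed when the object is created; the model is applied to
    each document only while the object is being iterated.
    """

    def __init__(self, model, corpus):
        self.model = model
        self.corpus = corpus

    def __iter__(self):
        for doc in self.corpus:
            yield self.model(doc)


def fake_model(bow):
    # Stand-in "model": all probability on topic 0 if word id 0 occurs,
    # otherwise on topic 1. Real LDA inference is far more involved.
    return [(0, 1.0)] if any(word_id == 0 for word_id, _ in bow) else [(1, 1.0)]


corpus = [[(0, 1), (1, 2)], [(3, 4)]]
new_topics = LazyTransformed(fake_model, corpus)

print(new_topics)        # only the object's repr, like the TransformedCorpus message
for doc_topics in new_topics:
    print(doc_topics)    # the per-document (topic, probability) lists
```

For a single document, indexing the model with one bag-of-words list (`new_lda[bow]`) returns the (topic, probability) list directly, so no loop is needed.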
I already tried:

```python
lsa = models.LdaModel(new_topics, id2word=dictionary, num_topics=1, passes=2)
corpus_lda = lsa[new_topics]
print(lsa.print_topics(num_topics=1, num_words=7))
```

and

```python
print(corpus_lda.print_topics(num_topics=1, num_words=7))
```
but it returns topics that are not related to my new document. Where is my mistake? Am I misunderstanding something?
If I train a new model using the dictionary and corpus created above, I get the correct topics. My point is: how do I re-use my model? Is it correct to re-use that wiki_model?
Thank you.
I was facing the same problem. This code solved it:
```python
new_topics = new_lda[corpus]

# TransformedCorpus is lazy: iterate over it to get each document's topics
for topic in new_topics:
    print(topic)
```
This prints, for each document, a list of tuples of the form (topic number, probability).
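To use those tuples for classifying unseen documents, two common follow-ups are picking the most probable topic of each document and expanding the sparse list into a fixed-length vector a classifier can consume. A minimal pure-Python sketch, assuming the model has `num_topics` topics (100 for the wiki model above); `dominant_topic` and `to_dense` are hypothetical helper names and the probabilities are made up.

```python
def dominant_topic(doc_topics):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(doc_topics, key=lambda pair: pair[1])


def to_dense(doc_topics, num_topics):
    """Expand sparse (topic_id, probability) pairs into a list of length num_topics.

    Topics absent from the list (gensim typically leaves out topics below a
    minimum probability) are filled with 0.0.
    """
    dense = [0.0] * num_topics
    for topic_id, prob in doc_topics:
        dense[topic_id] = prob
    return dense


# Made-up output for one document, for illustration only.
doc_topics = [(3, 0.12), (17, 0.71), (42, 0.15)]
print(dominant_topic(doc_topics))                 # (17, 0.71)
print(to_dense(doc_topics, num_topics=100)[17])   # 0.71
```

To see which words make up a topic, the loaded model's `new_lda.show_topic(topic_id, topn=7)` lists that topic's top words with their weights.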