python - How to get a complete topic distribution for a document using gensim LDA?


When I train my LDA model like this:

import multiprocessing
from gensim import corpora
from gensim.models import LdaMulticore

dictionary = corpora.Dictionary(data)
corpus = [dictionary.doc2bow(doc) for doc in data]
num_cores = multiprocessing.cpu_count()
num_topics = 50
lda = LdaMulticore(corpus, num_topics=num_topics, id2word=dictionary,
                   workers=num_cores, alpha=1e-5, eta=5e-1)

I want the full topic distribution over all num_topics for each and every document. That is, in this particular case, I want each document to have 50 topics contributing to its distribution, and I want to be able to access the contribution of all 50 topics. This is what LDA should output if it adheres strictly to the mathematics of LDA. However, gensim only outputs topics that exceed a certain threshold, as shown here. For example, if I try

lda[corpus[89]]
>>> [(2, 0.38951721864890398), (9, 0.15438596408262636), (37, 0.45607443684895665)]

which shows that only 3 topics contribute to document 89. I have tried the solution in the link above, but it does not work for me. I still get the same output:

theta, _ = lda.inference(corpus)
theta /= theta.sum(axis=1)[:, None]

This produces the same output, i.e. only 2 or 3 topics per document.

My question is: how do I change this threshold so that I can access the full topic distribution for each document? How can I access the full topic distribution, no matter how insignificant the contribution of a topic to a document is? The reason I want the full distribution is so that I can perform a KL similarity search between the documents' distributions.
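For context, the KL comparison mentioned above could look roughly like the sketch below. It is only an illustration in plain Python: the function name kl_divergence and the smoothing constant eps are assumptions made for this example, not part of gensim. It also shows why the full distribution matters: both vectors need the same length, and zero entries have to be smoothed before taking logarithms.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions of equal length.

    eps is a hypothetical smoothing constant (an assumption for this sketch)
    that keeps log() away from zero-valued entries.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Smooth, then renormalize so each vector is strictly positive and sums to 1.
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p, q)
    )


# Two made-up 3-topic distributions, used only to exercise the function.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.1, 0.2, 0.7]
divergence_ab = kl_divergence(doc_a, doc_b)
divergence_ba = kl_divergence(doc_b, doc_a)
```

Note that KL divergence is not symmetric in general, so a similarity search has to fix which document plays the role of p.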

Thanks in advance.

It doesn't seem that anyone has replied yet, so I'll try to answer this as best I can, given the gensim documentation.

It seems you need to set the parameter minimum_probability to 0.0 when training the model to get the desired results:

lda = LdaMulticore(corpus=corpus, num_topics=num_topics, id2word=dictionary,
                   workers=num_cores, alpha=1e-5, eta=5e-1,
                   minimum_probability=0.0)

lda[corpus[233]]
>>> [(0, 5.8821799358842424e-07),
 (1, 5.8821799358842424e-07),
 (2, 5.8821799358842424e-07),
 (3, 5.8821799358842424e-07),
 (4, 5.8821799358842424e-07),
 (5, 5.8821799358842424e-07),
 (6, 5.8821799358842424e-07),
 (7, 5.8821799358842424e-07),
 (8, 5.8821799358842424e-07),
 (9, 5.8821799358842424e-07),
 (10, 5.8821799358842424e-07),
 (11, 5.8821799358842424e-07),
 (12, 5.8821799358842424e-07),
 (13, 5.8821799358842424e-07),
 (14, 5.8821799358842424e-07),
 (15, 5.8821799358842424e-07),
 (16, 5.8821799358842424e-07),
 (17, 5.8821799358842424e-07),
 (18, 5.8821799358842424e-07),
 (19, 5.8821799358842424e-07),
 (20, 5.8821799358842424e-07),
 (21, 5.8821799358842424e-07),
 (22, 5.8821799358842424e-07),
 (23, 5.8821799358842424e-07),
 (24, 5.8821799358842424e-07),
 (25, 5.8821799358842424e-07),
 (26, 5.8821799358842424e-07),
 (27, 0.99997117731831464),
 (28, 5.8821799358842424e-07),
 (29, 5.8821799358842424e-07),
 (30, 5.8821799358842424e-07),
 (31, 5.8821799358842424e-07),
 (32, 5.8821799358842424e-07),
 (33, 5.8821799358842424e-07),
 (34, 5.8821799358842424e-07),
 (35, 5.8821799358842424e-07),
 (36, 5.8821799358842424e-07),
 (37, 5.8821799358842424e-07),
 (38, 5.8821799358842424e-07),
 (39, 5.8821799358842424e-07),
 (40, 5.8821799358842424e-07),
 (41, 5.8821799358842424e-07),
 (42, 5.8821799358842424e-07),
 (43, 5.8821799358842424e-07),
 (44, 5.8821799358842424e-07),
 (45, 5.8821799358842424e-07),
 (46, 5.8821799358842424e-07),
 (47, 5.8821799358842424e-07),
 (48, 5.8821799358842424e-07),
 (49, 5.8821799358842424e-07)]
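If some topics are still missing from the output (depending on the gensim version, extremely small probabilities may still be dropped even with a threshold of 0.0), a sparse result can be expanded into a fixed-length vector by hand. The sketch below is plain Python and does not call gensim. The helper name to_dense is made up for this example, and the sample input is the three-topic output for document 89 from the question.

```python
def to_dense(sparse_topics, num_topics):
    """Expand a list of (topic_id, probability) pairs into a dense list.

    Topics missing from the sparse input are assigned probability 0.0.
    to_dense is a hypothetical helper for this sketch, not a gensim function.
    """
    dense = [0.0] * num_topics
    for topic_id, prob in sparse_topics:
        dense[topic_id] = float(prob)
    return dense


# Sparse output for document 89, as shown in the question.
sparse = [(2, 0.38951721864890398),
          (9, 0.15438596408262636),
          (37, 0.45607443684895665)]

# One entry per topic; the omitted topics are filled with 0.0.
dense = to_dense(sparse, num_topics=50)
```

The resulting vectors all have length num_topics, so they can be compared position by position, for example with a KL divergence after smoothing the zero entries.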
