Running experiments

To illustrate the impact of different parameter settings, we ran a few hundred experiments with different document-term matrix (DTM) constraints and model parameters. More specifically, we let the min_df and max_df parameters range from 50 to 500 words and from 10% to 100% of documents, respectively, alternating between binary and absolute counts. We then trained LDA models with 3 to 50 topics, using 1 and 25 passes over the corpus.
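One way to picture such an experiment design is as a Cartesian product of the parameter values. The sketch below is only an illustration: the text gives the ranges but not the exact steps, so the specific grid values and the names `experiment_grid` and `configs` are assumptions. Each resulting dictionary could then be passed to whatever vectorizer and LDA implementation the experiments use.

```python
from itertools import product

# Hypothetical grid values: the text states only the ranges, not the steps.
MIN_DF = [50, 100, 250, 500]        # lower vocabulary threshold
MAX_DF = [0.1, 0.25, 0.5, 1.0]      # upper threshold as a share of documents
BINARY = [True, False]              # binary indicators vs. absolute counts
NUM_TOPICS = [3, 5, 7, 10, 15, 20, 25, 50]
PASSES = [1, 25]                    # passes over the corpus


def experiment_grid():
    """Yield one configuration dict per combination of parameter values."""
    keys = ("min_df", "max_df", "binary", "num_topics", "passes")
    for values in product(MIN_DF, MAX_DF, BINARY, NUM_TOPICS, PASSES):
        yield dict(zip(keys, values))


configs = list(experiment_grid())
print(len(configs))  # number of experiments implied by this assumed grid
```

The number of runs is the product of the list lengths, so a grid like this grows quickly; coarser steps keep the total in the range of a few hundred experiments.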

The following chart illustrates the results in terms of topic coherence (higher is better) and perplexity (lower is better). Coherence drops after 25-30 topics, and perplexity increases correspondingly:

The notebook includes regression results that quantify the relationships between parameters and outcomes. We generally get better results using absolute counts and a smaller vocabulary.
