Publication of the article "Automatic Extraction of Topics from Documents: Five Probabilistic Topic Model Tests" in the International Journal of New Technology and Research (IJNTR)


The article by Sandra Jhean-Larose, Nicolas Leveau, Guy Denhière and Ba-Linh Nguyen was published last November in the International Journal of New Technology and Research (IJNTR).

The abstract of the article follows:

In this paper, we test the capability of the Topic model to extract topics from documents (Griffiths & Steyvers, 2003, 2004; Griffiths, Steyvers & Tenenbaum, 2007). After presenting the mathematical aspects of the model and demonstrating its behavior on a small corpus, we attempt to falsify the model by manipulating (i) the size and similarities between the sub-corpora, (ii) the relative weight of sub-corpora, and (iii) the permeability to the scope and nature of contexts added to a fixed corpus. The model successfully passed our five tests, demonstrating that first, extracted topics were relevant and congruent to the content of the corpus, and second, that their probability appropriately reflected the relative weight of sub-corpora.