To complement this corpus, we extracted from the Politoscope database 25,883 tweets published by the eleven candidates and other key politicians between (see Text B in S1 File). This second corpus has the advantage of showing the themes that emerged in the political debates, independently of the candidates' programmatic orientations.
There are two categories of mainstream methods for extracting topics from unstructured text: co-word analysis and topic modeling with LDA-like methods. In these approaches, topics are defined as "bags of words", inferred from the statistics of occurrence, in the documents, of a list of predefined keywords. This list is itself obtained through more or less advanced text-mining methods from the fields of natural language processing (NLP) and machine learning.
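As an illustration of the co-word idea described above, the following minimal sketch counts how often pairs of predefined keywords occur in the same document. It is not the pipeline used in this study: the function name `cooccurrence_counts`, the toy documents, and the keyword list are all hypothetical, and tokenization is reduced to lowercasing and whitespace splitting.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, keywords):
    """Count, for each unordered keyword pair, the documents containing both.

    Hypothetical helper for illustration only; real co-word analysis would
    use proper tokenization and multi-word term extraction.
    """
    keyword_set = set(keywords)
    pair_counts = Counter()
    for doc in documents:
        # Treat each document as a bag of words: keep only distinct tokens.
        tokens = set(doc.lower().split())
        # Sort so that each pair is always stored in the same order.
        present = sorted(tokens & keyword_set)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy data invented for this example (not taken from the corpus).
docs = [
    "taxes and employment dominate the debate",
    "employment policy and taxes again",
    "security and immigration in the news",
]
keywords = ["taxes", "employment", "security", "immigration"]
counts = cooccurrence_counts(docs, keywords)
```

Keyword pairs with high co-occurrence counts can then be grouped into clusters, each cluster being read as a candidate topic. How the grouping is done (for example, by community detection on the co-occurrence graph) is a separate design choice not shown here.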