Task: to identify interesting collocations from text
Corpus: speeches made by politicians in the U.S. House of Representatives during debates over legislation
Data: downloaded from http://www.cs.cornell.edu/home/llee/data/convote.html
Analysis: using the collocations module of NLTK (the Natural Language Toolkit, a Python package).
Association measures:
- chi square
- mutual information
- log likelihood
- raw frequency
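To make one of the measures above concrete, here is a minimal pure-Python sketch of pointwise mutual information over adjacent word pairs. It is not the code used for this analysis (that went through NLTK, which offers these measures via its `BigramAssocMeasures` class); the function name `bigram_pmi` and the toy sentence are assumptions made up for illustration.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Score each adjacent word pair by pointwise mutual information (log base 2)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = sum(bigram_counts.values())
    scores = {}
    for (w1, w2), joint in bigram_counts.items():
        p_joint = joint / n_bigrams              # P(w1 followed by w2)
        p_w1 = unigram_counts[w1] / n_tokens     # P(w1)
        p_w2 = unigram_counts[w2] / n_tokens     # P(w2)
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return scores

# Toy input, not taken from the corpus.
tokens = "stem cell research and stem cell funding".split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print("-".join(pair), round(score, 2))
```

Mutual information tends to reward pairs of rare words, so its ranking can differ a lot from a raw frequency ranking of the same text.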
Green: appears in both parties' lists, but at different ranks.
Red: appears only in Republican speech.
Blue: appears only in Democratic speech.
Black: given almost equal importance by both parties.
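The color legend above can be sketched as a small function over the two ranked lists. The post does not say how close two ranks must be to count as "almost equal", so the `tolerance` parameter here is a hypothetical knob, and `color_code` is an illustrative name rather than anything from the original analysis.

```python
def color_code(dem_top, rep_top, tolerance=3):
    """Assign a legend color to every collocation in either ranked list.

    `tolerance` is an assumed cutoff: a rank gap at or below it counts
    as "almost equal importance" (black); a larger gap is green.
    """
    dem_rank = {c: i for i, c in enumerate(dem_top, start=1)}
    rep_rank = {c: i for i, c in enumerate(rep_top, start=1)}
    colors = {}
    for c in dem_rank.keys() | rep_rank.keys():
        if c not in rep_rank:
            colors[c] = "blue"     # only in Democratic speech
        elif c not in dem_rank:
            colors[c] = "red"      # only in Republican speech
        elif abs(dem_rank[c] - rep_rank[c]) <= tolerance:
            colors[c] = "black"    # similar rank in both lists
        else:
            colors[c] = "green"    # in both lists, but at different ranks
    return colors

# Shortened example lists, for illustration only.
dem = ["united-states", "health-care", "estate-tax"]
rep = ["united-states", "death-tax", "health-care"]
print(color_code(dem, rep, tolerance=0))
```

The choice of `tolerance` decides where black ends and green begins, so it is worth stating whichever value is used.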
The table below ranks bigrams (pairs of consecutive words) by raw frequency in each party's speeches, most frequent first.
Top 16 collocations by party

Rank  Democrats            Republicans
1     united-states        united-states
2     stem-cell            stem-cell
3     health-care          embryonic-stem
4     american-people      small-businesses
5     cell-research        small-business
6     social-security      would-like
7     tax-cuts             cell-research
8     patriot-act          patriot-act
9     embryonic-stem       may-consume
10    conference-report    american-people
11    estate-tax           health-care
12    last-year            death-tax
13    bill-would           homeland-security
14    endangered-species   federal-government
15    national-security    law-enforcement
16    homeland-security    conference-report
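A raw frequency ranking like the one in the table can be sketched in a few lines of plain Python. This is an illustration, not the original script: the post does not describe its tokenization or stopword handling, so the regex tokenizer, the tiny `STOPWORDS` set, and the name `top_bigrams` are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the post does not give one).
STOPWORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "for", "i"}

def top_bigrams(text, n=16, stopwords=STOPWORDS):
    """Return the n most frequent adjacent word pairs, joined with a hyphen."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Pair up consecutive words first, then drop pairs containing a stopword,
    # so every surviving pair was truly adjacent in the text.
    pairs = [
        (w1, w2)
        for w1, w2 in zip(tokens, tokens[1:])
        if w1 not in stopwords and w2 not in stopwords
    ]
    counts = Counter(pairs)
    return ["-".join(pair) for pair, _ in counts.most_common(n)]

# Toy text, not taken from the corpus.
sample = "The United States and the United States support health care"
print(top_bigrams(sample, n=3))
```

Running this once per party, on that party's speeches only, would give two ranked lists that can be compared side by side.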
health-care is green, but why are homeland-security and cell-research black?
The collocations that both parties discussed with equal importance are black. Admittedly, a more thorough analysis was needed.
Thanks.