It’s been a little while since we last needed to apply our word count and word frequency algorithms to a text. With recent news about the healthcare overhaul flooding the internet, we felt it would be a good idea to delve deeper into the words of this bill.
The following logic was applied prior to the word frequency count of the bill:
- remove any reference to punctuation
- remove months
- remove any references to (.,-?;:)
- converted text to lower case
- set format prior to word count ()
- ran code algorithm against text to derive word frequency (delimiter is a "space" or " ")
- output file (word-frequency-healthcare-bill.xls)
- scrubbed pdf online and derived something legible (final-cut3.txt)
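For anyone who wants to try something similar, the steps above can be roughly sketched in Python. This is not the code we ran, just a minimal illustration under a few assumptions: "months" is taken to mean the English month names, the order of the cleanup steps is a guess, and the function name `word_frequency` and the sample sentence are made up for the example.

```python
import re
from collections import Counter

# Assumption: "remove months" means dropping English month names.
# Note that this also drops the ordinary words "may" and "march".
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}

def word_frequency(text):
    """Count words after the cleanup steps listed above (hypothetical helper)."""
    # Remove the punctuation characters named in the list: . , - ? ; :
    cleaned = re.sub(r"[.,\-?;:]", "", text)
    # Convert the text to lower case.
    cleaned = cleaned.lower()
    # Split into words. str.split() breaks on any run of whitespace,
    # which is slightly more forgiving than a single-space delimiter.
    words = cleaned.split()
    # Drop month names.
    words = [w for w in words if w not in MONTHS]
    return Counter(words)

# Made-up sample text, only to show the shape of the result.
sample = "The child, in March: the Child; the plan."
counts = word_frequency(sample)
print(counts["child"])  # 2
print(counts["the"])    # 3
```

In practice the text would come from the scrubbed file (final-cut3.txt in our case) rather than an inline string, and the counts would be written out to a spreadsheet.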
Observations:
- the word "methamphetamine" is present 3 times
- the word "california" is present 34 times ... why?
- the word "michigan" is present 1 time
- the word "arizona" is present 1 time
- the phrase "new york" is present 1 time
- the word "childcare" is present 1 time
- the word "child" is present 164 times
- the word "indian" is present 1151 times
- the word "death" is present 15 times
- the word "terrorist" is present 2 times
- the word "plumbing" is present 1 time
- the word "cancer" is present 16 times
- the word "abortion" is present 16 times
And many, many more... Have fun!