healthcare reform bill word play

It’s been a little while since we last had a reason to run our word count and word frequency algorithms against a text. With recent news about the healthcare overhaul flooding the internet, we felt it would be a good idea to delve deeper into the words of this bill.

The following steps were applied to derive the word frequency count for the bill:

  1. removed any reference to punctuation
  2. removed months
  3. removed any references to (.,-?;:)
  4. converted text to lower case
  5. set format prior to word count
  6. ran code algorithm against text to derive word frequency (delimiter is a "space" or " ")
  7. output file (word-frequency-healthcare-bill.xls)
  8. scrubbed pdf online and derived something legible (final-cut3.txt)
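
For readers who want to try something similar, the cleanup and counting steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not necessarily the code that produced the attached spreadsheet: the names `word_frequencies` and `MONTHS` are made up for illustration, the listed punctuation is replaced with a space (rather than deleted outright) so that neighbouring words do not run together, and months are dropped by simple word match.

```python
import re
from collections import Counter

# Hypothetical month list; note that dropping "may" also drops the verb "may".
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def word_frequencies(text):
    """Return a Counter of words after the cleanup steps listed above."""
    lowered = text.lower()
    # Assumption: swap the characters . , - ? ; : for a space instead of
    # deleting them, so "child-care" becomes "child care", not "childcare".
    cleaned = re.sub(r"[.,\-?;:]", " ", lowered)
    # split() with no argument splits on runs of whitespace, which avoids
    # the empty strings a literal split(" ") would produce.
    words = [w for w in cleaned.split() if w not in MONTHS]
    return Counter(words)


sample = "The child, the Child; in March: child-care."
counts = word_frequencies(sample)
for word, n in counts.most_common(3):
    print(word, n)
```

To run it against the scrubbed text, read final-cut3.txt into a string and pass it to `word_frequencies`; `most_common()` then gives the words in descending order of frequency.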

Observations:

  1. the word "methamphetamine" is present 3 times
  2. the word "california" is present 34 times ... why?
    1. the word "michigan" is present 1 time
    2. the word "arizona" is present 1 time
    3. the phrase "new york" is present 1 time
  3. the word "childcare" is present 1 time
  4. the word "child" is present 164 times
  5. the word "indian" is present 1151 times
  6. the word "death" is present 15 times
  7. the word "terrorist" is present 2 times
  8. the word "plumbing" is present 1 time
  9. the word "cancer" is present 16 times
  10. the word "abortion" is present 16 times
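
A note on two-word names such as "new york": with a space as the delimiter, they never show up as a single entry in a word count, so they have to be counted as pairs of consecutive words. Here is a rough sketch of one way to do that, offered as an illustration rather than as the method behind the figure above; `phrase_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def phrase_frequencies(words, size=2):
    """Count runs of `size` consecutive words, joined by a single space."""
    phrases = (
        " ".join(words[i:i + size])
        for i in range(len(words) - size + 1)
    )
    return Counter(phrases)


# The word list would normally come from the cleaned, lower-cased text.
sample_words = "new york and california and new york".split()
pairs = phrase_frequencies(sample_words)
print(pairs["new york"])
```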

And many, many more... Have fun!

Attachment                            Size
final-cut3.txt                        1.95 MB
word-frequency-healthcare-bill.xls    459 KB