wordcount.py is a tiny Python program I wrote for fun that counts the most frequent words in an article, along with how many times each one occurs. It can report an arbitrary number of the most frequent words.
The code, along with some sample results, can be found on GitHub.
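The full source is on GitHub and isn't reproduced here. What follows is only a minimal sketch of how such a counter might work. It assumes text is lowercased, split on whitespace, and stripped of surrounding punctuation. Under those assumptions, a dash standing on its own would survive as a "word", which would be consistent with the dash in the Joy results below. The function name `top_words` and the default `n=15` are hypothetical and not taken from the original program.

```python
from collections import Counter

# Punctuation trimmed from the ends of each token (an assumption, not
# necessarily what wordcount.py does). Dashes are deliberately not in
# this set, so a free-standing dash is still counted as a token.
PUNCTUATION = ".,;:!?\"'()"

def top_words(text, n=15):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase first so "I" and "i" are counted together.
    tokens = (w.strip(PUNCTUATION) for w in text.lower().split())
    # Skip tokens that were nothing but punctuation.
    counts = Counter(t for t in tokens if t)
    # most_common(n) returns the n highest counts, largest first.
    return counts.most_common(n)
```

Changing `n` is all it takes to go from a top-15 list to a top-25 list. For example, `top_words(open("article.txt", encoding="utf-8").read(), 25)`, where `article.txt` is a placeholder file name.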
I selected two quite personal pieces to examine their 15 most frequent words:
The results are strikingly similar, presumably because both pieces are so personal: "i", "my" and "you" are among the top four words in each.
Joy:
i 474 my 130 you 129 me 90 one 89 — 77 he 76 his 63 did 61 could 59 even 56 we 55 who 50 any 47 your 47
Lake:
i 35 we 20 you 16 my 15 lake 14 up 14 been 13 one 12 out 11 they 10 his 10 over 9 their 9 same 9 he 9
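One rough way to put a number on "similar" is the Jaccard similarity of the two top-15 word sets. It is the number of shared words divided by the number of distinct words across both lists, J = |A ∩ B| / |A ∪ B|. This is my own addition, not something wordcount.py is known to compute. The helper name `overlap` is hypothetical. The word lists are copied from the Joy and Lake results above, with the dash written as the escape `\u2014`.

```python
def overlap(a, b):
    """Return the sorted shared words and the Jaccard similarity of two word lists."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    return sorted(shared), len(shared) / len(set_a | set_b)

# Top-15 words from the Joy and Lake results (counts dropped).
joy = ["i", "my", "you", "me", "one", "\u2014", "he", "his",
       "did", "could", "even", "we", "who", "any", "your"]
lake = ["i", "we", "you", "my", "lake", "up", "been", "one",
        "out", "they", "his", "over", "their", "same", "he"]

shared, score = overlap(joy, lake)
print(shared)          # ['he', 'his', 'i', 'my', 'one', 'we', 'you']
print(f"{score:.2f}")  # 0.30, i.e. 7 shared words out of 23 distinct
```

The same function works for any pair of top-word lists in this post.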
Now, The Ring of Time, another piece by E. B. White that is less personal than Lake:
her 30 she 29 i 18 one 11 horse 10 ring 7 we 7 time 7 out 6 his 6 around 6 circus 6 woman 5 two 5 girl 5
The most frequent words here change to "her" and "she", which refer to the girl. Also notice that the essay's central subjects show up in the list: circus, horse, time, ring, woman, girl.
Let's check out the famous I Have a Dream speech:
we 29 freedom 19 our 17 i 15 let 13 negro 13 one 12 every 10 day 10 come 10 go 9 dream 9 ring 9 their 8 able 8
Some difference, but not huge: "we" now leads and "i" slips to fourth place. Now the 25 most frequent words:
we 29 freedom 19 our 17 i 15 let 13 negro 13 one 12 every 10 day 10 come 10 go 9 dream 9 ring 9 their 8 able 8 back 8 you 8 nation 7 satisfied 6 white 6 cannot 6 long 6 new 5 great 5 my 5
A huge difference from all three of the other articles!
Created Date: November 26, 2011 10:34:46
Last Modified: November 26, 2011 11:02:07