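In day-to-day work we often need to find the most frequent words in an article or web page, or to rank its words by how often they occur. How can we do that?
For example, given the input sentence "Hello there this is a test. Hello there this was a test, but now it is not.", we want this result in ascending order: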
[[1, 'but'], [1, 'it'], [1, 'not.'], [1, 'now'], [1, 'test,'], [1, 'test.'], [1, 'was'], [2, 'Hello'], [2, 'a'], [2, 'is'], [2, 'there'], [2, 'this']]
And this result in descending order:
[[2, 'this'], [2, 'there'], [2, 'is'], [2, 'a'], [2, 'Hello'], [1, 'was'], [1, 'test.'], [1, 'test,'], [1, 'now'], [1, 'not.'], [1, 'it'], [1, 'but']]
The code that produces these results is as follows:
class Counter(object):
    """Counts how many times each item has been added."""

    def __init__(self):
        self.dict = {}

    def add(self, item):
        # Start the count at 0 the first time an item is seen, then increment.
        count = self.dict.setdefault(item, 0)
        self.dict[item] = count + 1

    def counts(self, desc=None):
        # Build [count, word] pairs so that sorting orders by count first,
        # then alphabetically by word (uppercase letters sort before lowercase).
        result = [[val, key] for (key, val) in self.dict.items()]
        result.sort()
        if desc:
            result.reverse()
        return result


if __name__ == '__main__':
    # Produces:
    # Ascending count:
    # [[1, 'but'], [1, 'it'], [1, 'not.'], [1, 'now'], [1, 'test,'], [1, 'test.'], [1, 'was'], [2, 'Hello'], [2, 'a'], [2, 'is'], [2, 'there'], [2, 'this']]
    # Descending count:
    # [[2, 'this'], [2, 'there'], [2, 'is'], [2, 'a'], [2, 'Hello'], [1, 'was'], [1, 'test.'], [1, 'test,'], [1, 'now'], [1, 'not.'], [1, 'it'], [1, 'but']]
    sentence = "Hello there this is a test. Hello there this was a test, but now it is not."
    # split() breaks on whitespace only, so punctuation stays attached to words.
    words = sentence.split()
    c = Counter()
    for word in words:
        c.add(word)
    print("Ascending count:")
    print(c.counts())
    print("Descending count:")
    print(c.counts(1))
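In the results above, 'test.' and 'test,' are counted as separate words because the sentence is split on whitespace only. The following is a minimal sketch of one possible variation, not part of the original code: it uses the standard library's collections.Counter and strips leading and trailing punctuation before counting. The function name word_counts and its desc parameter are hypothetical names chosen for this illustration.

```python
import collections
import string


def word_counts(text, desc=False):
    """Return [count, word] pairs sorted by count, then by word."""
    # Strip punctuation from both ends of each token, so that
    # 'test.' and 'test,' are both counted as 'test'.
    words = [w.strip(string.punctuation) for w in text.split()]
    # Skip tokens that were nothing but punctuation.
    tally = collections.Counter(w for w in words if w)
    return sorted(([n, w] for w, n in tally.items()), reverse=desc)


sentence = "Hello there this is a test. Hello there this was a test, but now it is not."
print(word_counts(sentence))
print(word_counts(sentence, desc=True))
```

With this change 'test' appears once with a count of 2 instead of twice with a count of 1. Whether stripping punctuation is appropriate depends on the text being analyzed; for example, it does not affect apostrophes or hyphens inside a word.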