class documentation
class Collocation:
A basic collocation calculator class.
Method | __init__ |
connectors takes a list of words that are removed when they appear at the _edge_ of an n-gram (for n > 1), but are kept when they appear inside one (which requires n >= 3) |
Method | add |
Used by consume_tokens; you typically should not need to call this yourself |
Method | add |
Used by consume_tokens; you typically should not need to call this yourself |
Method | cleanup |
CONSIDER: allow a different threshold for each n-gram length, e.g. via a list for mincount |
Method | cleanup |
Remove unigrams for which the given function returns true |
Method | cleanup |
Remove unigrams that are rare; by default, those that appear just once. You may wish to increase this threshold. Ideally we would also remove all n-grams that use them, but it is faster to waste the memory and leave them there. |
Method | cleanup |
Remove unigrams for which the given function returns true |
Method | consume |
Takes a list of string tokens. Counts the unigrams and n-grams in it, for the given values of n. |
Method | counts |
Returns counts of tokens, unigrams, and n-grams (n >= 2) |
Method | score |
Takes the counts already collected and returns a list of items like: |
Instance Variable | connectors |
Undocumented |
Instance Variable | grams |
Undocumented |
Instance Variable | saw |
Undocumented |
Instance Variable | uni |
Undocumented |
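The counting step described for consume can be illustrated with a minimal sketch. This is not the class's own code: the function name `count_ngrams` and its `gramlens` parameter are hypothetical, chosen only to show how unigram and n-gram counts could be collected from a token list with a sliding window.

```python
from collections import Counter

def count_ngrams(tokens, gramlens=(2, 3)):
    """Hypothetical helper: count unigrams and n-grams in a list of string tokens."""
    uni = Counter(tokens)   # unigram counts
    grams = Counter()       # n-gram counts, keyed by tuples of tokens
    for n in gramlens:
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
    return uni, grams

uni, grams = count_ngrams(["the", "cat", "sat", "on", "the", "mat"])
```

Keying n-grams by tuple keeps bigrams and trigrams in a single mapping while still making the length of each entry recoverable.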
connectors takes a list of words that are removed when they appear at the _edge_ of an n-gram (for n > 1), but are kept when they appear inside one (which requires n >= 3)
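A small sketch may make the edge rule concrete. It assumes one plausible reading of the description above: an n-gram whose first or last token is a connector is discarded, while a connector in the middle is fine. The function name `keep_ngram` is hypothetical and is not part of the class.

```python
def keep_ngram(gram, connectors):
    """Hypothetical filter: reject n-grams (n > 1) that start or end with a connector."""
    if len(gram) > 1 and (gram[0] in connectors or gram[-1] in connectors):
        return False
    return True

connectors = {"of", "the"}

inside = keep_ngram(("house", "of", "cards"), connectors)  # connector inside: kept
at_edge = keep_ngram(("of", "cards"), connectors)          # connector at edge: rejected
unigram = keep_ngram(("the",), connectors)                 # unigrams are not affected
```

Under this reading, "house of cards" survives as a trigram while "of cards" and "house of" do not.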
Remove unigrams that are rare; by default, those that appear just once. You may wish to increase this threshold. Ideally we would also remove all n-grams that use them, but it is faster to waste the memory and leave them there.
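The cleanup described above can be sketched as follows. The helper name `drop_rare_unigrams` and the exact meaning of `mincount` (here: keep only unigrams counted at least `mincount` times) are assumptions made for illustration, not the class's confirmed behavior. As in the description, n-grams that contain a removed unigram are deliberately left alone.

```python
from collections import Counter

def drop_rare_unigrams(uni, mincount=2):
    """Hypothetical helper: delete unigrams counted fewer than mincount times."""
    # Collect the keys first so the mapping is not modified while iterating over it.
    rare = [token for token, count in uni.items() if count < mincount]
    for token in rare:
        del uni[token]
    return uni

counts = Counter(["the", "cat", "sat", "on", "the", "mat"])
drop_rare_unigrams(counts)  # the default removes tokens that appear just once
```

Raising `mincount` trims more of the long tail, at the cost of losing collocations built from moderately rare words.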