The nltk.probability.ConditionalFreqDist class is a container for FreqDist instances, with one FreqDist per condition. It is used to count frequencies that depend on another condition, such as another word or a class label. We used this class in the Calculating high information words recipe in Chapter 7, Text Classification. Here, we'll create an API-compatible class on top of Redis using the RedisHashFreqDist from the previous recipe.
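If the idea of "one frequency distribution per condition" is unfamiliar, the following standard-library sketch may help. It is an analogue for illustration only, not NLTK's implementation: a defaultdict of Counter objects plays the role of a ConditionalFreqDist holding FreqDist instances (in NLTK 3, FreqDist subclasses Counter and ConditionalFreqDist subclasses defaultdict). The condition here is the previous word, and the sample is the word that follows it.

```python
from collections import Counter, defaultdict

# Stdlib analogue (not NLTK): one Counter per condition, created on first use,
# much as ConditionalFreqDist holds one FreqDist per condition.
cfd = defaultdict(Counter)

# Condition = the previous word; sample = the word that follows it.
words = 'the cat sat on the mat the cat ran'.split()
for prev, word in zip(words, words[1:]):
    cfd[prev][word] += 1

conditions = sorted(cfd)
total = sum(sum(counter.values()) for counter in cfd.values())
print(conditions)         # ['cat', 'mat', 'on', 'sat', 'the']
print(cfd['the']['cat'])  # 2
print(total)              # 8
```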
As in the previous recipe, you'll need to have Redis and redis-py installed, with an instance of redis-server running.
We define a RedisConditionalHashFreqDist class in redisprob.py that extends nltk.probability.ConditionalFreqDist and overrides the __getitem__() method. We override __getitem__() so we can create an instance of RedisHashFreqDist instead of a FreqDist:
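    from nltk.probability import ConditionalFreqDist
    from rediscollections import encode_key

    # RedisHashFreqDist is defined earlier in redisprob.py (previous recipe).

    class RedisConditionalHashFreqDist(ConditionalFreqDist):
        def __init__(self, r, name, cond_samples=None):
            self._r = r
            self._name = name
            ConditionalFreqDist.__init__(self, cond_samples)
            # Recreate a RedisHashFreqDist for every existing 'name:condition' hash
            for key in self._r.keys(encode_key('%s:*' % name)):
                if isinstance(key, bytes):
                    # redis-py returns bytes unless decode_responses=True
                    key = key.decode('utf-8')
                condition = key.split(':', 1)[1]
                self[condition]  # calls self.__getitem__(condition)

        def __getitem__(self, condition):
            # Lazily create a Redis-backed FreqDist the first time a condition is seen
            if condition not in self:
                key = '%s:%s' % (self._name, condition)
                val = RedisHashFreqDist(self._r, key)
                super(RedisConditionalHashFreqDist, self).__setitem__(condition, val)
            return super(RedisConditionalHashFreqDist, self).__getitem__(condition)

        def clear(self):
            # Clear the Redis data behind every per-condition distribution
            for fdist in self.values():
                fdist.clear()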
An instance of this class can be created by passing in a Redis connection and a base name. After that, it works just like a ConditionalFreqDist:
    >>> from redis import Redis
    >>> from redisprob import RedisConditionalHashFreqDist
    >>> r = Redis()
    >>> rchfd = RedisConditionalHashFreqDist(r, 'condhash')
    >>> rchfd.N()
    0
    >>> rchfd.conditions()
    []
    >>> rchfd['cond1']['foo'] += 1
    >>> rchfd.N()
    1
    >>> rchfd['cond1']['foo']
    1
    >>> rchfd.conditions()
    ['cond1']
    >>> rchfd.clear()
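The session above needs a running redis-server. As a rough, self-contained sketch of the same structure, the code below uses a plain dict in place of Redis. FakeStore, HashCounter, and ConditionalHashCounter are hypothetical names invented for illustration; HashCounter imitates only the behaviour this session exercises and is not RedisHashFreqDist's actual implementation.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a Redis server: key name -> hash (dict)."""

    def __init__(self):
        self.hashes = {}

    def keys_with_prefix(self, prefix):
        # Plays the role of a KEYS 'prefix*' lookup in this sketch.
        return [k for k in self.hashes if k.startswith(prefix)]


class HashCounter:
    """Hypothetical stand-in for RedisHashFreqDist: counts live in store.hashes[name]."""

    def __init__(self, store, name):
        self._store = store
        self._name = name

    def _hash(self):
        return self._store.hashes.get(self._name, {})

    def __getitem__(self, sample):
        return self._hash().get(sample, 0)

    def __setitem__(self, sample, count):
        self._store.hashes.setdefault(self._name, {})[sample] = count

    def N(self):
        return sum(self._hash().values())

    def clear(self):
        # Drop the whole hash, as deleting its key would in Redis.
        self._store.hashes.pop(self._name, None)


class ConditionalHashCounter(dict):
    """Hypothetical mirror of RedisConditionalHashFreqDist's structure."""

    def __init__(self, store, name):
        super().__init__()
        self._store = store
        self._name = name
        # Rediscover conditions from existing '<name>:<condition>' hashes.
        prefix = name + ':'
        for key in store.keys_with_prefix(prefix):
            self[key[len(prefix):]]  # calls self.__getitem__

    def __getitem__(self, condition):
        # Create the per-condition counter lazily, on first access.
        if condition not in self:
            child = HashCounter(self._store, '%s:%s' % (self._name, condition))
            super().__setitem__(condition, child)
        return super().__getitem__(condition)

    def conditions(self):
        return list(self.keys())

    def N(self):
        return sum(fd.N() for fd in self.values())

    def clear(self):
        for fd in self.values():
            fd.clear()


store = FakeStore()
chc = ConditionalHashCounter(store, 'condhash')
print(chc.N(), chc.conditions())      # 0 []
chc['cond1']['foo'] += 1
print(chc.N(), chc['cond1']['foo'])   # 1 1

# A fresh instance with the same base name finds the existing condition.
again = ConditionalHashCounter(store, 'condhash')
print(again.conditions(), again.N())  # ['cond1'] 1

chc.clear()
print(store.hashes)                   # {}
```

As with the recipe's clear(), only the stored hashes are removed; the condition entries stay in the dict, so chc.conditions() still returns ['cond1'] afterwards.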
The RedisConditionalHashFreqDist uses name prefixes to reference RedisHashFreqDist instances. The name passed into the RedisConditionalHashFreqDist is a base name that is combined with each condition to create a unique name for each RedisHashFreqDist. For example, if the base name of the RedisConditionalHashFreqDist is 'condhash', and the condition is 'cond1', then the final name for the RedisHashFreqDist is 'condhash:cond1'. At initialization, this naming pattern is used with the Redis KEYS command to find all the existing hash maps: by searching for all keys matching 'condhash:*', we can identify all the existing conditions and create an instance of RedisHashFreqDist for each.
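Recovering a condition from a key name amounts to removing the 'condhash:' prefix. The short sketch below simulates the pattern match with Python's fnmatch module. This is an assumption made for illustration, since fnmatch's glob rules are close to, but not identical with, those of the Redis KEYS command. The sketch also shows why splitting on every colon would truncate a condition that itself contains a colon, such as the hypothetical 'ratio:high'.

```python
from fnmatch import fnmatchcase

base = 'condhash'
conditions = ['cond1', 'cond2', 'ratio:high']

# One hash per condition, named '<base>:<condition>'.
keys = ['%s:%s' % (base, c) for c in conditions]
keys.append('otherhash:cond1')  # belongs to a different namespace

# Select this instance's hashes, as the 'condhash:*' pattern would.
# fnmatchcase only approximates Redis glob-style matching.
matched = [k for k in keys if fnmatchcase(k, base + ':*')]

# Splitting on every colon truncates a condition that contains one;
# splitting only on the first colon keeps it intact
# (assuming the base name itself contains no colon).
naive = [k.split(':')[1] for k in matched]
safe = [k.split(':', 1)[1] for k in matched]
print(naive)  # ['cond1', 'cond2', 'ratio']
print(safe)   # ['cond1', 'cond2', 'ratio:high']
```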
Combining strings with colons is a common naming convention for Redis keys as a way to define namespaces. In our case, each RedisConditionalHashFreqDist instance defines a single namespace of hash maps.
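RedisConditionalHashFreqDist also defines a clear() method. This is a helper method that calls clear() on all the internal RedisHashFreqDist instances, so the counts stored in Redis for every condition are cleared. ConditionalFreqDist does not provide a clear() method for this purpose; in NLTK 3 it inherits dict.clear() (through defaultdict), which would only discard the in-memory RedisHashFreqDist references and leave the Redis data untouched.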
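The difference between the inherited dict.clear() and a delegating clear() can be seen in a minimal in-memory sketch. A plain dict named store stands in for Redis, and the hypothetical Child and Parent classes stand in for RedisHashFreqDist and RedisConditionalHashFreqDist; none of this is NLTK or redis-py code.

```python
# Hypothetical in-memory stand-in for Redis: key -> hash of counts.
store = {'condhash:cond1': {'foo': 1}, 'condhash:cond2': {'bar': 2}}


class Child:
    """Hypothetical child distribution whose counts live in `store`."""

    def __init__(self, key):
        self.key = key

    def clear(self):
        # Analogous to deleting the hash's key in Redis.
        store.pop(self.key, None)


class Parent(dict):
    """Maps condition -> Child, like the recipe's class maps to RedisHashFreqDist."""

    def clear(self):
        # Delegate to each child rather than using dict.clear().
        for child in self.values():
            child.clear()


def build():
    return Parent((c, Child('condhash:' + c)) for c in ('cond1', 'cond2'))


p = build()
dict.clear(p)                  # inherited dict behaviour: forgets the children only
print(len(p), sorted(store))   # 0 ['condhash:cond1', 'condhash:cond2']

p = build()
p.clear()                      # overridden behaviour: empties the backing store
print(len(p), sorted(store))   # 2 []
```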
The previous recipe covers RedisHashFreqDist in detail. Also, see the Calculating high information words recipe in Chapter 7, Text Classification, for example usage of ConditionalFreqDist.