Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Storing a conditional frequency distribution in Redis

The nltk.probability.ConditionalFreqDist class is a container for FreqDist instances, with one FreqDist per condition. It is used to count frequencies that are dependent on another condition, such as another word or a class label. We used this class in the Calculating high information words recipe in Chapter 7, Text Classification. Here, we'll create an API-compatible class on top of Redis using the RedisHashFreqDist from the previous recipe.

Getting ready

As in the previous recipe, you'll need to have Redis and redis-py installed with an instance of redis-server running.

How to do it...

We define a RedisConditionalHashFreqDist class in redisprob.py that extends nltk.probability.ConditionalFreqDist and overrides the __getitem__() method. We override __getitem__() so we can create an instance of RedisHashFreqDist instead of a FreqDist:

from nltk.probability import ConditionalFreqDist
from rediscollections import encode_key

class RedisConditionalHashFreqDist(ConditionalFreqDist):
  def __init__(self, r, name, cond_samples=None):
    self._r = r
    self._name = name
    ConditionalFreqDist.__init__(self, cond_samples)
   
    for key in self._r.keys(encode_key('%s:*' % name)):
      condition = key.split(':')[1]
      self[condition] # calls self.__getitem__(condition)
  
  def __getitem__(self, condition):
    if condition not in self._fdists:
      key = '%s:%s' % (self._name, condition)
      val = RedisHashFreqDist(self._r, key)
      super(RedisConditionalHashFreqDist, self).__setitem__(condition, val)
    
    return super(RedisConditionalHashFreqDist, self).__getitem__(condition)
  
  def clear(self):
    for fdist in self.values():
      fdist.clear()

An instance of this class can be created by passing in a Redis connection and a base name. After that, it works just like a ConditionalFreqDist:

>>> from redis import Redis
>>> from redisprob import RedisConditionalHashFreqDist
>>> r = Redis()
>>> rchfd = RedisConditionalHashFreqDist(r, 'condhash')
>>> rchfd.N()
0
>>> rchfd.conditions()
[]

>>> rchfd['cond1']['foo'] += 1
>>> rchfd.N()
1
>>> rchfd['cond1']['foo']
1
>>> rchfd.conditions()
['cond1']
>>> rchfd.clear()

How it works...

The RedisConditionalHashFreqDist uses name prefixes to reference RedisHashFreqDist instances. The name passed into the RedisConditionalHashFreqDist is a base name that is combined with each condition to create a unique name for each RedisHashFreqDist. For example, if the base name of the RedisConditionalHashFreqDist is 'condhash', and the condition is 'cond1', then the final name for the RedisHashFreqDist is 'condhash:cond1'. This naming pattern is used at initialization to find all the existing hash maps using the keys command. By searching for all keys matching 'condhash:*', we can identify all the existing conditions and create an instance of RedisHashFreqDist for each.

Combining strings with colons is a common naming convention for Redis keys as a way to define namespaces. In our case, each RedisConditionalHashFreqDist instance defines a single namespace of hash maps.

There's more...

RedisConditionalHashFreqDist also defines a clear() method. This is a helper method that calls clear() on all the internal RedisHashFreqDist instances. The clear() method is not defined in ConditionalFreqDist.

Table of Contents for
Storing a conditional frequency distribution in Redis

Storing a conditional frequency distribution in Redis

Getting ready

How to do it...

How it works...

There's more...

See also

Table of Contents for Storing a conditional frequency distribution in Redis

Create new playlist

Sign In

Sign Up

Storing a conditional frequency distribution in Redis

Getting ready

How to do it...

How it works...

There's more...

See also

Table of Contents for
Storing a conditional frequency distribution in Redis