Storing an ordered dictionary in Redis

An ordered dictionary is like a normal dict, but the keys are ordered by an ordering function. In the case of Redis, it supports ordered dictionaries whose keys are strings and whose values are floating point scores. This structure can come in handy in cases where we need to calculate the information gain (covered in the Calculating high information words recipe in Chapter 7, Text Classification), and when you want to store all the words and scores for later use.

Getting ready

Again, you'll need Redis and redis-py installed with an instance of redis-server running, as explained in the earlier recipe, Storing a frequency distribution in Redis.

How to do it...

The RedisOrderedDict class in rediscollections.py extends collections.MutableMapping to get a number of dict compatible methods for free. Then, it implements all the key methods that require Redis ordered set (also known as Zset) commands:

class RedisOrderedDict(collections.MutableMapping):
  def __init__(self, r, name):
    self._r = r
    self._name = encode_key(name)
  
  def __iter__(self):
    return iter(self.items())
  
  def __len__(self):
    return self._r.zcard(self._name)
  
  def __getitem__(self, key):
    return self._r.zscore(self._name, encode_key(key))
  
  def __setitem__(self, key, score):
    self._r.zadd(self._name, encode_key(key), score)
  
  def __delitem__(self, key):
    self._r.zrem(self._name, encode_key(key))
  
  def keys(self, start=0, end=-1):
    # we use zrevrange to get keys sorted by high value instead of by lowest
    return self._r.zrevrange(self._name, start, end)
  
  def values(self, start=0, end=-1):
    return [v for (k, v) in self.items(start=start, end=end)]
  
  def items(self, start=0, end=-1):
    return self._r.zrevrange(self._name, start, end, withscores=True)
  
  def get(self, key, default=0):
    return self[key] or default
  
  def iteritems(self):
    return iter(self)
  
  def clear(self):
    self._r.delete(self._name)

You can create an instance of RedisOrderedDict by passing in a Redis connection and a unique name:

>>> from redis import Redis
>>> from rediscollections import RedisOrderedDict
>>> r = Redis()
>>> rod = RedisOrderedDict(r, 'test')
>>> rod.get('bar')
>>> len(rod)
0
>>> rod['bar'] = 5.2
>>> rod['bar']
5.2000000000000002
>>> len(rod)
1
>>> rod.items()
[(b'bar', 5.2)]
>>> rod.clear()

Note

By default, keys are returned as binary strings. If you want a plain string, you can convert the keys using key.decode(). You can always look up values with normal strings.

How it works...

Much of the code may look similar to the RedisHashMap, which is to be expected since they both extend collections.MutableMapping. The main difference here is that RedisOrderedSet orders keys by floating point values, and so it is not suited for arbitrary key-value storage like the RedisHashMap. Here's an outline explaining each key method and how they work with Redis:

  • __len__(): This uses the zcard command to get the number of elements in the ordered set.
  • __getitem__(): This uses the zscore command to get the score of a key, and returns 0 if the key does not exist.
  • __setitem__(): This uses the zadd command to add a key to the ordered set with the given score, or updates the score if the key already exists.
  • __delitem__(): This uses the zrem command to remove a key from the ordered set.
  • keys(): This uses the zrevrange command to get all the keys in the ordered set, sorted by the highest score. It takes two optional keyword arguments, start and end, to more efficiently get a slice of the ordered keys.
  • values(): This extracts all the scores from the items() method.
  • items(): This uses the zrevrange command to get the scores of each key in order to return a list of 2-tuples ordered by the highest score. Like keys(), it takes start and end keyword arguments to efficiently get a slice.
  • clear(): This uses the delete command to remove the entire ordered set from Redis.

Note

The default ordering of items in a Redis ordered set is low-to-high, so that the key with the lowest score comes first. This is the same as Python's default list ordering when you call sort() or sorted(), but this is not what we want when it comes to scoring. For storing scores, we expect items to be sorted from high-to-low, which is why keys() and items() use zrevrange instead of zrange.

All the Redis commands are documented at http://redis.io/commands.

There's more...

As mentioned previously, the keys() and items() methods take optional start and end keyword arguments to get a slice of the results. This makes RedisOrderedDict optimal for storing scores and getting the top N keys.

Note

The start and end keyword arguments are inclusive, so if you use start=0 and end=2, you will get up to three elements.

Here's a simple example where we assign three word scores and get the top two:

>>> from redis import Redis
>>> from rediscollections import RedisOrderedDict
>>> r = Redis()
>>> rod = RedisOrderedDict(r, 'scores')
>>> rod['best'] = 10
>>> rod['worst'] = 0.1
>>> rod['middle'] = 5
>>> rod.keys()
[b'best', b'middle', b'worst']
>>> rod.keys(start=0, end=1)
[b'best', b'middle']
>>> rod.clear()

See also

The Calculating high information words recipe in Chapter 7, Text Classification, describes how to calculate information gain, which is a good case for storing word scores in a RedisOrderedDict. The Storing a frequency distribution in Redis recipe introduces Redis and the RedisHashMap.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset