Parsing dates and times with dateutil

If you need to parse dates and times in Python, dateutil is hard to beat. Its parser module can parse datetime strings in far more formats than can be shown here, while the tz module provides everything you need for looking up timezones. Combined, these modules make it easy to parse strings into timezone-aware datetime objects.

Getting ready

You can install dateutil using pip or easy_install. The package is published on PyPI under the name python-dateutil, so the commands are sudo pip install python-dateutil==2.0 or sudo easy_install python-dateutil==2.0. You need version 2.0 or later for Python 3 compatibility. The complete documentation can be found at http://labix.org/python-dateutil.

How to do it...

Let's dive into a few parsing examples:

>>> from dateutil import parser
>>> parser.parse('Thu Sep 25 10:36:28 2010')
datetime.datetime(2010, 9, 25, 10, 36, 28)
>>> parser.parse('Thursday, 25. September 2010 10:36AM')
datetime.datetime(2010, 9, 25, 10, 36)
>>> parser.parse('9/25/2010 10:36:28')
datetime.datetime(2010, 9, 25, 10, 36, 28)
>>> parser.parse('9/25/2010')
datetime.datetime(2010, 9, 25, 0, 0)
>>> parser.parse('2010-09-25T10:36:28Z')
datetime.datetime(2010, 9, 25, 10, 36, 28, tzinfo=tzutc())

As you can see, all it takes is importing the parser module and calling the parse() function with a datetime string. The parser will do its best to return a sensible datetime object, but if it cannot parse the string, it will raise a ValueError.
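Because parse() signals failure with a ValueError, input from users or files is often wrapped in a small helper. The following is a minimal sketch under that assumption; the name parse_or_none is hypothetical and not part of dateutil. It also catches OverflowError, which the dateutil documentation lists for dates too large for the system to represent.

```python
from dateutil import parser


def parse_or_none(text):
    """Return a datetime for text, or None if dateutil cannot parse it.

    parse_or_none is a hypothetical helper, not part of dateutil.
    """
    try:
        return parser.parse(text)
    except (ValueError, OverflowError):
        # ValueError: the string is not a recognizable date or time.
        # OverflowError: the parsed date is too large for this system.
        return None


good = parse_or_none('9/25/2010 10:36:28')
bad = parse_or_none('not a date')

print(good)  # 2010-09-25 10:36:28
print(bad)   # None
```

Returning None keeps the calling code simple, but it also hides why parsing failed; if the reason matters, it may be better to let the exception propagate.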

How it works...

The parser does not use regular expressions. Instead, it looks for recognizable tokens and does its best to guess what those tokens refer to. The order of these tokens matters; for example, some cultures use a date format that looks like Month/Day/Year (the default order), while others use a Day/Month/Year format. To deal with this, the parse() function takes an optional keyword argument, dayfirst, which defaults to False. If you set it to True, it can correctly parse dates in the latter format.

>>> parser.parse('25/9/2010', dayfirst=True)
datetime.datetime(2010, 9, 25, 0, 0)
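The effect of dayfirst is easiest to see on a string where both leading numbers could be a month. The short sketch below uses an illustrative input, '1/2/2010', that is not taken from the examples above. It also shows that when the first number is greater than 12, the parser can work out on its own that it must be the day, so dayfirst mainly matters for genuinely ambiguous strings.

```python
from dateutil import parser

ambiguous = '1/2/2010'  # illustrative input: January 2 or February 1?

month_first = parser.parse(ambiguous)               # default: Month/Day/Year
day_first = parser.parse(ambiguous, dayfirst=True)  # Day/Month/Year

print(month_first.date())  # 2010-01-02
print(day_first.date())    # 2010-02-01

# 25 cannot be a month, so the day is inferred even without dayfirst.
inferred = parser.parse('25/9/2010')
print(inferred.date())     # 2010-09-25
```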

Another ordering issue can occur with two-digit years. For example, '10-9-25' is ambiguous. Since dateutil defaults to the Month-Day-Year format, '10-9-25' is parsed as October 9, 2025. But if you pass yearfirst=True into parse(), it is parsed as September 25, 2010:

>>> parser.parse('10-9-25')
datetime.datetime(2025, 10, 9, 0, 0)
>>> parser.parse('10-9-25', yearfirst=True)
datetime.datetime(2010, 9, 25, 0, 0)

There's more...

The dateutil parser can also do fuzzy parsing, which allows it to ignore extraneous words in a datetime string. With the default of fuzzy=False, parse() raises a ValueError when it encounters unknown tokens. But with fuzzy=True, a datetime object can usually be returned:

>>> try:
...    parser.parse('9/25/2010 at about 10:36AM')
... except ValueError:
...    'cannot parse'
'cannot parse'
>>> parser.parse('9/25/2010 at about 10:36AM', fuzzy=True)
datetime.datetime(2010, 9, 25, 10, 36)
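When fuzzy parsing succeeds, it can be useful to see which parts of the string were ignored. Later dateutil releases accept a fuzzy_with_tokens argument for this purpose. The sketch below assumes a release that supports it, which may not include the 2.0 version mentioned earlier.

```python
from dateutil import parser

text = '9/25/2010 at about 10:36AM'

# fuzzy_with_tokens=True implies fuzzy=True and returns a 2-tuple:
# the parsed datetime and a tuple of the substrings that were skipped.
parsed, skipped = parser.parse(text, fuzzy_with_tokens=True)

print(parsed)   # the datetime found in the text
print(skipped)  # the pieces of text the parser ignored
```

Inspecting the skipped pieces is a cheap sanity check: if they contain digits or month names, the parser may have discarded something that was meant to be part of the date.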

See also

In the next recipe, we'll use the tz module of dateutil to do timezone lookup and conversion.
