Chapter 26. v2/v3 Migration and Coexistence

The earliest release of Python 3 (the precursor of what we call v3 in this book) first appeared in 2008, specifically as “Python 3.0”: that release was not a production-quality one, nor was it meant to be. Its main purpose was to let people start adapting to changes in syntax and semantics; some parts of the implementation were unsatisfactory (particularly the I/O facilities, now much improved). As we’re writing this chapter, the current releases are 2.7.12 and 3.5.2, with a 3.6.0 release just out in December 2016, very close to the deadline for us to finish this book (as a result, while we mention 3.6’s highlights as being “new in 3.6,” that’s in good part based on a beta version of 3.6: we can’t claim thorough coverage of 3.6).

Python’s core developers have done a lot of work to backport v3 features into v2, but this activity has now ended, and v2 is feature-frozen (some v3 standard library backports to v2 are available for pip install, as repeatedly mentioned earlier in this book). v2 is getting only security bug-fix releases, and even those are due to stop once v2 maintenance formally ends in 2020. Guido van Rossum’s PyCon NA keynote in 2014 explicitly ruled out any 2.8 release; should you choose to stick with v2, after 2020 you will be on your own.

For those who have mastered Python’s build system, or who are able to hire third-party developers to support their v2 use, “sticking with v2” may remain a viable strategy even after 2020. However, for most readers, the first task to address will likely be converting existing v2 code to v3.

Preparing for Python 3

Before starting a coexistence or conversion project, if you are not thoroughly familiar with both versions of Python, it will be helpful to read the official guide Porting Python 2 Code to Python 3 in the Python documentation. Among other things, this makes the same recommendation we do about v2 (i.e., only support Python 2.7).

Make sure that your tests cover as much of your project as possible, so that inter-version errors are likely to be picked up during testing. Aim for at least 80% testing coverage; much more than 90% can be difficult to achieve, so don’t spend too much effort to reach a too-ambitious standard (although mocks, mentioned in “Unit Testing and System Testing”, can help you increase your unit-testing coverage breadth, if not depth).

You can update your v2 codebase semi-automatically, using either Python 2’s 2to3 utility or the modernize or futurize packages that build on it (the porting guide mentioned previously has good advice about choosing between the latter two, which we discuss briefly in “v2/v3 Support with a Single Source Tree”). A continuous integration environment helps you verify that you are using best-practice coding standards. Also, use tools like coverage and flake8 to roughly quantify your code quality.

Even while you are still using v2, you can start to check some of the issues that a transition to v3 involves, by using the -3 command-line option when starting the Python interpreter. This makes it warn about “Python 3.x incompatibilities that 2to3 will not trivially fix.” Also, add the -Werror flag to convert warnings to errors, thus ensuring you don’t miss them. Don’t assume, however, that this technique tells you everything you need to know: it can’t detect some subtle issues, such as data from files opened in text mode being treated as bytes.

Whenever you use automatic code conversion, review the output of the conversion process. A dynamic language like Python makes it impossible to perform a perfect translation; while testing helps, it can’t pick up all imperfections.

Readers with large codebases or a serious interest in multiversion compatibility should also read Porting to Python 3 by Lennart Regebro (available both electronically and in print). Python 2.7 had not yet appeared when the second edition of that book was produced, but its feature set was fairly well known, and the book contains much useful and detailed advice that we do not repeat here. Instead, we cover the main points where incompatibilities are likely to arise in “Minimizing Syntax Differences”, and some of the tools that can assist your conversion task.

Imports from __future__

__future__ is a standard library module containing a variety of features, documented in the online docs, to ease migration between versions. It is unlike any other module, because importing features can affect the syntax, not just the semantics, of your program. Such imports must be the initial executable statements of your code.

Each “future feature” is activated using the statement:

from __future__ import feature

where feature is the feature you want to use. In particular, we recommend placing the following line at the top of your v2 modules for best compatibility:

from __future__ import (print_function, division, 
                       absolute_import)

Under v2, it establishes compatibility with v3’s printing, division, and import mechanisms.
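As a minimal sketch of what these imports change, the following module should behave the same way under v2 and v3 (the variable names are ours, chosen only for illustration):

```python
from __future__ import print_function, division, absolute_import

# With `division` in effect, / is true division in v2 as well as v3.
ratio = 12 / 5

# Floor division is unaffected by the import.
whole = 12 // 5

# With `print_function` in effect, print is an ordinary function,
# so keyword arguments such as sep are available in v2 too.
print('ratio', ratio, sep='=')
print('whole', whole, sep='=')
```

Without the import, the same v2 module would compute 12 / 5 as 2 and would reject the sep= keyword as a syntax error.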

Minimizing Syntax Differences

There are a number of differences between Python 3.5 (referred to in this book as v3) and earlier Python 3 versions. While it is possible to accommodate these earlier versions, in the interest of brevity we have excluded such techniques. In addition, by v2 we always mean 2.7.x, not any previous versions of Python 2.

Avoid “Old-Style” Classes

While the update to the object model in Python 2.2 was largely backward compatible, there are some important, subtle areas of difference, particularly if you use multiple inheritance. To ensure that these differences don’t affect you, make sure that your v2 code uses only new-style classes.

The simplest way to do this is to have your classes explicitly inherit from object. You may want to adjust the order of other base classes, if any, to ensure that the intended methods are referenced. Most software needs little adjustment, but the more complex your multiple inheritance hierarchy, the more detailed the work.

You may choose to add __metaclass__ = type to the top of your modules to ensure “bare” classes (ones without bases) in your code are new-style, as covered in “How Python v2 Determines a Class’s Metaclass”: the statement is innocuous in v3.
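The following sketch shows explicit inheritance from object in a small diamond-shaped hierarchy; the class names are hypothetical. Because every class here is new-style, the method resolution order is the same in v2 and v3 and can be inspected through __mro__:

```python
class Base(object):            # explicit base: new-style in v2, usual in v3
    def describe(self):
        return 'Base'


class Left(Base):
    def describe(self):
        return 'Left'


class Right(Base):
    def describe(self):
        return 'Right'


class Child(Left, Right):      # order of bases decides which method wins
    pass


# New-style classes expose their method resolution order for inspection.
mro_names = [cls.__name__ for cls in Child.__mro__]
chosen = Child().describe()
```

Swapping the bases to Child(Right, Left) would make Right.describe the chosen method, which is the kind of adjustment the text refers to.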

String Literals

When v3 first came out, literals like b'bytestring' for bytestring objects were incompatible with v2, but this syntax has now been backported and is available in both versions. Similarly, the v2 Unicode string literal format u'Unicode string' was not available in early v3 implementations, but its inclusion in v3 was recommended in PEP 414 and is now implemented, making it easier to support both languages with a common source tree without radical changes to all string literals.

If your code’s string literals are mostly Unicode, but not flagged as such, you may use from __future__ import unicode_literals, which causes the v2 interpreter to treat all ambiguous literals as though they were prefixed with a u (just like the v3 interpreter always does).
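A short sketch of the two explicit prefixes, which mean the same thing in v2 (2.7) and v3 (3.3 and later); the sample values are ours:

```python
raw = b'caf\xc3\xa9'     # bytestring: five bytes, UTF-8 encoded
text = u'caf\xe9'        # Unicode string: four characters

# The bytes literal is an instance of bytes in both versions
# (bytes is just another name for str in v2).
raw_is_bytes = isinstance(raw, bytes)

# Decoding the bytes with the right codec yields the text.
same = raw.decode('utf-8') == text
```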

Numeric Constants

In v2, integer literals beginning with 0 were treated as octal. This was confusing to beginners (and occasionally to more experienced programmers). Although v2 still allows this, it has become a syntax error in v3; avoid this notation altogether.

Use instead the more recent notation for octal numbers, better aligned with the notation for binary and hexadecimal numbers. In Table 26-1 the base is indicated by a letter following an initial 0 (b for binary, o for octal, x for hexadecimal).

Table 26-1. Preferred integer literal representations
Base         Decimal 0   Decimal 61   Decimal 200
Binary       0b0         0b111101     0b11001000
Octal        0o0         0o75         0o310
Hexadecimal  0x0         0x3d         0xc8

Long integer representations in older v2 implementations required the use of a trailing L or l; for compatibility, that syntax is still accepted in v2. It is no longer necessary (and it’s a syntax error in v3); make sure that none of your code uses it.
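The version-neutral notations can be checked directly; this sketch uses only literals and built-ins that exist in both v2 and v3:

```python
# The same value written with the three prefixed notations.
binary, octal, hexadecimal = 0b111101, 0o75, 0x3d

# Large integers need no trailing L in either version.
big = 2 ** 64

# format() renders digits without a prefix, identically in v2 and v3
# (oct() is best avoided here: its output differs between the versions).
octal_digits = format(200, 'o')
hex_digits = format(200, 'x')

# int() with an explicit base parses such digit strings back.
parsed = int('310', 8)
```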

Text and Binary Data

Perhaps the most troublesome conversion area for most projects is v3’s insistence on the difference between binary and textual data, using bytestrings for the former and Unicode strings for the latter. The bytes and bytearray types are available in v2 (bytes being an alias for str, rather than a backport). Ensure that all string literals in v2 code use version-compatible syntax, as discussed in “String Literals”, and keep the two types of data separate.

Untangling strings

Too many v2 programs don’t take care to differentiate between two very different kinds of strings: bytestrings and Unicode strings. Where text is concerned, remember to “decode on input, encode on output.” There are many different ways to represent (encode) Unicode on external devices: all encodings have to be decoded on input to convert them to Unicode, which is how you should hold all text inside your program.
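The “decode on input, encode on output” rule can be sketched as follows. The function name and sample data are hypothetical, and UTF-8 is an assumption about the external encoding:

```python
def normalize(text):
    """Operate on text only; no encoding concerns inside the program."""
    return text.strip().upper()


incoming = b'  caf\xc3\xa9\n'           # bytes, as read from a file or socket

text = incoming.decode('utf-8')         # decode on input
result = normalize(text)                # work with Unicode throughout
outgoing = result.encode('utf-8')       # encode on output
```

Keeping the two conversions at the program’s edges means that normalize never needs to know which encoding was used.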

Avoid using the bytes type as a function with an integer argument. In v2 this returns the integer converted to a (byte)string because bytes is an alias for str, while in v3 it returns a bytestring containing the given number of null bytes. So, for example, instead of the v3 expression bytes(6), use the equivalent b'\x00'*6, which seamlessly works the same way in each version.

Another consequence of the v2 alias: indexing a bytestring in v2 returns a one-byte string, while in v3 it returns the integer value of the indexed byte. This means that some comparisons are byte-to-byte in v2 and byte-to-integer in v3, which can lead to incompatible results. You should at least test your code with Python’s -b command-line option, which warns when such comparisons are made (given twice, as -bb, it raises errors instead of warnings).
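One way to sidestep the indexing difference, shown here as a sketch with made-up data, is to slice when you want a bytestring and to go through bytearray when you want an integer; both give the same result in v2 and v3:

```python
data = b'\x89PNG'

# A one-item slice is a length-1 bytestring in both versions.
first_byte = data[0:1]

# Indexing a bytearray yields an integer in both versions.
first_code = bytearray(data)[0]

# Compare like with like: bytestring against bytestring.
looks_like_png = data[0:1] == b'\x89'

# Version-neutral way to get six null bytes.
padding = b'\x00' * 6
```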

Never Sort Using cmp

The cmp named argument to sorting functions and methods has been removed in v3; don’t use it in your code…it’s not a good idea in v2, either. The cmp argument had to be a function that took two values as arguments and returned either -1, 0, or +1 according to whether the first argument was less than, equal to, or greater than the second. This can make sorting slow, as the function may have to be called with each pair of items to compare in the sort, leading to a large number of calls.

The key= argument, by contrast, takes a single item to sort and returns an appropriate sort key: it is called only once for each item in the sequence to sort. Internally, the sorting algorithm creates a (key, value) tuple for each value in the input sequence, performs the sort, then strips keys off to get a sequence of just the values. The following example shows how to sort a list of strings by length (shortest first, strings of equal length in the same order as in the input):

>>> names = ['Victoria', 'Brett', 'Luciano', 'Anna', 'Alex', 'Steve']
>>> names.sort(key=len)
>>> names
['Anna', 'Alex', 'Brett', 'Steve', 'Luciano', 'Victoria']

Because Python’s sort algorithm is stable (i.e., it leaves items that compare equal in the same order in the output as in the input), names of the same length are not necessarily in alphabetical order (rather, the input order is preserved on output for strings with equal key values). This is why 'Anna' appears before 'Alex' in the output. You could change this by including the value as a secondary sort key, used to discriminate between items whose primary sort keys compare as equal:

>>> names.sort(key=lambda x: (len(x), x))
>>> names
['Alex', 'Anna', 'Brett', 'Steve', 'Luciano', 'Victoria']

Sometimes it’s faster to rely on Python’s sorting stability: sort the list twice, first alphabetically (i.e., on the minor sort key), then by length (the major sort key). Empirically, on the list in this example, we’ve measured that the “sort twice” approach is about 25% faster than the “sort once with a lambda as key” approach (remember that avoiding lambda, when feasible, often optimizes your code).
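The “sort twice” approach looks like this in practice; it is a sketch using the same list as above, and relative timings will vary with your data and Python version:

```python
names = ['Victoria', 'Brett', 'Luciano', 'Anna', 'Alex', 'Steve']

# Sort once, with a tuple as the key.
once = sorted(names, key=lambda x: (len(x), x))

# Sort twice: minor key (alphabetical) first, then major key (length).
# Stability keeps the alphabetical order among names of equal length.
twice = sorted(names)
twice.sort(key=len)
```

Both approaches produce the same ordering; only the amount of work done per comparison differs.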

except Clauses

If an except clause needs to work with the exception value, use the modern syntax except ExceptionType as v. The older form except ExceptionType, v is a syntax error in v3, and is anyway less readable in v2.
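A minimal sketch of the version-neutral form; safe_int is a hypothetical helper, not a library function:

```python
def safe_int(text, default=0):
    """Return (value, error_message); error_message is None on success."""
    try:
        return int(text), None
    except ValueError as err:       # valid syntax in v2 (2.6+) and v3
        # Use err inside the clause: v3 unbinds the name when it ends.
        return default, str(err)


good = safe_int('42')
bad = safe_int('forty-two')
```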

Division

Prepare for conversion by ensuring that all integer divisions use the // (truncating division) operator. In v2, 12 / 5 is 2; in v3 the value of the same expression is 2.4. This difference could cause problems. In both versions, 12 // 5 is 2, so it’s best to convert your code to use explicit truncating division where appropriate.

For division where you actually want a floating-point result, in v2 you probably either used a floating-point constant in an expression like j/3.0, or applied the float function to one of the operands as in float(j)/3. You can take this opportunity to simplify your v2 code by adding from __future__ import division at the top of your modules to ensure that v2 division acts like v3 division, allowing you to simply code j/3. (Be sure to perform this simplification pass on your code only after you’ve changed all the existing / operators to // for those cases where you want truncating division.)
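The following sketch collects expressions whose values are the same in v2 and v3, with or without the __future__ import. One detail worth checking in your own code: for negative operands, // rounds toward negative infinity rather than toward zero.

```python
# Explicit integer division gives the same answer everywhere.
quotient = 12 // 5

# With a negative operand, // floors (rounds down), it does not chop.
floored = -7 // 2

# Converting a float quotient with int() rounds toward zero instead.
chopped = int(-7 / 2.0)

# A float operand forces a floating-point result in both versions.
exact = 12 / 5.0

# divmod is consistent with //: quotient and remainder together.
q, r = divmod(-7, 2)
```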

Incompatible Syntax to Avoid for Compatibility

One of the most irksome parts of maintaining a dual-version codebase is forcing yourself not to use some of the new features of v3. These have typically been added to the language to make it easier to use, so avoiding them inevitably makes the programming task a little more difficult.

Along with the differences described previously, other v3 features you need to avoid include: function annotations, covered in “Function Annotations and Type Hints (v3 Only)”; the nonlocal statement, covered in “Nested functions and nested scopes”; keyword-only arguments, covered in ““Keyword-only” Parameters (v3 Only)”; and metaclass= in class statements, covered in “The class Statement”. Also, list comprehensions in v3 have their own scope, unlike in v2, as covered in “List comprehensions and variable scope”. Last but not least, anything we’ve marked throughout the book as “new in 3.6” is obviously a must-avoid in v2.

Choosing Your Support Strategy

You need to decide how you are going to use Python, depending on your specific requirements. Choice of strategy is constrained in various ways, and each strategy has upsides and downsides. You need to be pragmatic when deciding which version(s) you are going to support, and how.

One strategy that is likely to prove counterproductive is to maintain parallel source trees (such as two separate branches in the same code repository) for the two separate versions. The Python core development team maintained two separate source trees of CPython, 2.* and 3.*, for years, and that required a good deal of otherwise unnecessary work.

If your codebase is mature and changes infrequently, parallel source trees can be a viable strategy. But if there is substantial activity, you will almost certainly find the work required to maintain two branches irksome and wasteful. You also have to maintain two separate distributions, and ensure that users get the version they need. If you decide to do this, be prepared to expend extra effort. However, we recommend you choose one of the following options instead.

Steady As She Goes—v2-only Support

If you have a very large corpus of v2 code, we might heretically suggest that you simply continue to use v2. To do so, you must be ready to stay out of mainstream Python development, but this lets you keep using libraries not ported to v3 (though their number is happily dwindling). You do also avoid a large conversion task.

This strategy is clearly better suited to programs with a limited lifetime, and to environments where locally compiled interpreters are in use. If you need to run on a supported Python platform after 2020 (when long-term developer support for v2 ends), then you should plan to bite the bullet and convert to v3 well before then.

v2/v3 Support with Conversion

When developing from a single source tree, you have to decide whether that source will be in v2 or v3. Both strategies are workable, and conversion utilities exist to go either way, which lets you choose whether to maintain a v2 or a v3 codebase.

You need to use a script, unless you have very few files in your project; the complexity of that script is increased if you want to increase speed by converting only those files that have changed since your last test run. The conversions still take time, which can slow down development iteration if you decide (as you really should) to test both versions each and every time.

The tox package is useful to manage and test multiversion code. It lets you test your code under a number of different virtual environments, and supports not only v2 and v3 but also several earlier implementations, as well as Jython and PyPy.

v2 source with conversion to v3

If you have a large v2 codebase, then a good start to the migration task is rejigging your source so it can be converted automatically into v3. The Python core developers were farsighted enough to include automatic translation tools as a part of the first v3 release, and this support continues right up to the present day in the form of the 2to3 utility. 2to3 also provides a lib2to3 library containing fixers, functions that handle specific constructs requiring conversion. Other conversion utilities we discuss later in this chapter also leverage lib2to3.

The first task is to assemble your source code and prepare it for conversion to Python 3. Fortunately, nowadays, this is not difficult, as core and distribution developers have thoughtfully provided conversion routines. Installing a Python 3.5 version of Anaconda, for example, lets you (in a command-line environment where tab completion is available) enter 2to3 and hit the Tab key to reveal:

(s3cmdpy3) airhead:s3cmd sholden$ 2to3
2to3      2to3-2    2to3-2.7  2to3-3.4  2to3-3.5  2to32.6

Note, though, that not all of these come from Anaconda (the system in use had a number of Pythons on the system path), as the following directory listing shows:

airhead:s3cmd sholden$ for f in 2to3 2to3- 2to3-2 2to3-2.7 
                                2to3-3.4 2to3-3.5 2to32.6
do
 ls -l $(which $f)
done
lrwxr-xr-x  1 sholden  staff  8 Mar 14 21:45 /Users/sholden/
Projects/Anaconda/bin/2to3 -> 2to3-3.5
lrwxr-xr-x  1 sholden  admin  36 Oct 24 12:30 /usr/local/bin/
2to3-2 -> ../Cellar/python/2.7.10_2/bin/2to3-2
lrwxr-xr-x  1 sholden  admin  38 Oct 24 12:30 /usr/local/bin/
2to3-2.7 -> ../Cellar/python/2.7.10_2/bin/2to3-2.7
lrwxr-xr-x  1 sholden  admin  38 Sep 12  2015 /usr/local/bin/
2to3-3.4 -> ../Cellar/python3/3.4.3_2/bin/2to3-3.4
-rwxr-xr-x  1 sholden  staff  121 Mar 14 21:45 /Users/sholden/
Projects/Anaconda/bin/2to3-3.5
lrwxr-xr-x  1 root  wheel  73 Aug  6  2015 /usr/bin/
2to32.6 -> ../../System/Library/Frameworks/Python.framework/
Versions/2.6/bin/2to32.6

The Anaconda install supplies 2to3-3.5, and a symbolic link from 2to3 in the same directory; the remainder are from other installs. As long as the Python executable directory is at the beginning of your system path, the default 2to3 is the one targeting that installation: you’ll be converting to the v3 supported by it.

v3 source with conversion to v2

If you are coding in v3 but want to support v2 too, use 3to2, aka lib3to2, documented online. 3to2 uses the same fixers as lib2to3, reversing them to create a backward-compatible version of your code. 3to2 either produces working v2 code, or provides error messages describing why it can’t. Some features in v3 are unavailable in v2: differences may be subtle—test your code in both versions.

2to3 Conversion: A Case Study

The most frequent conversion use case at this time is the v2 programmer wishing to convert to v3. Here we outline the procedure for conversion, with notes from real experience. It’s not as easy as it used to be to find v2-only modules, but there are still some around. We landed on the voitto package, which was released to the world in 2012 and appears to have had little or no attention since. As with many older projects, there is no documentation about how the tests should be run.

First, we cloned the repository and created a v2 virtual environment to allow verification and testing of the downloaded package. We then installed the nose testing framework, installed voitto as an editable package, and ran nosetests to execute the package’s tests. Many older, unmaintained packages suffer from bit rot, so this is a necessary step. This involved the following commands:

$ virtualenv --python=$(which python2) ../venv2
Running virtualenv with interpreter /usr/local/bin/python2
New python executable in ↲
/Users/sholden/Projects/Python/voitto/venv2/bin/python2.7
Also creating executable in ↲
/Users/sholden/Projects/Python/voitto/venv2/bin/python
Installing setuptools, pip, wheel...done.
$ source ../venv2/bin/activate
(venv2) $ pip install nose
Collecting nose
  Using cached nose-1.3.7-py2-none-any.whl
Installing collected packages: nose
Successfully installed nose-1.3.7
(venv2) $ pip install -e .
Obtaining file:///Users/sholden/Projects/Python/voitto
Installing collected packages: voitto
  Running setup.py develop for voitto
Successfully installed voitto-0.0.1
(venv2) $ nosetests
....
----------------------------------------------------------------------
Ran 4 tests in 0.023s

OK

After verifying that the tests succeed, we created a Python 3 virtual environment, reinstalled the dependencies, and ran the tests again. The error tracebacks have been trimmed here to retain only the significant information:

(venv2) $ deactivate
$ pyvenv ../venv3
$ source ../venv3/bin/activate
(venv3) $ pip install nose
Collecting nose
  Using cached nose-1.3.7-py3-none-any.whl
Installing collected packages: nose
Successfully installed nose-1.3.7
(venv3) airhead:voitto sholden$ pip install -e .
Obtaining file:///Users/sholden/Projects/Python/voitto
Installing collected packages: voitto
  Running setup.py develop for voitto
Successfully installed voitto
(venv3) $ nosetests
EEE
======================================================================
ERROR: Failure: NameError (name 'unicode' is not defined)
----------------------------------------------------------------------
Traceback (most recent call last):
 ...
  File "/Users/sholden/Projects/Python/voitto/tappio/lexer.py", 
  line 43, in build_set
    assert type(ch) in (str, unicode)
NameError: name 'unicode' is not defined

======================================================================
ERROR: Failure: ImportError (No module named 'StringIO')
----------------------------------------------------------------------
Traceback (most recent call last):
 ...
  File "/Users/sholden/Projects/Python/voitto/tests/script_import_test.py", 
  line 23, in <module>
    from StringIO import StringIO
ImportError: No module named 'StringIO'

======================================================================
ERROR: Failure: SyntaxError (invalid syntax (tappio_test.py, line 173))
----------------------------------------------------------------------
Traceback (most recent call last):
 ...
  File "/Users/sholden/Projects/Python/voitto/tests/tappio_test.py", 
  line 173
    print Parser(lex.lex_string(input)).parse_document()
               ^
SyntaxError: invalid syntax

----------------------------------------------------------------------
Ran 3 tests in 0.014s

FAILED (errors=3)

As you can see, the program needs some adaptation to the v3 environment: unicode is not a valid type in v3, the StringIO module has been renamed, and the print statement needs to be transformed into a function call.

Initial run of 2to3

2to3 has a number of different modes of operation. The most useful way to get a handle on the likely difficulty of a conversion is to simply run it on the project’s root directory. The command

2to3 .

scans all code and reports on suggested changes in the form of difference listings (diffs), summarizing which files likely need changing. Here is that summary:

RefactoringTool: Files that need to be modified:
RefactoringTool: ./setup.py
RefactoringTool: ./tappio/__init__.py
RefactoringTool: ./tappio/lexer.py
RefactoringTool: ./tappio/models.py
RefactoringTool: ./tappio/parser.py
RefactoringTool: ./tappio/writer.py
RefactoringTool: ./tappio/scripts/extract.py
RefactoringTool: ./tappio/scripts/graph.py
RefactoringTool: ./tappio/scripts/indent.py
RefactoringTool: ./tappio/scripts/merge.py
RefactoringTool: ./tappio/scripts/missing_accounts.py
RefactoringTool: ./tappio/scripts/move_entries.py
RefactoringTool: ./tappio/scripts/print_accounts.py
RefactoringTool: ./tappio/scripts/print_earnings.py
RefactoringTool: ./tappio/scripts/renumber.py
RefactoringTool: ./tests/helpers.py
RefactoringTool: ./tests/script_import_test.py
RefactoringTool: ./tests/tappio_test.py
RefactoringTool: ./voitto/__init__.py
RefactoringTool: ./voitto/helpers/io.py
RefactoringTool: ./voitto/helpers/tracer.py
RefactoringTool: Warnings/messages while refactoring:
RefactoringTool: ### In file ./tappio/scripts/missing_accounts.py ###
RefactoringTool: Line 36: You should use a for loop here

The diffs show that approximately 20 source files are being recommended for change. As the old saying goes, why keep a dog and bark yourself? We can have 2to3 actually make the recommended changes by adding the -w flag to the command line. Normally 2to3 creates a backup file by renaming the original file from .py to .py.bak. Operating under a version-control system renders this unnecessary, however, so we can also add the -n flag to inhibit the backups:

(venv3) $ 2to3 -wn .

Here are the changes that 2to3 makes; we don’t mention the filenames. Many of the changes are similar in nature, so we don’t provide an exhaustive commentary on each one. As detailed next, many of the changes might be suboptimal; 2to3 is much more focused on producing working code than on optimizing its performance (remember the “golden rule of programming,” covered in “Optimization”). Optimization is best treated as a separate step, once the code has been successfully converted and is passing its tests. The lines to be removed are preceded by minus signs (-), the replacement code by plus signs (+):

- assert type(ch) in (str, unicode)
+ assert type(ch) in (str, str)

v3, of course, does not define unicode, so 2to3 has converted it to str but failed to recognize that this no longer requires the tuple-membership test. Since the statement should work, however, leave this optimization for later.

- token = self.token_iterator.next()
+ token = next(self.token_iterator)

Several similar changes of this nature were recommended. In each case 2to3 could simply have transliterated the method call, since v2’s next method is renamed __next__ in v3. However, 2to3 has chosen to produce better code, compatible across both versions, by using the next function instead.
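To see why calling the next function is the portable choice, consider this sketch of an iterator class written to run under both versions. Countdown is a hypothetical example, not code from the package being converted:

```python
class Countdown(object):
    """Iterator yielding start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):            # the method name v3 looks for
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__                # the method name v2 looks for


counter = Countdown(3)
first = next(counter)              # built-in next works in v2 and v3
rest = list(counter)
```

Client code that calls next(counter) never needs to know which of the two method names the running interpreter uses.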

- for account_number, cents in balances.iteritems():
+ for account_number, cents in balances.items():

Again several similar changes are recommended. In this case the converted code has a slightly different effect if run under v2, since the original was specifically coded to use an iterator rather than the list that (under v2) the items method produces. While this may use more memory, it shouldn’t affect the result.

- map(all_accounts.update, flat_accounts.itervalues())
+ list(map(all_accounts.update, iter(flat_accounts.values())))

This is an interesting change because the original code was rather dubious: this is the line where the migration tool advised use of a for loop instead. Under v2, map returns a list, while the v3 map is an iterator. Not only has the itervalues call been changed to values (v3’s values method returns an iterable view, so the added iter call is redundant, though innocuous), but a list call has also been placed around the result to ensure that a list is still produced. It might appear that the application of the list function serves no useful purpose, but removing the call would mean that all_accounts.update would be applied to each account only as it was consumed from the iterator; since nothing else consumes that iterator, the updates would never happen.

In fact the original code is somewhat dubious, and misused the map function. It would really have been little more complex, and far more comprehensible, to write

for ac_value in iter(flat_accounts.values()):
    all_accounts.update(ac_value)

as 2to3 advised. We leave this optimization for now, though, since despite its dubious nature the code as produced still works.
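The laziness of v3’s map is easy to demonstrate in isolation. In this sketch the names all_accounts and flat_accounts are borrowed from the converted code, but the data is invented for illustration:

```python
all_accounts = set()
flat_accounts = {'assets': [1000, 1010], 'income': [3000, 1010]}

# Under v3 this builds an iterator and calls nothing yet;
# under v2 map is eager and the updates happen immediately.
pending = map(all_accounts.update, flat_accounts.values())
seen_so_far = set(all_accounts)    # empty under v3, full under v2

# The explicit loop does the same work, visibly, in both versions.
for ac_value in flat_accounts.values():
    all_accounts.update(ac_value)
```

The loop states its intent (performing updates for their side effect) without building a throwaway list of None values.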

- print "{account_num}: ALL  {fmt_filenames}".format(**locals())
+ print("{account_num}: ALL  {fmt_filenames}".format(**locals()))

2to3 ensures that all print statements are converted to function calls, and is good at making such changes.

- from StringIO import StringIO
+ from io import StringIO

The StringIO functionality has been moved to the io library in v3. Note that 2to3 does not necessarily produce code that is compatible across both versions; it pursues its principal purpose, conversion to v3—although in this case, as in previous ones, v2 is also happy with the converted code (since StringIO can also be accessed from io in v2, specifically in 2.7).

- assert_true(callable(main))
+ assert_true(isinstance(main, collections.Callable))

This change is a throwback: early v3 implementations removed the callable function. Since callable was reinstated in v3.2, the conversion isn’t really necessary, but it’s harmless in both v2 and v3.

- except ParserError, e:
+ except ParserError as e:

This change also provides compatible syntax, good in both v2 and v3.

The changes made in this initial run of 2to3 do not, however, result in a working program. Running the tests still gives two errors, one in the lexer_single test and one in writer_single. Both complain about differing token lists. This is the first (we’ve inserted line breaks to avoid too-wide code boxes running off the page):

...
  File "/Users/sholden/Projects/Python/voitto/tests/tappio_test.py",
        line 54, in lexer_single
    assert_equal(expected_tokens, tokens)
AssertionError: Lists differ: [('integer', '101')] != 
                              [<Token: integer '101'>]

First differing element 0:
('integer', '101')
<Token: integer '101'>

- [('integer', '101')]
+ [<Token: integer '101'>]

The second one is similar:

...
  File "/Users/sholden/Projects/Python/voitto/tests/tappio_test.py", 
        line 188, in writer_single
    assert_equal(good_tokens, potentially_bad_tokens)
AssertionError: Lists differ:
  [<Tok[1169 chars]ce_close ''>, <Token: brace_close ''>, 
   <Token: brace_close ''>] != [<Tok[1169 chars]ce_close ''>, 
   <Token: brace_close ''>, <Token: brace_close ''>]

First differing element 0:
<Token: brace_open ''>
<Token: brace_open ''>

Diff is 1384 characters long. Set self.maxDiff to None to see it.

These were both traced to the token list comparison routine itself, which the original code provides in the form of a __cmp__ method for the Token class, reading as follows:

def __cmp__(self, other):
    if isinstance(other, Token):
        return cmp((self.token_type, self.value), 
                  (other.token_type, other.value))
    elif (isinstance(other, list) or 
          isinstance(other, tuple)) and len(other) == 2:
        return cmp((self.token_type, self.value), other)
    else:
        raise TypeError('cannot compare Token to {!r}'.format(
                                                       type(other)))

This method is not used in v3, which relies on so-called rich comparison methods. Since no __eq__ is found, equality checks fall back to the one inherited from object, which compares by identity and so unfortunately doesn’t measure up. The fix is to replace the __cmp__ method with an __eq__ method. This suffices to implement the equality checks required in the tests; the new code is:

def __eq__(self, other):
    if isinstance(other, Token):
        return ((self.token_type, self.value) ==
                (other.token_type, other.value))
    elif (isinstance(other, list) or 
          isinstance(other, tuple)) and len(other) == 2:
        return (self.token_type, self.value) == tuple(other)
    else:
        raise TypeError('cannot compare Token to {!r}'.format(
                                                       type(other)))

This final change is all that is required to have all tests pass under v3:

(ve35) airhead:voitto sholden$ nosetests
....
----------------------------------------------------------------------
Ran 4 tests in 0.028s

OK
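
To see the new comparison in isolation, here is a minimal sketch. It assumes a hypothetical, stripped-down Token class holding only the two attributes the comparison uses; the real class in the project has more to it. The __ne__ method is our addition, not part of the fix above: v2 does not derive != from ==, so a single-source class generally wants both.

```python
class Token(object):
    """Hypothetical stand-in for the project's Token class."""

    def __init__(self, token_type, value):
        self.token_type = token_type
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Token):
            return ((self.token_type, self.value) ==
                    (other.token_type, other.value))
        elif isinstance(other, (list, tuple)) and len(other) == 2:
            return (self.token_type, self.value) == tuple(other)
        else:
            raise TypeError('cannot compare Token to {!r}'.format(
                type(other)))

    def __ne__(self, other):
        # Addition for this sketch: v2 does not infer != from __eq__
        return not self == other


# A list of pairs and a list of tokens now compare equal element by element
pairs = [('integer', '101')]
tokens = [Token('integer', '101')]
lists_match = pairs == tokens
```

Bear in mind that, under v3, a class that defines __eq__ without also defining __hash__ becomes unhashable. That does not matter for these tests, but it would if tokens were ever used as dictionary keys or set members.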

Retaining v2 compatibility

While we now have working v3 code, running the tests under v2 shows that we have broken backward compatibility, and the writer_test now fails:

File "/Users/sholden/Projects/Python/voitto/tappio/writer.py", 
        line 48, in write
    self.stream.write(str(token))
TypeError: unicode argument expected, got 'str'

It would be unfair to complain that the 2to3 conversion breaks compatibility, since maintaining compatibility is no part of its purview. Happily, it is relatively easy to take this v3 code and make it also compatible with v2. The process does, for the first time, require writing some code that introspects to determine which version of Python it is running on. We discuss version compatibility further in “v2/v3 Support with a Single Source Tree”.

The issue is that the original code does not explicitly flag string literals as Unicode, so under v2 they are treated as being of type str rather than the required unicode type. It would be tedious to make this conversion manually, but there’s a solution in the __future__ module: the unicode_literals import. When this import is present in a v2 program, string literals are implicitly treated as Unicode strings, which is exactly the change we want. We therefore added the line

from __future__ import unicode_literals

as the first executable line in each module.
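
The effect of the import is easy to check in isolation. The sketch below is not taken from the project: it writes a plain string literal to an io.StringIO, which under v2 accepts only unicode objects and otherwise raises the same TypeError as in the traceback above. Under v3 the import is accepted and changes nothing.

```python
from __future__ import unicode_literals  # must precede all other statements

import io

stream = io.StringIO()
# Without the import, this literal would be a v2 byte string and the
# write would fail with "unicode argument expected, got 'str'".
stream.write("brace_open")
written = stream.getvalue()
```

The import only affects literals in the module where it appears; a value produced by a call such as str(token) is still a byte string under v2.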

Note that this is not always the panacea it might appear to be (and it certainly doesn’t offer an immediate fix here, as we’ll shortly see). Although it’s helpful in this case, when programs have mixed binary and textual string data there is no one representation to serve both purposes, and many developers prefer to take things gradually. The changes can introduce regressions in Python 2, which have to be addressed. Not only that, but all docstrings implicitly then become Unicode, which prior to 2.7.7 would have caused syntax errors.

This change alone actually increases the number of failing tests to three, and the error is the same in all cases:

  File "/Users/sholden/Projects/Python/voitto/tappio/lexer.py", 
        line 45, in build_set
    assert type(ch) in (str, str)

You can see that the change that 2to3 made to remove the unicode type has come back to bite us because, while it suffices in v3, under v2 we need to explicitly allow strings of type unicode. We can’t just revert that change, since that type isn’t available under v3: attempts to reference it raise a NameError exception.

What we need is a type that is compatible with both versions. The six module (covered later) does introduce such a type, but, rather than carry the overhead of the entire module, in this case we chose to write a simple compatibility module at tappio/compat.py that creates an appropriate definition of the name unicode for the version it’s running on. Following the advice of the porting guide, we use feature detection rather than querying version information. compat.py reads as follows:

try:
    unicode = unicode
except NameError:
    unicode = str

The alert reader might ask why it is necessary to bind the unicode name rather than simply referencing it. The reason is that in v2 the unicode on the right-hand side of the assignment comes from the built-in namespace, and so, without the assignment, it would not be available for import from the module.

The modules tappio/writer.py and tappio/lexer.py were modified to import unicode from tappio.compat. In writer.py, str(token) was changed to unicode(token). In lexer.py, the assertion was reverted to test type(ch) in (str, unicode), undoing the change that 2to3 had made. After these changes, all tests pass under both v2 and v3.
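
Put together, the pattern looks roughly like the sketch below. To keep it self-contained, the compatibility binding is inlined rather than imported from tappio.compat, and the helper names check_char and token_text are hypothetical stand-ins, not functions from the project.

```python
# Inlined equivalent of the compatibility module: feature detection,
# with no query of sys.version_info.
try:
    unicode = unicode      # v2: rebind the built-in so it can be imported
except NameError:
    unicode = str          # v3: str is already the text type


def check_char(ch):
    """Hypothetical stand-in for the assertion in the lexer."""
    assert type(ch) in (str, unicode)
    return ch


def token_text(token):
    """Hypothetical stand-in for the conversion in the writer."""
    return unicode(token)
```

Under v3 both names in the tuple refer to the same type, so the assertion is merely redundant there; under v2 it accepts byte strings and Unicode strings alike.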

v2/v3 Support with a Single Source Tree

As the preceding exercise demonstrates, you can write Python that executes correctly in both environments. But this means taking some care when writing code, and you usually need to add some code specifically to provide compatibility between the two environments. Python’s official Porting HOWTO document includes practical advice on interversion compatibility with a rather broader focus than ours.

A number of libraries have been produced to support single-source development, including six, python-modernize, and python-future. The last two each allow you to fix v2 code and add appropriate imports from __future__ to get necessary bits and pieces of v3 functionality.

six

The six library was the earliest project to provide such compatibility adaptation, but some people feel that it is too bulky, and convenience can also be an issue. It has been, however, successfully used in large projects such as Django to provide version compatibility, so you can trust that the library is capable of doing all you need.

six is particularly valuable when you need to retain compatibility with earlier v2 implementations, since it was produced when 2.6 was current and therefore also takes into account the extra issues that 2.6 compatibility raises. However, we recommend you target only Python 2.7, not earlier versions, in such compatibility quests: the task is hard enough without having to make it even harder.
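
As a small taste of the style six encourages, the sketch below distinguishes text from binary data with six.text_type and six.binary_type, both of which six provides. The fallback branch is an assumption made only so that the snippet still runs where six is not installed, and the function name kind is hypothetical.

```python
try:
    import six
    text_type, binary_type = six.text_type, six.binary_type
except ImportError:
    # Assumption for this sketch only: emulate the two names by
    # feature detection when six is unavailable.
    binary_type = bytes
    try:
        text_type = unicode
    except NameError:
        text_type = str


def kind(value):
    """Classify a value as text, binary, or neither, on v2 and v3."""
    if isinstance(value, text_type):
        return "text"
    if isinstance(value, binary_type):
        return "binary"
    return "other"
```

With explicit u'' and b'' prefixes on the literals passed in, the same calls give the same answers under Python 2.7 and under 3.3 or later.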

python-modernize

The python-modernize library appeared when Python 3.3 was current. It offers good compatibility between 2.6, 2.7, and 3.3 through a thin wrapper around 2to3, and is used in much the same way. The author, Armin Ronacher, describes its use, and gives porting advice useful to anyone considering dual-version support, in Porting to Python Redux.

python-modernize uses six as a base and operates much like 2to3, reading your code and writing out a version that’s easier to maintain as a dual-source codebase. Install it in a virtual environment with pip install modernize. Its output tends to move code stylistically toward v3.

python-future

The most recent tool for single-source support is the Python-Future library: install it in a virtual environment with pip install future. If you (as we recommend) support only v2 (Python 2.7) and v3 (Python 3.5 and newer), it stands alone, without external dependencies. With the addition of importlib, unittest2, and argparse, it can extend support to 2.6, should you absolutely require that.

Translation happens in two stages: the first stage uses only “safe” fixers that are extremely unlikely to break your code, and does not attempt to use any compatibility functions from the future package. The output from stage one is still v2 code with calls to v2 standard functions; the purpose of the two-stage approach is to get the uncontroversial changes out of the way, so that the second stage can focus on those more likely to cause breakage.

The second-stage changes add dependencies on the future package, and convert the source to v3 code that also runs on v2 by using those dependencies. Existing users of v2 might find that the converted output style is more like the code they are used to.

future also comes with a pasteurize command that claims to convert v3 code into v2 code. We found, however, that its default mode was unable to duplicate the v2 compatibility changes we discussed at the end of “Retaining v2 compatibility”.

v3-only Support

For anyone creating new Python projects with no dependencies on v2-only libraries, the preferred strategy is: write in v3, and never look back! Of course, if you produce libraries or frameworks meant for use under both versions, then you don’t have this luxury: you need to provide the compatibility your consumers require.

Bear in mind the limited lifetime remaining for v2: the v3-only strategy is becoming more viable by the month. More and more v3 code is being produced daily, and over time the new code will tend to dominate. You can also expect more projects to drop v2 support, as many significant scientific projects indicate in the Python 3 Statement. So, if you choose to produce only v3 code, the number of complaints is more likely to go down than up over time. Stick to your guns, and reduce your maintenance load!

1 As further encouragement, Python 2.6 is deprecated on PyPI and pip.
