Recently, the DMOJ judge codebase has been migrated to Python 3, thanks to the combined efforts of me, Xyene, and kiritofeng. Many issues, such as unicode handling, were exposed in the process.

Since Python 2 is still in heavy use, at least in the deployment of the DMOJ judge, compatibility with it must be maintained. This meant writing code that runs on both Python 2 and Python 3. The six library has proved tremendously helpful in abstracting away the differences, some of which are highly non-trivial.

For example, six.with_metaclass hides away the difference in metaclass use. In Python 2, the __metaclass__ class member defines the metaclass used for the class, while in Python 3, one would specify it as class Class(metaclass=MetaClass). The latter would be a syntax error in Python 2, and the former has no effect in Python 3. six provides a solution that is highly non-obvious and yet works perfectly.
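
To make the difference concrete, here is a minimal sketch of six.with_metaclass in use. RegistryMeta, Executor, and JavaExecutor are hypothetical names made up for illustration, not classes from the judge. The fallback in the except branch is a simplified stand-in so the sketch runs without six installed; it is not how six itself implements the helper.

```python
try:
    from six import with_metaclass
except ImportError:
    def with_metaclass(meta, *bases):
        # Simplified stand-in so this sketch runs without six installed.
        # It is NOT six's implementation: it leaves an extra base class
        # in the hierarchy, which the real helper avoids.
        return meta('TemporaryBase', bases or (object,), {})


class RegistryMeta(type):
    """Hypothetical metaclass that records every class it creates."""
    registry = {}

    def __init__(cls, name, bases, namespace):
        super(RegistryMeta, cls).__init__(name, bases, namespace)
        RegistryMeta.registry[name] = cls


# One spelling that is valid on both Python 2 and Python 3.
class Executor(with_metaclass(RegistryMeta, object)):
    pass


# Subclasses inherit the metaclass as usual.
class JavaExecutor(Executor):
    pass
```

Both classes end up with RegistryMeta as their metaclass, and neither the __metaclass__ attribute nor the metaclass= keyword appears anywhere in the source.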

The most frustrating part was unicode handling. The DMOJ judge was written somewhat sloppily with regard to unicode, dealing mostly with bytestrings and raw bytes. With the separation of bytes and str in Python 3, strings in the judge had to be turned into either bytes or str on a case-by-case basis. It was decided that source code and program output would be treated as raw bytes, and that textual data derived from them would be handled as UTF-8.

Naturally, this leads to the fascinating question of what to do when the input does not decode as valid UTF-8. For the most part, such input is allowed to raise an exception and interrupt grading, so that the problem is discovered loudly and clearly. However, in some cases, such as the error reporting mechanism, we chose to substitute U+FFFD REPLACEMENT CHARACTER for the undecodable bytes.

We introduced utf8text and utf8bytes, which accept either UTF-8 encoded bytes or unicode text and return Python 3 str and bytes, respectively. This allows most functions that could receive either Python 2 str or unicode to simply call one of the two to convert the input into the desired form.
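
A minimal sketch of what such a pair of helpers could look like follows. This is an illustration under assumptions, not the judge's actual code: the errors parameter and the text_type alias are my own additions, included to show both the strict behaviour and the U+FFFD substitution described earlier.

```python
# Assumption: text_type is a local alias, not a name from the judge.
text_type = type(u'')  # unicode on Python 2, str on Python 3


def utf8bytes(maybe_text):
    """Return bytes, encoding text as UTF-8 if necessary."""
    if isinstance(maybe_text, bytes):
        return maybe_text
    return maybe_text.encode('utf-8')


def utf8text(maybe_bytes, errors='strict'):
    """Return text, decoding bytes as UTF-8 if necessary.

    errors='strict' raises UnicodeDecodeError on invalid input;
    errors='replace' substitutes U+FFFD for each undecodable byte.
    """
    if isinstance(maybe_bytes, text_type):
        return maybe_bytes
    return maybe_bytes.decode('utf-8', errors)
```

A caller that must never fail, such as an error reporter, could use utf8text(data, 'replace'), while grading code would keep the strict default and let the exception propagate.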

Python 2’s acceptance of the b prefix for bytestrings and Python 3’s acceptance of u for unicode strings have greatly eased the migration process.

Another issue worth noting is that regular expressions now come in two varieties, one for bytes and one for str. The judge used regular expressions to parse things like the package name from Java source files, and these had to be converted into bytes patterns. Worse, the DMOJ access control system relies heavily on regular expressions to check file access, and textual regular expressions are scattered throughout more than four dozen files. This creates the potential for errors during access checks, which could allow a program to crash the sandbox. A decision was made to simply kill programs that access paths that are not valid UTF-8. This is a stop-gap measure, but it is not expected to cause problems, since most programs only deal in ASCII paths.
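
The stop-gap can be sketched roughly as follows. Everything here is hypothetical: the pattern, the function name, and the convention that a False return means the caller kills the program are assumptions for illustration, not the sandbox's real interface.

```python
import re

# Hypothetical textual whitelist; the real rules live in the judge.
ALLOWED_PATHS = re.compile(r'^(?:/usr/lib/.*|/etc/ld\.so\.cache)$')


def is_access_allowed(raw_path):
    """Check a path received from the sandboxed program as raw bytes."""
    try:
        path = raw_path.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8: deny outright instead of letting the
        # exception escape from the access check.
        return False
    return ALLOWED_PATHS.match(path) is not None
```

Decoding once at the boundary means the existing textual patterns can stay as they are, at the cost of rejecting any path that is not valid UTF-8.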

Since regular expressions need raw strings to stay readable, and some regular expressions must handle bytes, we were forced to use raw bytes literals. One would imagine that either rb or br works for this purpose, but we were shocked to discover that rb is accepted only by Python 3.3 and later. To maintain compatibility with Python 2, br must be used.
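
As an illustration, a bytes pattern for pulling the package name out of Java source might look like the sketch below. The pattern and the function name are made up for this example and are not the judge's actual code.

```python
import re

# br'' is accepted by Python 2 and by every Python 3 release;
# rb'' only works on Python 3.3 and later.
PACKAGE_RE = re.compile(br'^\s*package\s+([\w.]+)\s*;', re.MULTILINE)


def find_package(source):
    """Return the declared package name as bytes, or None if absent."""
    match = PACKAGE_RE.search(source)
    return match.group(1) if match else None
```

For example, find_package(b'package ca.dmoj.test;\nclass A {}') returns b'ca.dmoj.test'. On Python 3, passing a str to this pattern raises TypeError, which is exactly the kind of mismatch that has to be resolved case by case.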

Python 3.0 to 3.4 also prohibited the use of % formatting on bytes, making it very difficult to generate non-trivial bytestrings without resorting to string concatenation or b''.join. Both are undesirable, so some code had to be rewritten.
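
One way around the restriction, shown here as a sketch rather than as what the judge actually does, is to format as text and encode once at the end. The function name and message format are hypothetical.

```python
def status_line(name, exit_code):
    """Build a bytes message without using % formatting on bytes.

    Assumes name is text; the result is encoded as UTF-8.
    """
    # u'' % ... works on Python 2 and on every Python 3 release,
    # whereas b'' % ... is rejected by Python 3.0 to 3.4.
    text = u'%s exited with code %d' % (name, exit_code)
    return text.encode('utf-8')
```

This only works when the interpolated values are text or numbers. Arbitrary bytes that may not be valid UTF-8 cannot take this route, which is part of why some code needed restructuring instead of a mechanical fix.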

Thanks to our rather robust CI systems for the judge, most of the compatibility issues with Python 3 have been caught and corrected. The code now passes all automated tests on Python 3. This greatly reduced the amount of work that had to be done by humans, although it does not guarantee that the code is bug-free. The CI systems have also made sure that no regression occurs on Python 2.

Currently, the judge code is almost completely compatible with Python 3. The only thing remaining is to deploy a Python 3 judge to handle real-world tasks, but one obstacle remains: DMOJ allows custom graders and checkers to be supplied with a problem, and all of them must be made compatible with Python 3. This cannot be done with the CI system, and will require many days of work.