Cross-Python String Support

Lexington is designed to support Python 2.6, 2.7, 3.2, 3.3, and any future versions of Python 3, using the same codebase. (It’s still too early to make any judgements about Python 4.)

Because Lexington deals heavily with text and bytes, its strategy is to use:

from __future__ import unicode_literals

This can cause problems on Python 2 in places where the interpreter internals expect binary data: they implicitly encode Unicode as ASCII, or fail outright on it. However, it ensures that our strings have consistent semantics across Python versions.

lexington.strings.PYTHON_3000

This constant is True if we are running on Python 3, False if we are running on Python 2.

Analyzing String Types

lexington.strings.Text

The type used for Unicode text data on the current platform.

lexington.strings.Codepoint

The type used for individual Unicode code points on the current platform (that is, what you get when you iterate over Text).

lexington.strings.Bytestring

The type used for binary data on the current platform.

lexington.strings.Byte

The type used for individual bytes on the current platform (that is, what you get when you iterate over Bytestring).

lexington.strings.Strings

A tuple of the types that indicate strings.

lexington.strings.Characters

A tuple of the types that indicate characters (that is, anything you can get iterating over a class in Strings).
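As a rough illustration of how constants like these can be derived, here is a minimal sketch. It is an assumption about the general technique, not Lexington's actual source; the names are local stand-ins that mirror the documented ones.

```python
import sys

# Local stand-in for lexington.strings.PYTHON_3000 (assumed definition).
PYTHON_3000 = sys.version_info[0] >= 3

if PYTHON_3000:
    Text = str          # Unicode text
    Codepoint = str     # iterating over str yields 1-character str objects
    Bytestring = bytes  # binary data
    Byte = int          # iterating over bytes yields ints
else:
    Text = unicode      # noqa: F821 (this name only exists on Python 2)
    Codepoint = unicode  # noqa: F821
    Bytestring = str
    Byte = str          # iterating over a Python 2 str yields 1-byte strs

# Assumed composition of the two tuples described above.
Strings = (Text, Bytestring)
Characters = (Codepoint, Byte)
```

With definitions along these lines, isinstance(value, Strings) answers "is this any kind of string?" on either Python version without the caller branching on the version.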

lexington.strings.string_type(string)

Returns the Python string type that has string as a symbol or substring. (This can’t just use type, because on Python 3 the “symbols” of a bytestring are int.)

Parameters: string – A string or symbol.
Raises TypeError: If a non-string, non-symbol character is provided.
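The behavior described above might be approximated as follows. This is a hypothetical sketch, not the library's implementation: string_type_sketch and the three type aliases are illustrative names defined locally.

```python
import sys

# Hypothetical stand-ins for the lexington.strings type constants.
if sys.version_info[0] >= 3:
    Text, Bytestring, Byte = str, bytes, int
else:
    Text, Bytestring, Byte = unicode, str, str  # noqa: F821 (Python 2 only)


def string_type_sketch(string):
    """Return the string type that `string` belongs to or is a symbol of."""
    if isinstance(string, Text):
        # Covers both whole text strings and single code points.
        return Text
    if isinstance(string, (Bytestring, Byte)):
        # On Python 3 a single byte is an int, so type(string) alone
        # would report int rather than bytes.
        return Bytestring
    raise TypeError("expected a string or symbol, got %r" % (type(string),))
```

Note that this simplified version treats any int as a byte on Python 3, including bool values and ints outside the range 0-255; a stricter check may be preferable in real code.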

String Helpers

Warning

These are intended primarily for internal use, and as such are subject to change more frequently than the rest of Lexington’s API. Also, these are designed for code that uses from __future__ import unicode_literals.

lexington.strings.n(string)

Ensures that a Unicode string is in the current Python’s “native format” – the format the interpreter expects strings to be in. On Python 3, this is Unicode. On Python 2, this is a UTF-8 encoded bytestring.

(This function is only intended for code paths where Python 2’s default behavior of encoding as ASCII could potentially cause problems. If the string is guaranteed to be ASCII, and isn’t passing through a code path that will only work with str, you don’t need this.)

Parameters: string – A Unicode string that needs to pass through the interpreter internals.

lexington.strings.native_strings(fn)

A decorator for functions that need to return a “native string” (see n). Note that on Python 2, it will only encode the result when the function’s return value is actually a string.
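To make the two helpers concrete, here is a minimal sketch of how functions with this behavior could be written. It is an assumption based on the descriptions above, not Lexington's code; PY3, n, native_strings, and greeting are all defined locally for illustration. The u"" prefix stands in for from __future__ import unicode_literals so the block runs unchanged on Python 2 and on Python 3.3+.

```python
import functools
import sys

PY3 = sys.version_info[0] >= 3  # hypothetical stand-in for PYTHON_3000


def n(string):
    """Return a Unicode string in the interpreter's native str type."""
    if PY3:
        return string  # native strings are already Unicode
    return string.encode("utf-8")  # Python 2: native str is a bytestring


def native_strings(fn):
    """Decorator: convert a Unicode return value to a native string."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        # Only Unicode results are encoded, and only on Python 2;
        # None, numbers, and other values pass through untouched.
        if not PY3 and isinstance(result, unicode):  # noqa: F821
            return result.encode("utf-8")
        return result
    return wrapper


@native_strings
def greeting(name):
    return u"Hello, " + name
```

On either Python version, isinstance(greeting(u"Zo\u00eb"), str) holds, which is the property needed by interpreter internals that only accept str.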
