Lexington is designed to support Python 2.6, 2.7, 3.2, 3.3, and any future versions of Python 3, using the same codebase. (It’s still too early to make any judgements about Python 4.)
Because Lexington deals heavily with text and bytes, its strategy is to use:
from __future__ import unicode_literals
This can cause problems on Python 2 in situations where the interpreter internals expect binary data: they either implicitly encode the Unicode string as ASCII or just explode on it. However, it ensures that our strings have consistent semantics across Python versions.
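
As a rough illustration of what the import changes (this is a sketch, not code taken from Lexington; the variable names are made up for the example), an unprefixed literal becomes text on both major versions, while a `b` prefix still yields binary data:

```python
from __future__ import unicode_literals

import sys

# With unicode_literals, an unprefixed literal is Unicode text on
# Python 2 as well as Python 3.
literal = "caf\u00e9"
# A b-prefixed literal is still binary data on both versions.
binary = b"raw bytes"

# The text type is named `unicode` on Python 2 and `str` on Python 3.
# The `unicode` name is only evaluated when running on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

assert isinstance(literal, text_type)
assert not isinstance(binary, text_type)
assert len(literal) == 4  # four code points, not five UTF-8 bytes
```

Without the import, the same unprefixed literal would be a bytestring on Python 2, and `\u00e9` would not be interpreted as an escape there.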
This constant is True if we are running on Python 3, False if we are running on Python 2.
The type used for Unicode text data on the current platform.
The type used for individual Unicode code points on the current platform (that is, what you get when you iterate over Text).
The type used for binary data on the current platform.
The type used for individual bytes on the current platform (that is, what you get when you iterate over Bytestring).
A tuple of the types that indicate strings.
A tuple of the types that indicate characters (that is, anything you can get iterating over a class in Strings).
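
The constants described above could plausibly be defined along the following lines. This is a minimal sketch under assumptions, not Lexington's actual source: `Text`, `Bytestring`, and `Strings` are the names used in the descriptions above, while `PY3`, `TextChar`, `ByteChar`, and `Chars` are hypothetical names chosen for the example.

```python
import sys

# Hypothetical name for the "running on Python 3" constant.
PY3 = sys.version_info[0] >= 3

if PY3:
    Text = str
    TextChar = str      # iterating over str yields length-1 str objects
    Bytestring = bytes
    ByteChar = int      # iterating over bytes yields int objects
else:
    Text = unicode      # noqa: F821 (only evaluated on Python 2)
    TextChar = unicode  # noqa: F821
    Bytestring = str
    ByteChar = str      # iterating over a Python 2 str yields length-1 str

# Tuples suitable for isinstance() checks.
Strings = (Text, Bytestring)
Chars = (TextChar, ByteChar)  # hypothetical names for the character types
```

Because these are plain tuples of types, a check such as `isinstance(value, Strings)` works the same way on either major version.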
Returns the Python type of a string that could contain string as a symbol or substring. (This can’t just use type, because it has to take into account that on Python 3, the “symbols” for a bytestring are int.)
| Parameters: | string – A string or symbol. |
|---|---|
| Raises TypeError: | If a non-string, non-symbol character is provided. |
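
The reason plain `type` is not enough can be seen in a small sketch of such a lookup. The function name `string_type_of` and the helper constants here are hypothetical, chosen only for illustration; Lexington's real implementation may differ.

```python
import sys

PY3 = sys.version_info[0] >= 3
Text = str if PY3 else unicode      # noqa: F821 (only evaluated on Python 2)
Bytestring = bytes if PY3 else str


def string_type_of(string):
    """Return the string type that `string` belongs to (hypothetical helper)."""
    if isinstance(string, Text):
        return Text
    if isinstance(string, Bytestring):
        return Bytestring
    # On Python 3, iterating over bytes yields int, so an int symbol
    # has to be reported as belonging to a bytestring.
    if PY3 and isinstance(string, int):
        return Bytestring
    raise TypeError("expected a string or symbol, got %r" % type(string))
```

On Python 3, `type(b"abc"[0])` is `int`, whereas this sketch reports `bytes` for the same value, which is the behavior the description above asks for.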
Warning
These are intended primarily for internal use, and as such are subject to change more frequently than the rest of Lexington’s API. Also, these are designed for code that uses from __future__ import unicode_literals.
Ensures that a Unicode string is in the current Python’s “native format” – the format the interpreter expects strings to be in. On Python 3, this is Unicode. On Python 2, this is a UTF-8 encoded bytestring.
(This function is only intended for code paths where Python 2’s default behavior of encoding as ASCII could potentially cause problems. If the string is guaranteed to be ASCII, and isn’t passing through a code path that will only work with str, you don’t need this.)
| Parameters: | string – A Unicode string that needs to pass through the interpreter internals. |
|---|---|
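
A conversion with the behavior described above might look roughly like this. It is a sketch under assumptions: the name `to_native` is hypothetical, and the real function may handle more cases.

```python
from __future__ import unicode_literals

import sys

PY3 = sys.version_info[0] >= 3


def to_native(string):
    """Return `string` in the interpreter's native str format (hypothetical helper)."""
    if PY3:
        # Native strings are already Unicode on Python 3.
        return string
    # On Python 2 the native str is a bytestring, so encode explicitly
    # as UTF-8 rather than relying on the implicit ASCII encoding.
    return string.encode("utf-8")
```

In both cases the result is an instance of the built-in `str`, which is what code paths that only work with `str` expect.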