.. -*-rst-*-
*********************************************
Developer notes on the transition to Python 3
*********************************************
:date: 2010-07-11
:author: Charles R. Harris
:author: Pauli Virtanen
General
=======
Some glitches may still be present; however, we are not aware of any
significant ones, and the test suite passes.
Resources
---------
- https://wiki.python.org/moin/cporting
- https://wiki.python.org/moin/PortingExtensionModulesToPy3k
Prerequisites
-------------
As of November 2009, the Nose test framework has no released Python 3
compatible version. Its 3K SVN branch, however, works quite well:

- http://python-nose.googlecode.com/svn/branches/py3k
As a side effect, the Py3 adaptation has caused the following semantic
changes that are visible on Py2:

* Objects (except bytes and str) that implement the PEP 3118 buffer interface
  will behave as ndarrays in `array(...)` and `asarray(...)`, in the same way
  as if they had ``__array_interface__`` defined.

* :pep:`3118` buffer objects will behave differently from Py2 buffer objects
  when used as an argument to `array(...)` or `asarray(...)`.
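
The metadata a PEP 3118 exporter publishes (format string, item size,
shape) is what lets a consumer such as `array(...)` treat it like an
ndarray. A minimal stdlib sketch that inspects this metadata through
``memoryview``; ``array.array`` merely stands in for an arbitrary
third-party exporter. It also shows why ``bytes`` needs special-casing:
it exports a buffer too.

.. code-block:: python

    import array

    # array.array implements the PEP 3118 buffer protocol, so memoryview
    # can read the same metadata a consumer such as NumPy would see.
    values = array.array("d", [1.0, 2.0, 3.0])
    view = memoryview(values)

    print(view.format)    # 'd': struct-style format of one item
    print(view.itemsize)  # 8: bytes per item
    print(view.shape)     # (3,): one dimension with three items
    print(view.ndim)      # 1

    # bytes is also a buffer exporter: without special-casing it would be
    # consumed as a 1-D array of unsigned bytes rather than as a string.
    raw = memoryview(b"abc")
    print(raw.format, raw.shape)  # B (3,)
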
Python code
===========
2to3 in setup.py
----------------
2to3 is run at build time; the converted sources end up in ``build/py3k``.
Not all of the 2to3 transformations are appropriate for all files.
In particular, 2to3 is quite trigger-happy about replacing, e.g.,
``unicode`` with ``str``, which causes problems in ``defchararray.py``.
For files that need special handling, add entries to
``tools/py3tool.py``.
numpy.compat.py3k
-----------------
At many points in NumPy, bytes literals are needed. These can be created via
``numpy.compat.asbytes`` and ``asbytes_nested``.
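
The helpers are small. The sketch below reimplements their intended
behaviour in plain Python, assuming Latin-1 as the text encoding; the
names ``asbytes_sketch`` and ``asbytes_nested_sketch`` are hypothetical
and are not the actual ``numpy.compat`` functions.

.. code-block:: python

    def asbytes_sketch(s):
        """Return s as bytes, encoding text as Latin-1 (assumed encoding)."""
        if isinstance(s, bytes):
            return s
        return str(s).encode("latin1")


    def asbytes_nested_sketch(x):
        """Apply asbytes_sketch recursively to nested lists and tuples."""
        # Strings are iterable too, so they must be excluded explicitly.
        if hasattr(x, "__iter__") and not isinstance(x, (bytes, str)):
            return [asbytes_nested_sketch(item) for item in x]
        return asbytes_sketch(x)


    print(asbytes_sketch("abc"))                        # b'abc'
    print(asbytes_nested_sketch([["a", "b"], ("c",)]))  # [[b'a', b'b'], [b'c']]

Latin-1 is a convenient choice because every code point below 256 maps
to exactly one byte, so ASCII-only literals come out unchanged.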
Exception syntax
----------------
Relative imports
----------------
Print
-----
types module
------------
numpy.core.numerictypes
-----------------------
=========== ==============================================
Scalar type Description
=========== ==============================================
str_        The basic Unicode string type on Py3
bytes_      The basic byte-string type on Py3
string_     Alias of ``bytes_``
unicode_    Alias of ``str_``
=========== ==============================================
numpy.loadtxt et al
-------------------
I assumed these are meant for reading byte streams; this is probably
the far more common use case with scientific data.
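
Treating the input as bytes still leaves open where decoding happens.
A minimal sketch, assuming whitespace-separated floats and Latin-1
decoding; ``read_float_rows`` is a hypothetical helper, far simpler than
`loadtxt`, that accepts both binary-mode and text-mode streams.

.. code-block:: python

    import io


    def read_float_rows(stream, encoding="latin1"):
        """Parse whitespace-separated floats from a byte (or text) stream."""
        rows = []
        for line in stream:
            # Binary-mode streams yield bytes; decode them before parsing.
            if isinstance(line, bytes):
                line = line.decode(encoding)
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(field) for field in fields])
        return rows


    print(read_float_rows(io.BytesIO(b"1 2\n3 4\n")))  # [[1.0, 2.0], [3.0, 4.0]]
    print(read_float_rows(io.StringIO("5 6\n")))        # [[5.0, 6.0]]
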
Cyclic imports
--------------
C Code
======
NPY_PY3K
--------
.. todo::
private/npy_3kcompat.h
----------------------
Any new compatibility functions or macros that are needed should be added
to this file.
.. todo::
ob_type, ob_size
----------------
These fields are now accessed through the ``Py_TYPE``, ``Py_SIZE``, etc.
macros. The macros are also defined in ``npy_3kcompat.h`` for Python
versions that do not provide them natively.
Py_TPFLAGS_CHECKTYPES
---------------------
PyNumberMethods
---------------
- number.c
- scalartypes.c.src
- scalarmathmodule.c.src
PyBuffer (provider)
-------------------
Py3 introduces the PEP 3118 buffer protocol as the *only* protocol,
so we must implement it.
The exporter parts of the PEP 3118 buffer protocol are currently
implemented in ``buffer.c`` for arrays, and in ``scalartypes.c.src``
for generic array scalars. The generic array scalar exporter, however,
doesn't currently produce format strings, which needs to be fixed.
- VOID_getitem
.. todo::
.. todo::
PyBuffer (consumer)
-------------------
There are two places in which we may want to be able to consume buffer
objects and cast them to ndarrays:
We do, however, want to allow other objects that provide 1-D byte arrays
to be cast to 1-D ndarrays rather than to ``'S#'`` arrays since, for
instance, ``'S#'`` arrays tend to strip trailing NUL characters.
Calling::

    array([some_3118_object])

will treat the object similarly to how it would handle an `ndarray`.
However, again, bytes (and unicode) have priority and will not be
handled as buffer objects.
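
The precedence rule can be sketched in Python. ``classify_for_array`` is
a hypothetical helper that only reports which path an input would take;
the real logic is in NumPy's C code and handles many more cases. Note
that a ``bytearray`` keeps its trailing NUL byte as an element, unlike
an ``'S#'`` array.

.. code-block:: python

    import array


    def classify_for_array(obj):
        """Report how an array constructor would treat obj (sketch only)."""
        # Strings are checked first, so bytes and str never take the
        # buffer path even though bytes exports a buffer.
        if isinstance(obj, (bytes, str)):
            return "string scalar"
        try:
            view = memoryview(obj)
        except TypeError:
            return "generic object or sequence"
        with view:  # release the buffer when done
            return "buffer: format={!r}, shape={}".format(view.format, view.shape)


    print(classify_for_array(b"abc"))                     # string scalar
    print(classify_for_array(array.array("i", [1, 2])))   # buffer: format='i', shape=(2,)
    print(classify_for_array(bytearray(b"ab\x00")))       # buffer: format='B', shape=(3,)
    print(classify_for_array([1, 2, 3]))                  # generic object or sequence
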
.. todo::
.. todo::
PyBuffer (object)
-----------------
PyString
--------
This entry discusses return values etc. only; the 'S' dtype is a
separate topic.

In some places in the NumPy code there are guards for Unicode field
names. However, the dtype constructor accepts only strings as field names,
so we should assume field names are *always* UString (Unicode on Py3).
.. todo::
.. todo::
.. todo::
.. todo::
.. todo::
PyUnicode
---------
In Py3, Unicode and Bytes are not comparable, i.e., ``'a' != b'a'``. NumPy
comparison routines were adapted to behave the same way, leaving
comparison between Unicode and Bytes undefined.
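
The plain-Python behaviour that NumPy mirrors can be checked directly
with built-in types:

.. code-block:: python

    # On Python 3, text and bytes never compare equal, even when they
    # contain the same ASCII characters.
    print("a" == b"a")  # False
    print("a" != b"a")  # True

    # Ordering comparisons between them raise instead of guessing.
    try:
        "a" < b"a"
    except TypeError as exc:
        print("TypeError:", exc)

    # An explicit decode (or encode) is needed to compare contents.
    print("a" == b"a".decode("ascii"))  # True

Running Python with ``-b`` additionally turns the ``==`` comparison into a
``BytesWarning``, which helps to find such comparisons during porting.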
.. todo::
However,::
PyInt
-----
.. todo::
Not inheriting from `int` on Python 3 means that the following does not
work: ``np.intp("0xff", 16)``, because the NumPy type does not accept
the second argument. This could perhaps be fixed.
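
The built-in ``int`` constructor accepts a base argument, and any
subclass of ``int`` inherits it; an integer-like type that does not
derive from ``int`` has to reimplement it. A stdlib-only sketch in which
both classes are hypothetical stand-ins, not NumPy types:

.. code-block:: python

    class IntSubclass(int):
        """Inherits int.__new__, including the optional base argument."""


    class NotAnInt:
        """Integer-like type that does not derive from int."""

        def __init__(self, value=0):
            self.value = int(value)


    print(int("0xff", 16))          # 255
    print(IntSubclass("0xff", 16))  # 255, base handling comes from int

    try:
        NotAnInt("0xff", 16)
    except TypeError as exc:
        # __init__ takes only one value, so the base is rejected.
        print("TypeError:", exc)
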
Divide
------
tp_compare, PyObject_Compare
----------------------------
* flagsobject.c
Pickling
--------
To read pickles created by Py2, the encoding must be given explicitly::

    loads(f, encoding='latin1')
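
Why Latin-1: Py2 ``str`` objects in a pickle (such as raw array data)
are unpickled as Unicode text on Py3, and Latin-1 maps every byte to
exactly one code point, so the original bytes can be recovered. A
minimal stdlib sketch; the pickle below is written out by hand to
resemble a Py2 protocol-2 pickle of a byte string.

.. code-block:: python

    import pickle

    # PROTO 2, SHORT_BINSTRING of length 2 ('\xff\xfe'), BINPUT 0, STOP.
    py2_blob = b"\x80\x02U\x02\xff\xfeq\x00."

    # The default encoding (ASCII) cannot decode bytes above 0x7f.
    try:
        pickle.loads(py2_blob)
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc.reason)

    # Latin-1 decodes every byte, and encoding back restores the data.
    text = pickle.loads(py2_blob, encoding="latin1")
    print(text.encode("latin1"))  # b'\xff\xfe'

    # encoding="bytes" keeps Py2 str objects as bytes instead.
    print(pickle.loads(py2_blob, encoding="bytes"))  # b'\xff\xfe'
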
.. todo::
.. todo::
Module initialization
---------------------
PyTypeObject
------------
The PyTypeObject of py3k is binary compatible with the py2k version, and the
old initializers should work. However, there are several considerations to
keep in mind:

1) Because the first three slots are now part of a struct, some compilers
   issue warnings if they are initialized in the old way; initializing the
   header with ``PyVarObject_HEAD_INIT(NULL, 0)`` avoids this.

2) The compare slot has been made reserved in order to preserve binary
   compatibility, since the tp_compare function went away. The
   tp_richcompare function has replaced it, and we need to use that slot
   instead. This will likely require modifications in the searchsorted
   functions and generic sorts that currently use the compare function.
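
The same migration happened at the Python level: ``__cmp__`` and the
``cmp=`` argument of ``sorted`` no longer exist in Py3, so a three-way
compare function has to be adapted to rich comparisons. A minimal sketch
of that adaptation with the stdlib ``functools.cmp_to_key``, with
``bisect`` playing the role of a searchsorted-style lookup;
``legacy_compare`` is a hypothetical stand-in for an old
tp_compare-style function.

.. code-block:: python

    import bisect
    import functools


    def legacy_compare(a, b):
        """Three-way compare: returns -1, 0 or 1 (tp_compare style)."""
        return (a > b) - (a < b)


    # Wrapper class whose rich comparisons delegate to legacy_compare.
    Key = functools.cmp_to_key(legacy_compare)

    data = sorted([5, 1, 4, 2], key=Key)
    print(data)  # [1, 2, 4, 5]

    # searchsorted-like lookup: index at which 3 would be inserted.
    keys = [Key(x) for x in data]
    print(bisect.bisect_left(keys, Key(3)))  # 2
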
PySequenceMethods
-----------------
* multiarray/descriptor.c
* multiarray/scalartypes.c.src
* multiarray/arrayobject.c
PySequenceMethods in py3k are binary compatible with py2k, but some of the
slots have gone away. I suspect this means some functions need redefining, so
the semantics of the slots need to be checked::

    #include <Python.h>

    /* "foo" is a placeholder name; 0 leaves a slot unimplemented. */
    PySequenceMethods foo_sequence_methods = {
        (lenfunc)0,          /* sq_length */
        (binaryfunc)0,       /* sq_concat */
        (ssizeargfunc)0,     /* sq_repeat */
        (ssizeargfunc)0,     /* sq_item */
        (void *)0,           /* formerly sq_slice */
        (ssizeobjargproc)0,  /* sq_ass_item */
        (void *)0,           /* formerly sq_ass_slice */
        (objobjproc)0,       /* sq_contains */
        (binaryfunc)0,       /* sq_inplace_concat */
        (ssizeargfunc)0      /* sq_inplace_repeat */
    };
PyMappingMethods
----------------
* multiarray/descriptor.c
* multiarray/iterators.c
* multiarray/scalartypes.c.src
* multiarray/flagsobject.c
* multiarray/arrayobject.c
PyMappingMethods in py3k look to be the same as in py2k. The semantics
of the slots need to be checked::

    #include <Python.h>

    /* "foo" is a placeholder name; 0 leaves a slot unimplemented. */
    PyMappingMethods foo_mapping_methods = {
        (lenfunc)0,          /* mp_length */
        (binaryfunc)0,       /* mp_subscript */
        (objobjargproc)0     /* mp_ass_subscript */
    };
PyFile
------
The following PyFile items no longer exist in Py3:

1) PyFile_Type
2) PyFile_AsFile
3) PyFile_FromString

.. todo::

   Adapt all NumPy I/O to use the PyFile_* methods or the low-level
   IO routines. In any case, it's unlikely that C stdio can be used anymore.
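
Because Py3 file objects no longer wrap a C ``FILE*``, one workable
approach is to flush the Python-level buffer and work with a duplicate
of the underlying file descriptor. A stdlib sketch of that idea; the
helper name is hypothetical, and a C implementation would additionally
hand the descriptor to ``fdopen``, which is not shown here.

.. code-block:: python

    import os
    import tempfile


    def duplicate_descriptor(fileobj):
        """Return a low-level duplicate of fileobj's descriptor.

        The Python-level buffer is flushed first so that data written
        through fileobj is visible at the OS level, and the duplicate is
        positioned at fileobj's current offset.
        """
        fileobj.flush()
        fd = os.dup(fileobj.fileno())
        os.lseek(fd, fileobj.tell(), os.SEEK_SET)
        return fd


    with tempfile.TemporaryFile() as f:
        f.write(b"header\npayload")
        f.seek(len(b"header\n"))
        fd = duplicate_descriptor(f)
        try:
            print(os.read(fd, 100))  # b'payload'
        finally:
            os.close(fd)

On POSIX systems a duplicated descriptor shares its file offset with the
original, so the Python file object should be repositioned (for example
with ``seek``) after low-level reads or writes.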
READONLY
--------
PyOS
----
Deprecations:
PyInstance
----------
There are some checks for PyInstance in ``common.c`` and ``ctors.c``.
.. todo::
PyCObject / PyCapsule
---------------------
NumPy was changed to use the Capsule API, using NpyCapsule* wrappers.
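
A capsule is an opaque Python object carrying a C pointer together with
a name that is checked on retrieval; NumPy's ``__array_struct__``, for
example, returns one. From Python, the C API can be reached through
``ctypes.pythonapi`` (CPython only), and the sketch below wraps a pointer
and reads it back. The capsule name is an arbitrary illustrative string;
the ``NpyCapsule*`` wrappers are presumably thin layers over these same
calls.

.. code-block:: python

    import ctypes

    api = ctypes.pythonapi

    # PyObject *PyCapsule_New(void *pointer, const char *name,
    #                         PyCapsule_Destructor destructor)
    api.PyCapsule_New.restype = ctypes.py_object
    api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

    # void *PyCapsule_GetPointer(PyObject *capsule, const char *name)
    api.PyCapsule_GetPointer.restype = ctypes.c_void_p
    api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

    # The name must outlive the capsule, so keep it in a module-level object.
    CAPSULE_NAME = b"example.payload"

    payload = ctypes.c_int(42)  # C data the capsule will point to
    address = ctypes.addressof(payload)

    capsule = api.PyCapsule_New(address, CAPSULE_NAME, None)  # no destructor
    print(type(capsule).__name__)  # PyCapsule

    # Retrieval succeeds only when the same name is supplied.
    recovered = api.PyCapsule_GetPointer(capsule, CAPSULE_NAME)
    print(recovered == address)                        # True
    print(ctypes.c_int.from_address(recovered).value)  # 42
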