
Why Is Updating A List Faster When Using A List Comprehension As Opposed To A Generator Expression?

According to this answer lists perform better than generators in a number of cases, for example when used together with str.join (since the algorithm needs to pass over the data tw

Solution 1:

This answer concerns the CPython implementation only. Using a list comprehension is faster because the generator is first converted into a list anyway. This is done because the length of the sequence must be determined before any data is replaced, and a generator can't tell you its length.
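The claim above can be checked with a small timing sketch. The function names, list size, and repeat counts below are illustrative assumptions, and the absolute numbers will vary by machine and Python version; only the relative ordering matters.

```python
import timeit


def assign_from_list(n):
    """Slice-assign from a list comprehension (the right-hand side is already a list)."""
    target = [0] * n
    target[:] = [x for x in range(n)]
    return target


def assign_from_generator(n):
    """Slice-assign from a generator expression (CPython must build a list first)."""
    target = [0] * n
    target[:] = (x for x in range(n))
    return target


def compare(n=1000, number=200):
    """Return the best-of-3 total time in seconds for each variant."""
    return {
        "list comprehension": min(
            timeit.repeat(lambda: assign_from_list(n), number=number, repeat=3)
        ),
        "generator expression": min(
            timeit.repeat(lambda: assign_from_generator(n), number=number, repeat=3)
        ),
    }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label}: {seconds:.4f} s")
```

Both functions produce the same list, so any difference in timing comes from how the right-hand side is materialised, not from the result.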

For list slice assignment, the operation is handled by the amusingly named list_ass_slice. There is special-case handling for assigning a list or tuple: they can use the PySequence_Fast ops.

This is the v3.7.4 implementation of PySequence_Fast, where you can clearly see a type check for lists or tuples:

PyObject *
PySequence_Fast(PyObject *v, const char *m){
    PyObject *it;

    if (v == NULL) {
        return null_error();
    }

    if (PyList_CheckExact(v) || PyTuple_CheckExact(v)) {
        Py_INCREF(v);
        return v;
    }

    it = PyObject_GetIter(v);
    if (it == NULL) {
        if (PyErr_ExceptionMatches(PyExc_TypeError))
            PyErr_SetString(PyExc_TypeError, m);
        return NULL;
    }

    v = PySequence_List(it);
    Py_DECREF(it);

    return v;
}

A generator expression will fail this type check and continue to the fallback code, where it is converted into a list object, so that the length can be predetermined.
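The same control flow can be sketched in Python to make the two paths visible. This is a rough illustration, not CPython's code: `sequence_fast` is a hypothetical name, reference counting has no Python equivalent, and the NULL check is omitted.

```python
def sequence_fast(v, message):
    """Rough Python analogue of the PySequence_Fast logic shown above."""
    # Fast path: an exact list or tuple is returned as-is, with no copy.
    # (Subclasses fail this check, mirroring the *_CheckExact macros.)
    if type(v) is list or type(v) is tuple:
        return v

    # Fallback: get an iterator, replacing the TypeError message on failure.
    try:
        it = iter(v)
    except TypeError:
        raise TypeError(message) from None

    # Materialise the iterator into a new list so its length is known.
    return list(it)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(sequence_fast(data, "expected an iterable") is data)

    gen = (c for c in "abc")
    print(sequence_fast(gen, "expected an iterable"))
    print(list(gen))  # the generator has been consumed by the fallback path
```

A list goes through untouched, while a generator pays for a full pass and a new list allocation before the assignment itself starts.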

In the general case, a predetermined length is desirable in order to allow efficient allocation of list storage, and also to provide useful error messages with extended slice assignment:

>>> vals = (x for x in 'abc')
>>> L = [1,2,3]
>>> L[::2] = vals  # attempt assigning 3 values into 2 positions
Traceback (most recent call last):
  ...
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
>>> L  # data unchanged
[1, 2, 3]
>>> list(vals)  # generator was fully consumed
[]
