Pandas Rolling Window To Return An Array

June 29, 2023

Here is some sample code:

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
df['C'] = df.B.rolling(window=3)

Output: A B

Solution 1:

Since pandas 1.1, rolling objects are iterable, so you can just use the list constructor:

df['C'] = list(df.B.rolling(window=3))

Or, if you want your windows as lists instead of Series:

df['C'] = [window.to_list() for window in df.B.rolling(window=3)]

This is short and you are able to use all the handy parameters of the rolling function.
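One detail worth seeing in a small example: iterating a rolling object also yields the incomplete leading windows, so the first `window - 1` items are shorter than the window. The sketch below uses made-up data (`s`, `win`) and, as one possible choice rather than part of the original answer, replaces incomplete windows with None.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])  # illustrative data, not from the question
win = 3

# Each iteration yields one window as a Series; convert each to a plain list.
windows = [w.to_list() for w in s.rolling(window=win)]

# The leading windows are shorter than `win`; keep only complete ones
# and mark the incomplete ones with None (an assumption for this sketch).
full_only = [w if len(w) == win else None for w in windows]

print(windows)
print(full_only)
```

Whether to keep the short leading windows or mask them depends on what the downstream code expects.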

Solution 2:

You could use np.lib.stride_tricks:

import numpy as np
as_strided = np.lib.stride_tricks.as_strided  

df

          A         B
0 -0.272824 -1.606357
1 -0.350643  0.000510
2  0.247222  1.627117
3 -1.601180  0.550903
4  0.803039 -1.231291
5 -0.536713 -0.313384
6 -0.840931 -0.675352
7 -0.930186 -0.189356
8  0.151349  0.522533
9 -0.046146  0.507406

win = 3  # window size
# https://stackoverflow.com/a/47483615/4909087
# shape: one row per full window; strides: step one element along both axes
v = as_strided(df.B.values, (len(df) - (win - 1), win), (df.B.values.strides * 2))

v
array([[ -1.60635669e+00,   5.10129842e-04,   1.62711678e+00],
       [  5.10129842e-04,   1.62711678e+00,   5.50902812e-01],
       [  1.62711678e+00,   5.50902812e-01,  -1.23129111e+00],
       [  5.50902812e-01,  -1.23129111e+00,  -3.13383794e-01],
       [ -1.23129111e+00,  -3.13383794e-01,  -6.75352179e-01],
       [ -3.13383794e-01,  -6.75352179e-01,  -1.89356194e-01],
       [ -6.75352179e-01,  -1.89356194e-01,   5.22532550e-01],
       [ -1.89356194e-01,   5.22532550e-01,   5.07405549e-01]])

df['C'] = pd.Series(v.tolist(), index=df.index[win - 1:])
df

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117  [-1.606356691642917, 0.0005101298424200881, 1....
3 -1.601180  0.550903  [0.0005101298424200881, 1.6271167809032248, 0....
4  0.803039 -1.231291  [1.6271167809032248, 0.5509028122535129, -1.23...
5 -0.536713 -0.313384  [0.5509028122535129, -1.2312911105674484, -0.3...
6 -0.840931 -0.675352  [-1.2312911105674484, -0.3133837943758246, -0....
7 -0.930186 -0.189356  [-0.3133837943758246, -0.6753521794378446, -0....
8  0.151349  0.522533  [-0.6753521794378446, -0.18935619377656243, 0....
9 -0.046146  0.507406  [-0.18935619377656243, 0.52253255045267, 0.507...
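Note that as_strided does no bounds checking, so a wrong shape or stride silently reads memory outside the array. As a hedged sketch, a small wrapper can derive both from the input and return a read-only view. The helper name `rolling_windows` is hypothetical and not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_windows(a, win):
    """Return a read-only 2-D view of all full windows of length `win` over 1-D `a`."""
    a = np.ascontiguousarray(a)
    if a.ndim != 1:
        raise ValueError("expected a 1-D array")
    if not 1 <= win <= a.shape[0]:
        raise ValueError("win must be between 1 and len(a)")
    n_windows = a.shape[0] - win + 1
    step = a.strides[0]  # bytes between consecutive elements
    # Moving one row down and one column right both advance by one element.
    return as_strided(a, shape=(n_windows, win), strides=(step, step), writeable=False)

v = rolling_windows(np.arange(6.0), 3)
print(v)
```

With the DataFrame above, `rolling_windows(df.B.values, win)` should give the same array as `v` in the answer.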

Solution 3:

Let's use this pandas approach with a rolling apply trick:

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
list_of_values = []
df.B.rolling(3).apply(lambda x: list_of_values.append(x.values) or 0, raw=False)
df.loc[2:,'C'] = pd.Series(list_of_values).values
df

Output:

          A         B                                                                  C
0  1.610085  0.354823                                                                NaN
1 -0.241446 -0.304952                                                                NaN
2  0.524812 -0.240972  [0.35482336179318674, -0.30495156795594963, -0.24097191924555197]
3  0.767354  0.281625   [-0.30495156795594963, -0.24097191924555197, 0.2816249674055174]
4 -0.349844 -0.533781    [-0.24097191924555197, 0.2816249674055174, -0.5337811449574766]
5 -0.174189  0.133795     [0.2816249674055174, -0.5337811449574766, 0.13379518286397707]
6  2.799437 -0.978349    [-0.5337811449574766, 0.13379518286397707, -0.9783488211443795]
7  0.250129  0.289782     [0.13379518286397707, -0.9783488211443795, 0.2897823417165459]
8 -0.385259 -0.286399    [-0.9783488211443795, 0.2897823417165459, -0.28639931887491943]
9 -0.755363 -1.010891    [0.2897823417165459, -0.28639931887491943, -1.0108913605575793]

Solution 4:

Perhaps zipping shifted slices of the column would also help in your case, i.e.:

def get_list(x, m):
    return list(zip(*(x[i:] for i in range(m))))

# get_list(df['B'],3) would return 

[(-1.606357, 0.0005099999999999999, 1.627117),
 (0.0005099999999999999, 1.627117, 0.5509029999999999),
 (1.627117, 0.5509029999999999, -1.231291),
 (0.5509029999999999, -1.231291, -0.313384),
 (-1.231291, -0.313384, -0.6753520000000001),
 (-0.313384, -0.6753520000000001, -0.189356),
 (-0.6753520000000001, -0.189356, 0.522533),
 (-0.189356, 0.522533, 0.507406)]

df['C'] = pd.Series(get_list(df['B'],3), index=df.index[3 - 1:])
# Little help from @coldspeed
print(df)

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117       (-1.606357, 0.0005099999999999999, 1.627117)
3 -1.601180  0.550903  (0.0005099999999999999, 1.627117, 0.5509029999...
4  0.803039 -1.231291          (1.627117, 0.5509029999999999, -1.231291)
5 -0.536713 -0.313384         (0.5509029999999999, -1.231291, -0.313384)
6 -0.840931 -0.675352        (-1.231291, -0.313384, -0.6753520000000001)
7 -0.930186 -0.189356        (-0.313384, -0.6753520000000001, -0.189356)
8  0.151349  0.522533         (-0.6753520000000001, -0.189356, 0.522533)
9 -0.046146  0.507406                    (-0.189356, 0.522533, 0.507406)
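To see why the zip trick produces windows, here is a minimal sketch on a plain list (the data and the names `values`, `shifted` are illustrative only). Each slice is the same sequence shifted by one more position, and zip stops at the shortest slice, which is why only full windows come out.

```python
values = [1, 2, 3, 4, 5]  # illustrative data
m = 3                     # window size

# values[0:], values[1:], values[2:] are the sequence shifted by 0, 1 and 2
shifted = [values[i:] for i in range(m)]

# zip groups the k-th element of every shifted copy into one tuple
# and stops when the shortest copy runs out.
windows = list(zip(*shifted))

print(shifted)  # [[1, 2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5]]
print(windows)  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```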

Solution 5:

In newer NumPy versions (1.20 and later) there is sliding_window_view().

It produces the same arrays as as_strided(), but with a more transparent syntax.

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
sliding_window_view(x, 3)

>>>
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

But be aware that pandas rolling puts window_size - 1 NaN values at the start of the result: with the default min_periods (equal to the window size), the first windows are incomplete. You can check it like this:

x.rolling(3).sum()

>>>
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
5    15.0
6    18.0
7    21.0
8    24.0
dtype: float64

sliding_window_view(x, 3).sum(axis=1)
>>>
array([ 6,  9, 12, 15, 18, 21, 24])

So the array that really corresponds to the pandas result is:

c = np.array([[np.nan, np.nan, 1.],
              [np.nan, 1., 2.],
              [1., 2., 3.],
              [2., 3., 4.],
              [3., 4., 5.],
              [4., 5., 6.],
              [5., 6., 7.],
              [6., 7., 8.],
              [7., 8., 9.]])

c.sum(axis=1)
>>>
array([nan, nan,  6.,  9., 12., 15., 18., 21., 24.])
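Rather than typing that array by hand, one way to build it is to prepend `window_size - 1` NaNs before taking the sliding view. This is only a sketch that reproduces the shape of the pandas result; it is not a claim about how pandas computes rolling windows internally, and the names `win` and `padded` are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
win = 3

# Prepend win - 1 NaNs so that every original row gets its own window.
padded = np.concatenate([np.full(win - 1, np.nan), x.to_numpy(dtype=float)])
c = sliding_window_view(padded, win)

# Row sums now line up with the pandas result, leading NaNs included.
same = np.allclose(c.sum(axis=1), x.rolling(win).sum().to_numpy(), equal_nan=True)
print(c)
print(same)
```

The resulting `c` has one row per element of `x`, so `c.tolist()` can be assigned directly as a DataFrame column of the same length.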
