Home:ALL Converter>Pandas cumsum with conditional product of lagged value?

Pandas cumsum with conditional product of lagged value?

Ask Time:2014-09-06T22:54:41         Author:bazel

Json Formatter

I'm trying to get a cumulative sum that changes according to the product of another variable and the lagged value of the sum (sounds a bit like math gibberish, I know.. please bear with me)

Here's the example setup:

import pandas as pd
df = pd.DataFrame([1,1,1.004878,1,1.043394],columns=['xx'])
df['n'] = 1000000.0

Which assembles to:

       xx        n
0  1.000000  1000000
1  1.000000  1000000
2  1.004878  1000000
3  1.000000  1000000
4  1.043394  1000000

Now, we need to multiply the xx by the lagged value of n, iteratively, and then take the cumulative sum of this value:

cs = pd.Series([0.0] * len(df))
cs[0] = df.ix[0]['n']
for i,e in enumerate(df.iterrows()):
    if i == 0: continue
    cs[i] = df.ix[i]['xx'] * cs[(i - 1)]

This produces the following:

0    1000000.000000
1    1000000.000000
2    1004878.000000
3    1004878.000000
4    1048483.675932
dtype: float64

Question: Is there a way to do this in pandas/numpy that doesn't require iterating over each row? If not, are there any tricks of the trade for optimising code like the above, when you are forced to iterate? Can an creatively crafted index help in this case? Performance is a concern with 10000+ rows, across multiple datasets.

Author:bazel,eproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/25701494/pandas-cumsum-with-conditional-product-of-lagged-value
wwii :

First, your for loop can be simplified to:\n\nfor i in xrange(1, len(df)):\n cs[i] = df.ix[i]['xx'] * cs[(i - 1)]\n\n\n(more math gibberish)\nEach item in cs[1:] is the product of all previous items in df['xx'] (the cumulative product) multiplied by the first item in the n column of df\n\n>>> df\n xx n\n0 1.000000 1000000\n1 1.000000 1000000\n2 1.004878 1000000\n3 1.000000 1000000\n4 1.043394 1000000\n>>> a = df['xx']\n>>> a\n0 1.000000\n1 1.000000\n2 1.004878\n3 1.000000\n4 1.043394\nName: xx, dtype: float64\n>>> a = a.cumprod()\n>>> a\n0 1.000000\n1 1.000000\n2 1.004878\n3 1.004878\n4 1.048484\nName: xx, dtype: float64\n>>> a = a * df['n'][0]\n>>> a\n0 1000000.000000\n1 1000000.000000\n2 1004878.000000\n3 1004878.000000\n4 1048483.675932\nName: xx, dtype: float64\n>>> np.all(a == cs)\nTrue\n>>> \n\na = df['xx'].cumprod() * df['n'][0]\n\n\nThis is not a trick.\nThis only works because df['xx'][0] is 1. If it was any other value, AND cs[0] = df.ix[0]['n'] was not just a shortcut then cumprod wouldn't work.\n\nExpanding each item of cs gives\n\ncs[0] = df['n'][0]\ncs[1] = df['xx'][1] * df['n'][0]\ncs[2] = df['xx'][2] * df['xx'][1] * df['n'][0]\ncs[3] = df['xx'][3] * df['xx'][2] * df['xx'][1] * df['n'][0]\ncs[4] = df['xx'][4] * df['xx'][3] * df['xx'][2] * df['xx'][1] * df['n'][0]\n\n\nSince df['xx'][0] equals one and df['xx'][0] * df['n'][0] == df['n'][0] then:\n\ncs[0] = df['xx'][0] * df['n'][0]\ncs[1] = df['xx'][1] * df['xx'][0] * df['n'][0]\ncs[2] = df['xx'][2] * df['xx'][1] * df['xx'][0] * df['n'][0]\ncs[3] = df['xx'][3] * df['xx'][2] * df['xx'][1] * df['xx'][0] * df['n'][0]\ncs[4] = df['xx'][4] * df['xx'][3] * df['xx'][2] * df['xx'][1] * df['xx'][0] * df['n'][0]\n\n\nIf You were to change the problem conditions slightly where after each iteration I need to subtract 0.05% of the last computed value of n (before the next iteration), does cumprod still work?\n\nIf you did the item expansion exercise you should have seen that the new condition results in multiplication by the cumulative product of an array of the scaling factor. Two ways to go about it - both result in some minor floating point errors from the calc performed in a loop. Again, you need to account for the first item in df['xx'] being one.\n\nfor i in xrange(1, len(df)):\n cs[i] = df.ix[i]['xx'] * (.9995 * cs[(i - 1)])\n\n>>> k\narray([ 1. , 0.9995, 0.9995, 0.9995, 0.9995])\n>>> z = df['xx'] * k\n>>> z\n0 1.000000\n1 0.999500\n2 1.004376\n3 0.999500\n4 1.042872\nName: xx, dtype: float64\n>>> z = z.cumprod() * df['n'][0]\n>>> cs - z\n0 0.000000e+00\n1 0.000000e+00\n2 0.000000e+00\n3 0.000000e+00\n4 -1.164153e-10\ndtype: float64\n>>> \n>>> z = df['xx'].cumprod() * df['n'][0]\n>>> z *= k.cumprod()\n>>> cs - z\n0 0.000000e+00\n1 0.000000e+00\n2 -1.164153e-10\n3 0.000000e+00\n4 0.000000e+00\ndtype: float64\n>>> \n",
2014-09-06T16:11:14
yy