
Creating a lagged variable subject to some condition in a pandas dataframe

Ask Time:2021-05-18T01:07:30         Author:HeinrichCode


I am attempting to create a lagged feature as part of my dataframe.

I have done this using the shift() function and my dataframe has the following updated format:


df1[515:525]
Out[20]: 
        store_id  units_sold  Data_lagged
144378      8091          45         18.0
145533      8091          34         45.0
146688      8091          20         34.0
147843      8091          27         20.0
148998      8091          38         27.0
32          8094          33         38.0  **** <- needs to be 0
1187        8094          36         33.0
2342        8094          37         36.0
3497        8094          33         37.0
4652        8094          37         33.0

The 'Data_lagged' feature refers to the lag of the 'units_sold' feature by a time step of 1.
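For reference, here is a minimal sketch of how such a column is typically produced with a plain shift(). The small frame below is rebuilt from a few of the rows shown above, and the exact call is an assumption, since the original code is not shown.

```python
import pandas as pd

# Small sample rebuilt from the rows shown above (illustrative only).
df1 = pd.DataFrame(
    {
        "store_id": [8091, 8091, 8091, 8094, 8094, 8094],
        "units_sold": [20, 27, 38, 33, 36, 37],
    },
    index=[146688, 147843, 148998, 32, 1187, 2342],
)

# A plain shift lags the whole column by one row and ignores store boundaries,
# so the first row of store 8094 receives the last value of store 8091.
df1["Data_lagged"] = df1["units_sold"].shift(1)

print(df1)
```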

The problem is, I want the 'Data_lagged' feature to equal 0 when the store_id feature changes in value.

For instance, entry 32 (marked with **** in the dataframe) needs to have a Data_lagged value of 0, since the store_id changes from 8091 to 8094.

Instead, the value is the lagged units_sold of the previous entry, which belongs to store 8091 (as one would expect from a plain shift).

This is a problem, since the lagged feature indicates a value of a different store. Store 8094 has a lagged feature value from store 8091.

I have tried many things, including for loops, but I am unable to obtain the result I need. Can anyone help me with this?
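One possible approach, sketched here under assumptions rather than as a definitive answer, is to compute the lag within each store by grouping on store_id before shifting, then filling the missing first value of each group with 0. The sample frame is made up from a few of the rows above, and it assumes the rows are already ordered by time within each store.

```python
import pandas as pd

# Small sample rebuilt from the rows shown above (illustrative only).
df1 = pd.DataFrame(
    {
        "store_id": [8091, 8091, 8091, 8094, 8094, 8094],
        "units_sold": [20, 27, 38, 33, 36, 37],
    },
    index=[146688, 147843, 148998, 32, 1187, 2342],
)

# Shift within each store, so no value leaks across a store boundary.
# The first row of every store has no predecessor (NaN); replace it with 0.
df1["Data_lagged"] = (
    df1.groupby("store_id")["units_sold"].shift(1).fillna(0)
)

print(df1)
```

Note that this also sets the very first row of the frame to 0 instead of NaN, because it is the first row of its store in this sample.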

Author: HeinrichCode. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/67574162/creating-a-lagged-variable-subject-to-some-condition-in-a-pandas-dataframe