I am attempting to create a lagged feature in my DataFrame.
I did this with the shift() function, and my DataFrame now looks like this:
df1[515:525]
Out[20]:
store_id units_sold Data_lagged
144378 8091 45 18.0
145533 8091 34 45.0
146688 8091 20 34.0
147843 8091 27 20.0
148998 8091 38 27.0
32 8094 33 38.0 **** <- needs to be 0
1187 8094 36 33.0
2342 8094 37 36.0
3497 8094 33 37.0
4652 8094 37 33.0
The 'Data_lagged' column is the 'units_sold' column lagged by one time step.
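To reproduce the issue, here is a minimal sketch built from only the ten rows printed above, so the first row's lag comes out as NaN rather than 18.0. It assumes the lag was created with a plain, ungrouped shift(1) on units_sold:

```python
import pandas as pd

# Sample rows copied from the output above (index values kept as shown).
df1 = pd.DataFrame(
    {
        "store_id": [8091, 8091, 8091, 8091, 8091, 8094, 8094, 8094, 8094, 8094],
        "units_sold": [45, 34, 20, 27, 38, 33, 36, 37, 33, 37],
    },
    index=[144378, 145533, 146688, 147843, 148998, 32, 1187, 2342, 3497, 4652],
)

# A plain shift(1) moves every value down one row and ignores store_id,
# so the first 8094 row (index 32) inherits the last 8091 value.
df1["Data_lagged"] = df1["units_sold"].shift(1)
print(df1)
```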
The problem is that I want 'Data_lagged' to be 0 whenever the value of store_id changes.
For example, the row with index 32 (marked **** above) should have a Data_lagged value of 0, because store_id changes from 8091 to 8094 at that row.
Instead, it holds the units_sold value of the previous row, which belongs to store 8091 (as you would expect from a plain shift).
This is a problem because the lagged feature then carries a value from a different store: store 8094 gets a lagged value from store 8091.
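One way to express the rule "0 when store_id changes" directly, shown here only as a sketch that assumes each store's rows are contiguous and already in time order, is to compare store_id with its own shifted copy and zero the lag wherever the two differ:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "store_id": [8091, 8091, 8091, 8091, 8091, 8094, 8094, 8094, 8094, 8094],
        "units_sold": [45, 34, 20, 27, 38, 33, 36, 37, 33, 37],
    },
    index=[144378, 145533, 146688, 147843, 148998, 32, 1187, 2342, 3497, 4652],
)
df1["Data_lagged"] = df1["units_sold"].shift(1)

# True on the first row of each run of identical store_id values.
# The very first row is also True, because shift() yields NaN there
# and ne() treats a comparison with NaN as "not equal".
new_store = df1["store_id"].ne(df1["store_id"].shift(1))

# Zero the lag wherever a new store starts (assumes contiguous store rows).
df1.loc[new_store, "Data_lagged"] = 0
print(df1)
```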
I have tried many things, including for loops, but I cannot get the result I need.
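A loop-free alternative, again only a sketch that assumes rows are already in time order within each store, is to shift inside each store_id group so values never cross a store boundary, then fill the first row of each store (which has no previous value) with 0:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "store_id": [8091, 8091, 8091, 8091, 8091, 8094, 8094, 8094, 8094, 8094],
        "units_sold": [45, 34, 20, 27, 38, 33, 36, 37, 33, 37],
    },
    index=[144378, 145533, 146688, 147843, 148998, 32, 1187, 2342, 3497, 4652],
)

# shift(1) is applied separately within each store, so the result stays
# aligned to the original index; each store's first row becomes NaN,
# which fillna(0) then replaces with 0.
df1["Data_lagged"] = (
    df1.groupby("store_id")["units_sold"]
    .shift(1)
    .fillna(0)
)
print(df1)
```

Unlike the boundary mask, the grouped shift does not need each store's rows to be contiguous, only ordered in time within the store.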
Can anyone help me with this?