Rolling regression with expanding window in R

Ask Time: 2019-12-24T04:36:48 Author: user12586959

I would like to do a rolling linear regression, with an expanding window, between two variables in a data frame, grouped by a third categorical column.

For example, in the toy data frame below, I would like to extract the coefficient of lm(y ~ x), grouped by z, using all rows up to and including the row of interest. Thus for row 2 the data set for the regression will be rows 1:2, for row 3 it will be rows 1:3, and for row 4 it will be just row 4, as that is the first row with z = "b".

dframe <- data.frame(x = 1:10, y = 8:17, z = c("a", "a", "a", "b", "b", "b", "b", "b", "b", "b"))
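The windowing rule above can be sketched in base R: for every row, collect the indices of the rows with the same z at or before it, then fit the no-intercept regression (matching the `0 +` in the rollify formula) on each such set. The helper name `expanding_windows` is hypothetical. This is a sketch for checking the window definition, not an efficient method: it scales quadratically with the number of rows.

```r
# Toy data from the question
dframe <- data.frame(x = 1:10, y = 8:17,
                     z = c("a", "a", "a", "b", "b", "b", "b", "b", "b", "b"))

# Hypothetical helper: for each row i, the indices of all rows in the same
# group as row i that come at or before row i (the expanding window).
expanding_windows <- function(group) {
  lapply(seq_along(group), function(i) {
    same <- which(group == group[i])  # rows sharing row i's group
    same[same <= i]                   # keep those up to and including row i
  })
}

wins <- expanding_windows(dframe$z)
wins[[2]]   # [1] 1 2
wins[[4]]   # [1] 4   (first row with z == "b")

# One no-intercept fit per window; fine for checking, slow for big data
slopes <- sapply(wins, function(ix) coef(lm(y ~ x + 0, data = dframe, subset = ix))[[1]])
slopes
```

Because the windows are built from `which()`, this definition also works if rows of a group are not contiguous.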

Using the rollify function from tibbletime, I can get what I want except for the expanding window. Below I have used a window size of 2:

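library(tibbletime)
library(dplyr)

# rol(x, y) maps .x = x and .y = y, so this fits y ~ 0 + x on each window of 2 rows
rol <- rollify(~ coef(lm(.y ~ 0 + .x)), window = 2)
output <- dframe %>% group_by(z) %>% mutate(tt = rol(x, y))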

Specifically, I do not know how to supply a variable window size to the rollify function. Is that possible?

More broadly, what is an efficient way to do this? I need to run it on several tens of thousands of rows.

Author: user12586959, reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/59461067/rolling-regression-with-expanding-window-in-r

G. Grothendieck:

1) rollapplyr. First split dframe and then run rollapplyr over each component of the split. Note that rollapplyr can take a vector of widths as its second argument.

library(zoo)

roll <- function(data, n = nrow(data)) {
  # width i at position i means the window is rows 1:i of this group
  rollapplyr(1:n, 1:n, function(ix) coef(lm(y ~ x + 0, data, subset = ix))[[1]])
}

L <- split(dframe[-3], dframe[[3]])
transform(dframe, roll = unlist(lapply(L, roll)))

giving:

    x  y z     roll
a1  1  8 a 8.000000
a2  2  9 a 5.200000
a3  3 10 a 4.000000
b1  4 11 b 2.750000
b2  5 12 b 2.536585
b3  6 13 b 2.363636
b4  7 14 b 2.222222
b5  8 15 b 2.105263
b6  9 16 b 2.007380
b7 10 17 b 1.924528

1a) A variation is to use ave instead of split:

n <- nrow(dframe)
transform(dframe, roll = ave(1:n, z, FUN = function(ix) roll(dframe[ix, ])))

1b) This alternative was added some time after the question was originally answered.

reg <- function(x) coef(lm(x[, 2] ~ x[, 1] + 0))
n <- nrow(dframe)
# expanding widths that restart in each group: 1 2 3 1 2 3 4 5 6 7
w <- ave(1:n, dframe$z, FUN = seq_along)
transform(dframe,
  roll = rollapplyr(zoo(cbind(x, y)), w, reg, by.column = FALSE, coredata = FALSE))

2) dplyr/rollapplyr. This is the same except that dplyr does the grouping. roll is from (1).

library(dplyr)
library(zoo)

dframe %>%
  group_by(z) %>%
  mutate(roll = roll(data.frame(x, y))) %>%
  ungroup

giving:

# A tibble: 10 x 4
       x     y z      roll
   <int> <int> <fct> <dbl>
 1     1     8 a      8
 2     2     9 a      5.20
 3     3    10 a      4.00
 4     4    11 b      2.75
 5     5    12 b      2.54
 6     6    13 b      2.36
 7     7    14 b      2.22
 8     8    15 b      2.11
 9     9    16 b      2.01
10    10    17 b      1.92

3) Base R. This can also be done without any packages, as follows, where L is from (1). The result is similar to (1).

transform(dframe, roll = unlist(lapply(L, function(data, n = nrow(data)) {
  sapply(1:n, function(i) coef(lm(y ~ x + 0, data, subset = 1:i))[[1]])
})))

3a) roll in (1) can be replaced with roll2 below, which uses no packages and does not even use lm, giving another base R solution. It works because the least-squares slope of a regression through the origin is sum(x * y) / sum(x^2), so the expanding-window slope is the ratio of the two cumulative sums; this is also a single pass per group, which suits large data. Again, L is from (1).

roll2 <- function(data) with(data, cumsum(x * y) / cumsum(x * x))
transform(dframe, roll = unlist(lapply(L, roll2)))
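The solutions above all fit the no-intercept model y ~ x + 0, matching the `0 +` in the question's rollify formula. If the intercept model lm(y ~ x) mentioned in the question text is wanted instead, the cumulative-sum idea behind roll2 still applies, using the closed-form OLS slope (k * sum(xy) - sum(x) * sum(y)) / (k * sum(x^2) - sum(x)^2) on the running sums of a k-point window. The sketch below assumes that intent; `roll_int` is a hypothetical name. A one-point window has no defined slope, so it yields NaN here (lm would report NA).

```r
# Toy data from the question
dframe <- data.frame(x = 1:10, y = 8:17,
                     z = c("a", "a", "a", "b", "b", "b", "b", "b", "b", "b"))

# Hypothetical helper: expanding-window OLS slope of y ~ x WITH an intercept,
# computed from running sums in a single pass:
#   slope_k = (k * sum(xy) - sum(x) * sum(y)) / (k * sum(x^2) - sum(x)^2)
roll_int <- function(x, y) {
  x <- as.numeric(x)            # doubles avoid integer overflow in the sums
  y <- as.numeric(y)
  k   <- seq_along(x)           # number of points in each window
  sx  <- cumsum(x)
  sy  <- cumsum(y)
  sxy <- cumsum(x * y)
  sxx <- cumsum(x * x)
  (k * sxy - sx * sy) / (k * sxx - sx^2)  # 0/0 = NaN for a one-point window
}

# ave() applies FUN within each level of z and returns values in the original
# row order; it passes a single vector, so pass row indices and look up x, y.
dframe$slope <- ave(seq_len(nrow(dframe)), dframe$z,
                    FUN = function(ix) roll_int(dframe$x[ix], dframe$y[ix]))
dframe$slope  # NaN in rows 1 and 4, 1 elsewhere (here y = x + 7 exactly)
```

The running-sum formula can lose precision when x is large relative to its spread. Since the slope does not change when a constant is subtracted from x or y, shifting each group by its first value before calling roll_int is a cheap safeguard.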

2019-12-24T00:04:39
