Calculating moving average

Question

I'm trying to use R to calculate the moving average over a series of values in a matrix. There doesn't seem to be a built-in function in R that will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?

Gregor Thomas · Accepted Answer · 2019-03-28 16:16:12Z

257

Or you can simply calculate it using filter, here's the function I use:

ma <- function(x, n = 5) { stats::filter(x, rep(1 / n, n), sides = 2) }

The stats:: prefix matters if you use dplyr, which masks filter with its own function when loaded.
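As a minimal sketch of how the sides argument changes the result (the input vector and window of 3 are made up for illustration), the following compares a centred and a trailing average and converts the time-series result to a plain vector with as.vector:

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
w <- rep(1 / 3, 3)  # equal weights for a window of 3

# sides = 2 centres the window; sides = 1 uses only the current and past values
centred  <- as.vector(stats::filter(x, w, sides = 2))
trailing <- as.vector(stats::filter(x, w, sides = 1))

centred   # NA 2 3 4 5 6 NA
trailing  # NA NA 2 3 4 5 6
```

Positions where the window would run off either end of the series come back as NA.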

edited Mar 28, 2019 at 16:16

Gregor Thomas


answered Feb 1, 2011 at 12:06

Matti Pastell


57

I should point out that "sides=2" may be an important option in many people's use cases that they don't want to overlook. If you want only trailing information in your moving average, you should use sides=1.
– evanrsparks
Commented Apr 2, 2012 at 20:58
37

Some years later, but dplyr now has a filter function; if you have that package loaded, use stats::filter.
– blmoore
Commented Apr 8, 2015 at 14:00
sides = 2 is equivalent to align="center" for the zoo::rollmean or RcppRoll::roll_mean. sides = 1 is equivalent to "right" alignment. I don't see a way to do "left" alignment or calculate with "partial" data (2 or more values)?
– Matt L.
Commented Sep 18, 2017 at 20:32
3

stats::filter gives a time series object. Pass the result to as.vector to get a vector.
– qwr
Commented Jul 19, 2020 at 8:34
This may be useful to read: stackoverflow.com/a/61777773/3348414
– SqueakyBeak
Commented Jul 11, 2022 at 14:20


Frank · Accepted Answer · 2017-08-11 20:04:21Z

171

Rolling Means/Maximums/Medians in the zoo package (rollmean)
MovingAverages in TTR
ma in forecast

edited Aug 11, 2017 at 20:04

Frank


answered Apr 13, 2009 at 13:06

f3lix


3

What is the moving average in R not containing future values of given timestamp? I checked forecast::ma and it contains all neighbourhood, not right.
– hhh
Commented Sep 7, 2018 at 20:52
2

Try the stats::filter function instead. There you can set sides = 1 for only past values. E.g. stats::filter(x, rep(1,5), sides = 1)/5 for the mean over 5 values.
– panuffel
Commented May 7, 2021 at 12:03
1

I would add frollmean function in data.table.
– Jordi Aceiton
Commented Oct 11, 2022 at 10:57


pipefish · Accepted Answer · 2019-02-26 11:34:03Z

38

Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running mean over n numbers (despite its name, rsum below holds the windowed sums divided by n):

cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. Dealing with those requires dividing each window's sum by its number of non-NA values. Here's one way of doing that, incorporating the comment from @Ricardo Cruz:

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn

This still has the issue that if all the values in a window are NA, the result for that window is 0/0, which R evaluates to NaN rather than raising an error.
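One way to package the cumsum approach so that all-NA windows yield NA is sketched below. The function name roll_mean_cumsum and the choice to return NA for empty windows are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: rolling mean over n values via cumulative sums,
# ignoring NAs within each window
roll_mean_cumsum <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  ok <- !is.na(x)
  cx <- c(0, cumsum(ifelse(ok, x, 0)))  # running total of non-NA values
  cn <- c(0, cumsum(ok))                # running count of non-NA values
  hi <- (n + 1):length(cx)
  lo <- 1:(length(cx) - n)
  rx <- cx[hi] - cx[lo]                 # sum of non-NA values in each window
  rn <- cn[hi] - cn[lo]                 # number of non-NA values in each window
  ifelse(rn == 0, NA_real_, rx / rn)    # NA when a window holds no data
}

roll_mean_cumsum(c(1, 2, NA, 4, 5), 2)  # 1.5 2.0 4.0 4.5
roll_mean_cumsum(c(NA, NA, 3), 2)       # NA 3
```

The result has length(x) - n + 1 elements, one per complete window. As noted in the comments, Inf values would still need separate handling.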

edited Feb 26, 2019 at 11:34

answered Aug 12, 2015 at 20:26

pipefish


10

One downside to this solution is that it can't handle missings: cumsum(c(1:3,NA,1:3))
– Jthorpe
Commented Feb 24, 2016 at 19:15
@Ricardo Cruz: it might be better to remove the NAs and adjust the vector length accordingly. Think of a vector with a lot of NAs -- zeros will pull the average toward zero, while removing the NAs will leave the average as it is. It all depends on your data and the question you want to answer, of course. :)
– mzuther is on OpenAI strike
Commented Oct 2, 2018 at 14:24
1

@mzuther, I updated the answer following your comments. Thanks for the input. I think the correct way of dealing with missing data is not extending the window (by removing the NA values), but by averaging each window by the correct denominator.
– pipefish
Commented Oct 5, 2018 at 17:33
1

rn <- cn[(n+1):length(cx)] - cx[1:(length(cx) - n)] should actually be rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
– adrianmcmenamin
Commented Feb 21, 2019 at 15:55
You also have to handle Inf values, no?
– jangorecki
Commented Aug 31, 2022 at 13:43


jangorecki · Accepted Answer · 2023-01-31 16:49:52Z

In data.table 1.12.0, a new frollmean function was added to compute a fast and exact rolling mean, carefully handling NA, NaN, +Inf, and -Inf values.

As there is no reproducible example in the question there is not much more to address here.

You can find more about frollmean in the manual (?frollmean), which is also available online.

Examples from manual below:

library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))

# rollmean of single vector and single window
frollmean(d[, V1], 3)

# multiple columns at once
frollmean(d, 3)

# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))

# multiple columns and multiple windows at once
frollmean(d, c(3, 4))

## three above are embarrassingly parallel using openmp

eddi · Accepted Answer · 2013-08-21 17:11:53Z

13

The caTools package has very fast rolling mean/min/max/sd and a few other functions. I've only worked with runmean and runsd, and they are the fastest of any of the packages mentioned to date.

answered Aug 21, 2013 at 17:11

eddi


1

This is awesome! It is the only function that does this in a nice, simple way. And it's 2018 now...
– Felipe Gerard
Commented Apr 17, 2018 at 22:30


I Like to Code · Accepted Answer · 2020-04-23 03:59:27Z

Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9

You can use one mutate call for multiple new columns by separating each new column with a comma. — H5470, Commented Oct 23, 2020 at 19:32

Jaap · Accepted Answer · 2018-08-16 05:22:09Z

10

You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.

Otherwise, this (slower) for loop should do the trick:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n):i])
  }
  res
}

edited Aug 16, 2018 at 5:22

Jaap


answered Jan 16, 2015 at 14:49

cantdutchthis


3

Can you please explain in detail how this algorithm works? I cannot understand the idea.
– Daniil Yefimov
Commented Mar 13, 2017 at 16:01
First he initializes a vector of the same length with res = arr. Then there is a loop that iterates starting at n, i.e. the 15th element, to the end of the array. That means the very first subset he takes the mean of is arr[1:15], which fills spot res[15]. Now, I prefer setting res = rep(NA, length(arr)) instead of res = arr, so each element of res[1:14] equals NA rather than a number where we couldn't take a full average of 15 elements.
– Evan Friedland
Commented Sep 17, 2018 at 0:50
I think it should be arr[(i-n+1):i]
– gaspar
Commented Jul 12, 2021 at 10:11


Community · Accepted Answer · 2017-05-23 11:47:19Z

In fact RcppRoll is very good.

The code posted by cantdutchthis must be corrected in the fourth line so that the window has a fixed width of n:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n+1):i])
  }
  res
}
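As a quick sanity check on the corrected window, the sketch below compares the loop against stats::filter with sides = 1. The name ma_loop is hypothetical, and it assumes you also want NA in the first n - 1 positions (as a commenter on the original answer suggested) rather than the raw input values.

```r
# Hypothetical variant of the loop: NA where a full window is not available
ma_loop <- function(arr, n = 15) {
  stopifnot(n >= 1, n <= length(arr))
  res <- rep(NA_real_, length(arr))
  for (i in n:length(arr)) {
    res[i] <- mean(arr[(i - n + 1):i])  # exactly n values ending at i
  }
  res
}

x <- c(2, 4, 6, 8, 10)
ma_loop(x, n = 3)  # NA NA 4 6 8

# Same trailing average computed with stats::filter
via_filter <- as.vector(stats::filter(x, rep(1 / 3, 3), sides = 1))
all.equal(ma_loop(x, n = 3), via_filter)  # TRUE
```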

Another way, which handles missings, is given here.

A third way, improving cantdutchthis code to calculate partial averages or not, follows:

ma <- function(x, n = 2, parcial = TRUE) {
  res <- x  # set the first values

  # both modes compute the same running means over at most n values
  for (i in 1:length(x)) {
    t <- max(i - n + 1, 1)  # window start, clipped at the first element
    res[i] <- mean(x[t:i])
  }

  if (parcial == TRUE) {
    res
  } else {
    res[-c(seq(1, n - 1, 1))]  # remove the first n-1 values, i.e., res[c(-1, -2, ...)]
  }
}

Feiming Chen · Accepted Answer · 2020-09-17 20:51:16Z

6

You may calculate the moving average of a vector x with a window width of k by:

apply(embed(x, k), 1, mean)
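To see what this computes, here is a small sketch with made-up data. embed(x, k) builds a matrix whose rows are the consecutive windows of length k, so the row means are the moving averages over complete windows only; rowMeans is used here as an equivalent shortcut for apply(..., 1, mean).

```r
x <- c(1, 2, 3, 4, 5)
k <- 3

embed(x, k)
# each row is one window, most recent value first:
# 3 2 1
# 4 3 2
# 5 4 3

rowMeans(embed(x, k))  # 2 3 4
```

The result has length(x) - k + 1 elements, so it is not aligned with x unless you pad it with NAs yourself.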

answered Sep 17, 2020 at 20:51

Feiming Chen


An extension of this to data.frames is: apply(df,rc,FUN=function(x) apply(embed(x, k),1,mean)) . rc can be one or two, for rows or columns, respectively.
– understorey
Commented May 28, 2021 at 17:53
this is not moving average
– Qbik
Commented Jul 8, 2022 at 13:12


Cristóbal Alcázar · Accepted Answer · 2017-06-29 14:29:46Z

To complement the answers of cantdutchthis and Rodrigo Remedio:

moving_fun <- function(x, w, FUN, ...) {
  # x: a double vector
  # w: the length of the window, i.e., the section of the vector selected to apply FUN
  # FUN: a function that takes a vector and returns a summary value, e.g., mean, sum, etc.
  # Given a double type vector apply a FUN over a moving window from left to the right, 
  #    when a window boundary is not a legal section, i.e. lower_bound and i (upper bound) 
  #    are not contained in the length of the vector, return a NA_real_
  if (w < 1) {
    stop("The length of the window 'w' must be greater than 0")
  }
  output <- x
  for (i in 1:length(x)) {
     # plus 1 because the index is inclusive with the upper_bound 'i'
    lower_bound <- i - w + 1
    if (lower_bound < 1) {
      output[i] <- NA_real_
    } else {
      output[i] <- FUN(x[lower_bound:i], ...)
    }
  }
  output
}

# example
v <- seq(1:10)

# compute a MA(2)
moving_fun(v, 2, mean)

# compute moving sum of two periods
moving_fun(v, 2, sum)

Davis Vaughan · Accepted Answer · 2020-03-24 18:43:12Z

The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function, and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.

library(slider)

x <- 1:3

# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5


df <- data.frame(x = x, y = x)

# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#>   x y
#> 1 1 1
#> 
#> [[2]]
#>   x y
#> 1 1 1
#> 2 2 2
#> 
#> [[3]]
#>   x y
#> 1 2 2
#> 2 3 3

The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example here, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.

library(slider)
library(zoo)
library(data.table)

x <- 1:50000 + 0L

bench::mark(
  slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
  zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
  datatable = frollapply(x, n = 6, FUN = function(x) 1L),
  iterations = 200
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slider      19.82ms   26.4ms     38.4    829.8KB     19.0
#> 2 zoo        177.92ms  211.1ms      4.71    17.9MB     24.8
#> 3 datatable    7.78ms   10.9ms     87.9    807.1KB     38.7

MS Berends · Accepted Answer · 2020-10-12 14:34:37Z

EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of e.g. the past 7 days of a Date vector.

For people just wanting to calculate this themselves, it's nothing more than:

# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))

for (i in seq_len(length(x))) {
  ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
  ind <- ind[ind %in% seq_len(length(x))]
  y[i] <- mean(x[ind])
}

y
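Running the loop on a small made-up vector shows how the centred window shrinks at the edges instead of producing NA. The values of x and w here are assumptions chosen for illustration.

```r
x <- c(1, 2, 3, 4, 5)
w <- 3

y <- numeric(length(x))
for (i in seq_along(x)) {
  ind <- (i - floor(w / 2)):(i + floor(w / 2))
  ind <- ind[ind %in% seq_along(x)]  # drop indices outside 1..length(x)
  y[i] <- mean(x[ind])
}

y  # 1.5 2.0 3.0 4.0 4.5
```

Note that with an even w the window covers floor(w / 2) values on each side plus the centre, i.e. w + 1 values in total.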

But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!

# our workhorse:
moving_fn <- function(x, w, fun, side = "centre", ...) {
  # x = vector with numeric data
  # w = window length
  # fun = function to apply
  # side = side to take, (c)entre, (l)eft or (r)ight
  # ... = parameters passed on to 'fun'
  y <- numeric(length(x))
  for (i in seq_len(length(x))) {
    if (side %in% c("c", "centre", "center")) {
      ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
    } else if (side %in% c("l", "left")) {
      ind <- c((i - floor(w) + 1):i)
    } else if (side %in% c("r", "right")) {
      ind <- c(i:(i + floor(w) - 1))
    } else {
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    }
    ind <- ind[ind %in% seq_len(length(x))]
    y[i] <- fun(x[ind], ...)
  }
  y
}

# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}

moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}

moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}

moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}

moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}

moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}

Garima gulati · Accepted Answer · 2018-09-11 04:34:26Z

1

Though a bit slow, you can also use zoo::rollapply to perform calculations on matrices.

reqd_ma <- rollapply(x, FUN = mean, width = n)

where x is the data set, FUN = mean is the function; you can also change it to min, max, sd etc and width is the rolling window.
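For readers without zoo, a base-R sketch of a column-wise trailing average on a matrix follows. It swaps in stats::filter via apply instead of rollapply, and the matrix m and window n are made up for illustration.

```r
m <- cbind(a = 1:6, b = (1:6)^2)
n <- 3

# apply the trailing filter to each column; the result keeps m's shape
trailing_ma <- apply(m, 2, function(v) {
  as.vector(stats::filter(v, rep(1 / n, n), sides = 1))
})

trailing_ma
```

The first n - 1 rows are NA because no complete window exists there yet.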

answered Sep 11, 2018 at 4:34

Garima gulati


2

It is not slow. Compared to base R, it is much faster. set.seed(123); x <- rnorm(1000); system.time(apply(embed(x, 5), 1, mean)); library(zoo); system.time(rollapply(x, 5, mean)) On my machine it is so fast that it returns a time of 0 seconds.
– G. Grothendieck
Commented Sep 12, 2018 at 15:55


GoGonzo · Accepted Answer · 2020-01-16 18:33:28Z

One can use the runner package for moving functions, in this case the mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on dates:

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE )
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

One can also specify other options like lag, and roll only at specific indexes. More in package and function documentation.

qwr · Accepted Answer · 2020-07-19 23:55:36Z

Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:

library(dplyr)  # provides first(), last(), and %>%

wma <- function(x) {
  wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
  nside <- (length(wts) - 1) / 2
  # pad x with begin and end values for filter to avoid NAs
  xp <- c(rep(first(x), nside), x, rep(last(x), nside))
  z <- stats::filter(xp, wts / sum(wts), sides = 2) %>% as.vector
  z[(nside + 1):(nside + length(x))]
}

Rui Barradas · Accepted Answer · 2020-07-16 12:27:13Z

0

vector_avg <- function(x){
  sum_x = 0
  for(i in 1:length(x)){
    if(!is.na(x[i]))
      sum_x = sum_x + x[i]
  }
  return(sum_x/length(x))
}

edited Jul 16, 2020 at 12:27

Rui Barradas


answered Jul 16, 2020 at 12:11

Mohamed Galia

1

3

Please add a description for further detail.
– Farbod Ahmadian
Commented Jul 16, 2020 at 12:52
2

Please relate your answer to the question and include some output which shows the question has been answered. See How to Answer for guidance on making a good answer.
– Peter
Commented Jul 16, 2020 at 15:35


Rob Moses · Accepted Answer · 2020-12-07 15:17:15Z

I use aggregate along with a vector created by rep(). This has the advantage of using cbind() to aggregate more than one column in your dataframe at a time. Below is an example of a moving average of 60 for a vector (v) of length 1000:

v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=T)

Note the first argument in rep is to simply get enough unique values for the moving range, based on the length of the vector and the amount to be averaged; the second argument keeps the length equal to the vector length, and the last repeats the values of the first argument the same number of times as the averaging period.

In aggregate you could use several functions (median, max, min); mean is shown as an example. Again, you could use a formula with cbind to do this on more than one (or all) columns in a dataframe.
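It may help to see the grouping vector on a small case. The sketch below uses a made-up vector of length 10 and a period of 3, with ceiling() in place of the round(... + 0.5) construction. It shows that this approach returns one mean per consecutive block of values (non-overlapping windows), not one value per position.

```r
v <- 1:10
k <- 3  # averaging period (assumed for illustration)

# one group id per block of k consecutive values
grp <- rep(seq_len(ceiling(length(v) / k)), each = k, length.out = length(v))
grp  # 1 1 1 2 2 2 3 3 3 4

res <- aggregate(v ~ grp, FUN = mean)
res$v  # 2 5 8 10
```

The last block is shorter than k whenever length(v) is not a multiple of k.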

Bruce Bao · Accepted Answer · 2023-07-14 11:01:01Z

0

Another useful function if you want the two ends of series not to be NA but to be recursively calculated moving averages:

smoothing = function(x, k=1) {
  sapply(seq_along(x), function(i) {
    i.min = max(i-k, 1)
    i.max = min(i+k, length(x))
    mean(x[i.min:i.max], na.rm=TRUE)
  })
}

Example:

x = 1:10/2

[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

smoothing(x, 2)

[1] 1.00 1.25 1.50 2.00 2.50 3.00 3.50 4.00 4.25 4.50

answered Jul 14, 2023 at 11:01

Bruce Bao
