
I have a dataset with very few data points.

x <- c(4, 8, 13, 24)
y <- c(40, 37, 28, 20)
df <- data.frame(x, y)

Now I want to extrapolate this data to create a dataset that gives the value of y for every integer value of x from 1 to 100. x and y have a linear relationship.

Second, could this be done for multiple data frames, for example with a loop? Thank you!

  • What is the relationship between x and y?
    – Adam Quek
    Commented May 31, 2022 at 9:27
  • Sorry, just wanted to add this info, it is a linear relationship! Commented May 31, 2022 at 9:30

1 Answer

This is a short snippet that does this:

library(tidyverse)  # complete(), mutate(), %>% and ggplot()

linear_xy <- lm(y ~ x, data = df)
# df <- broom::augment(linear_xy, newdata = complete(df, x = 1:100)) # one way
df <- df %>%  # another way
  complete(x = 1:100) %>% 
  mutate(.fitted = predict(linear_xy, newdata = .))
ggplot(df, aes(x, y)) +
  geom_line(aes(y = .fitted)) +
  geom_point() +
  ggpubr::theme_pubr()

This requires that you have the packages {tidyverse}, {broom}, and {ggpubr} installed.
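If you would rather avoid the extra packages, the same extrapolation can be sketched in base R alone. This assumes the four points from the question and that an ordinary least-squares line from lm() is what you want; the names fit, grid and y_hat are chosen here for illustration only.

```r
# Data from the question
x <- c(4, 8, 13, 24)
y <- c(40, 37, 28, 20)
df <- data.frame(x, y)

# Fit the straight line y = a + b * x by ordinary least squares
fit <- lm(y ~ x, data = df)

# Grid of every integer x from 1 to 100, then predict y on it
grid <- data.frame(x = 1:100)
grid$y_hat <- predict(fit, newdata = grid)

head(grid)
```

The fitted slope is negative (about -1.03), so y_hat drops below zero once x passes roughly 43; how far out the extrapolation is meaningful depends entirely on the linearity assumption holding beyond x = 24.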

Second part

Assuming we want to do this with multiple data frames, we have to restructure things a bit.

x <- c(4, 8, 13, 24)
y <- c(40, 37, 28, 20)
df <- tibble(x, y)

I don't have multiple data frames (or tibbles), so I'll use this one as the primary one and make up a function (a factory) that produces data frames slightly different from df above.

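# A "factory": takes a data frame with columns x and y and returns
# a randomly perturbed copy of it
df_factory <- . %>% 
  mutate(x_new = x + sample.int(100, size = n()),              # shift x by a random 1..100
         x = if_else(x_new >= 100, x, x_new),                  # keep the old x if the shift overshoots
         y_new = y + rnorm(n(), mean = median(y), sd = sd(y)), # add normal noise to y
         y = y_new,
         y_new = NULL,                                         # drop the helper columns
         x_new = NULL)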

Thus df_factory is a function of one variable, which must be a data frame with columns x and y:
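For readers who prefer base R, a roughly equivalent factory could look like the sketch below. The name df_factory_base and the fixed seed are assumptions for illustration; it mirrors the mutate() steps: shift each x by a random integer in 1..100 unless the result would reach 100 or more, and add normally distributed noise to y.

```r
# Hypothetical base-R counterpart of df_factory (not part of the original answer)
df_factory_base <- function(d) {
  n <- nrow(d)
  x_new <- d$x + sample.int(100, size = n)                 # random shift of 1..100
  d$x <- ifelse(x_new >= 100, d$x, x_new)                  # keep the old x if it overshoots
  d$y <- d$y + rnorm(n, mean = median(d$y), sd = sd(d$y))  # noisy copy of y
  d
}

set.seed(1)  # reproducible example
df <- data.frame(x = c(4, 8, 13, 24), y = c(40, 37, 28, 20))
df1 <- df_factory_base(df)
df1
```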

df1 <- df_factory(df)
df2 <- df_factory(df)
df3 <- df_factory(df)
all_dfs <- list(df1, df2, df3)
all_dfs <- bind_rows(all_dfs, .id = "df_id")

The new variable df_id records which original data frame each row of all_dfs came from.
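To make the role of bind_rows(all_dfs, .id = "df_id") concrete: for an unnamed list it stacks the data frames and adds df_id holding each one's position in the list. A base-R sketch of the same stacking, using three made-up stand-ins for df1, df2 and df3, might be:

```r
# Made-up stand-ins for df1, df2, df3
dfs <- list(
  data.frame(x = c(4, 8), y = c(40, 37)),
  data.frame(x = c(13, 24), y = c(28, 20)),
  data.frame(x = c(5, 50), y = c(39, 10))
)

# Tag each data frame with its list position, then stack them
tagged <- Map(function(d, id) cbind(df_id = id, d),
              dfs, as.character(seq_along(dfs)))
all_dfs <- do.call(rbind, tagged)
all_dfs
```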

Next we want to:

  • Nest the x and y columns of each original data frame into a list-column named data.
  • For each row (see rowwise()), we:
    • Fit a linear model: one straight line through all points, not a piecewise interpolation.
    • Predict from each of these models, linear_xy (also stored in a list-column), for x = 1:100.
  • Unnest everything back into one contiguous data frame that can be fed to ggplot.

all_dfs %>%
  nest(data = c(x,y)) %>% 
  rowwise() %>% 
  mutate(linear_xy = list(lm(y ~ x, data = data)),
         augment = list(broom::augment(linear_xy,
                                       newdata = complete(data, x = 1:100)))) %>%
  ungroup() %>% 
  select(-data, -linear_xy) %>% 
  unnest(augment) -> 
  all_dfs_predictions

Note: the -> at the end assigns the result of the whole pipeline to all_dfs_predictions.
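Since the question asked about a loop: per data frame, the nest()/rowwise() pipeline does roughly what the base-R lapply() sketch below does explicitly. It assumes the data frames sit in a list and each has columns x and y; fit_and_extend is a hypothetical helper name, and the second data frame is made up.

```r
# Hypothetical helper: fit y ~ x on one data frame and predict for x = 1..100
fit_and_extend <- function(d, id) {
  fit <- lm(y ~ x, data = d)
  grid <- data.frame(df_id = id, x = 1:100)
  grid$.fitted <- unname(predict(fit, newdata = grid))
  grid
}

dfs <- list(
  data.frame(x = c(4, 8, 13, 24), y = c(40, 37, 28, 20)),
  data.frame(x = c(2, 30, 55, 90), y = c(10, 25, 41, 70))  # made-up example
)

# Loop over the list, recording which data frame each row came from
preds <- lapply(seq_along(dfs),
                function(i) fit_and_extend(dfs[[i]], as.character(i)))
all_preds <- do.call(rbind, preds)
head(all_preds)
```

Unlike the complete()/augment() version, this grid does not carry the observed y values along, so they would need to be merged back in (for example with merge()) to plot the points as well as the lines.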

The group aesthetic tells ggplot to treat rows with different df_id values as separate series. For fun, color and fill also depend on df_id. I could have mapped the color aesthetic to something else, such as "original df" vs. "others", or a threshold that distinguishes the data frames; the group aesthetic would still keep the rows separated by df_id.

ggplot(all_dfs_predictions, aes(x, y, group = df_id, color = df_id, fill = df_id)) +
  geom_line(aes(y = .fitted)) +
  geom_point() +
  lims(x = c(1,100)) +
  ggpubr::theme_pubr()

[Figure: the resulting ggplot2 plot]

  • Unfortunately I get this Error: unexpected '=' in "df<- broom:::augment.lm(linear_df, newdata = complete(df, x =" Commented May 31, 2022 at 16:17
  • Laura, I did present two ways to achieve it. So you'll have to comment the other way out.
    – Mossa
    Commented May 31, 2022 at 16:34
  • I think something went wrong with the typing, anyway it works now! Thanks a lot! Commented Jun 1, 2022 at 9:06
  • No that would be unethical. I'll edit the answer, hmm, interesting question though..
    – Mossa
    Commented Jun 3, 2022 at 10:45
  • I just looked at it and it worked, thank you very much! Commented Jun 14, 2022 at 7:54
