Questions tagged [data.table]
The R data.table package is an extension of data.frame built for fast in-memory data analysis. Use the dt tag for the DataTables package with Shiny (DT).
13,708 questions
0 votes, 1 answer, 41 views
sumifs with a criteria
Here is my data.
iris.dt <- as.data.table(iris)
And this is exactly what I want, and it works fine.
iris.dt[, Sum.Petal.Width := c(sum(Petal.Width), rep(0, .N - 1)), by = Species]
Output to verify:
...
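For readers skimming this entry, a minimal sketch of what the assignment in the question does: it writes each species' total Petal.Width into the first row of the group and 0 into the remaining rows. The name first_rows is only illustrative and does not come from the question.

```r
library(data.table)

iris.dt <- as.data.table(iris)

# Per Species: the group total goes in the first row, 0 in the other .N - 1 rows
iris.dt[, Sum.Petal.Width := c(sum(Petal.Width), rep(0, .N - 1)), by = Species]

# Illustrative check: keep only the rows that carry a group total
first_rows <- iris.dt[Sum.Petal.Width != 0]
print(first_rows[, .(Species, Sum.Petal.Width)])
```

Because the other rows hold 0, summing the new column over the whole table should give the same value as summing Petal.Width directly.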
0 votes, 0 answers, 24 views
mean of all or a subset of columns when subsetting by group using 'j' term in data.table [duplicate]
I have a large data.table where the columns represent many variables (v1... v99) and the ID for each set of variables. I would like to know the mean of each variable for each individual ID.
If this ...
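A common data.table idiom for this kind of per-group mean is lapply(.SD, mean) with by, optionally restricted through .SDcols. The sketch below uses a small made-up table; the column names ID, v1, v2 and v3 and all values are assumptions for illustration, not the asker's data.

```r
library(data.table)

# Hypothetical toy data standing in for the asker's table
dt <- data.table(
  ID = c("a", "a", "b", "b"),
  v1 = c(1, 3, 5, 7),
  v2 = c(2, 4, 6, 8),
  v3 = c(10, 20, 30, 40)
)

# Mean of every non-grouping column, per ID
means_all <- dt[, lapply(.SD, mean), by = ID]

# Mean of a chosen subset of columns, per ID
means_sub <- dt[, lapply(.SD, mean), by = ID, .SDcols = c("v1", "v2")]

print(means_all)
print(means_sub)
```

.SDcols also accepts patterns("^v") to pick columns by name pattern, which may be more convenient with 99 variables.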
0 votes, 0 answers, 32 views
In data.table in R, why does chaining and assignment require you to enter the variable twice in console before it appears? [duplicate]
I have an example below that results in nothing appearing in the last line when it is run in my RStudio console. However, running it a second time shows what final_counts is. Why does this ...
0 votes, 1 answer, 22 views
How to Calculate Average Time from First Activity to a Milestone in a User Activity Log Using data.table in R?
I'm working with a dataset of user activity logs and need to calculate the average time it takes for first-time users to reach a specific milestone. Specifically, I want to find the average time it ...
0 votes, 1 answer, 23 views
How to create a variable based on unique counts within a time interval by multiple time points and grouping variable?
I would like to count the number of unique drugs, defined as the number of unique drug_code dispensations each individual (denoted by idnr) has within 1 year prior to the index_date + time_from_index. The ...
0 votes, 0 answers, 55 views
fread() takes 60GB of RAM to load a 22GB CSV dataset [duplicate]
I am loading a CSV file into RStudio using fread(), and although the file is only 22GB, my memory usage sits at 60GB of my 64GB. Why is that? This becomes a problem right afterwards, as I need to join ...
2 votes, 4 answers, 111 views
data.table vs dplyr: apply function returning changing column names over groups
I want to apply a function (ratefunc()) to a grouped data frame; the function returns column names that change depending on the result:
library(data.table)
library(dplyr)
dt <- data.table::data.table(
...
0 votes, 2 answers, 59 views
calculating count and sum with a condition and on multiple category
This is my data.
irisData <- as.data.table(iris)
bins <- seq(4, length.out = 9, by = 0.5)
aggtable <- data.table(bin1 = bins[-length(bins)], bin2 = bins[-1])
I would like to create a count and ...
1 vote, 2 answers, 23 views
How to select specific columns across multiple dataframes in R and then bind them into one data.frame?
I am trying to select or subset multiple data frames with different numbers of columns. They all contain the same columns of interest, so I am trying to reduce them to those same columns so I can ...
0 votes, 0 answers, 36 views
Sum of previous years observations for unstructured data in R [closed]
I have very unstructured data for the following variables:
Host Home Industry Value Year value_lag
A X I 1 2001 NA
B X I 2 2001 NA
C X I 3 2003 NA
A X I ...
1 vote, 2 answers, 44 views
Recode relationship matrices based on new subgrouping
Problem:
I have a survey dataset that includes intra-household relationships. I had to subdivide households into tax units, which means I need to redefine the relationship matrices based on the new tax-...
1 vote, 3 answers, 88 views
How to calculate conditional counts and sums?
I want to do something similar to COUNTIFS and SUMIFS in R using the data.table package.
Here is my data.
library(data.table)
treesData <- as.data.table(trees)
bins <- seq(63, length.out = 10, by = 3)
...
3 votes, 2 answers, 88 views
Improve processing time of applying a function over a vector and grouping by columns
I am trying to apply a function over data.table columns, grouping by column values.
I am using the lapply function, but my script is quite slow.
To give some context, I am working of probability ...
1 vote, 2 answers, 39 views
Iterate over rows until a specific value is reached
I have a data.table
dt <- data.table(
Date = c("20240701", "20240801", "20240901", "20241001"),
Plan = c(85,17,50, 34),
OpenPlan = c(...
2 votes, 5 answers, 112 views
How to optimise an aggregation function with conditions?
I have an aggregation function that sums groups of data then creates a flag based on a set of conditions and assigns that to the group. The issue is that there is a large number of groups to aggregate ...