Questions tagged [data.table]

The R data.table package is an extension of data.frame built for fast in-memory data analysis. Use the dt tag for the DataTables package with Shiny (DT).

0 votes
1 answer
41 views

sumifs with a criteria

Here is my data. iris.dt<-as.data.table(iris); And this is exactly what I want, and it works fine. iris.dt[,Sum.Petal.Width := c(sum(Petal.Width),rep(0, .N - 1)), by = Species]; Output to verify: ...
Mohit
  • 389
0 votes
0 answers
24 views

mean of all or a subset of columns when subsetting by group using 'j' term in data.table [duplicate]

I have a large data.table where the columns represent many variables (v1... v99) and the ID for each set of variables. I would like to know the mean of each variable for each individual ID. If this ...
neuropsych
0 votes
0 answers
32 views

In data.table in R, why does chaining and assignment require you to enter the variable twice in console before it appears? [duplicate]

I have an example below that results in nothing appearing in the last line when it is run in my RStudio console. However, running it again the second time shows what final_counts is. Why does this ...
user321627
  • 2,504
0 votes
1 answer
22 views

How to Calculate Average Time from First Activity to a Milestone in a User Activity Log Using data.table in R?

I'm working with a dataset of user activity logs and need to calculate the average time it takes for first-time users to reach a specific milestone. Specifically, I want to find the average time it ...
user321627
  • 2,504
0 votes
1 answer
23 views

How to create a variable based on unique counts within a time interval by multiple time points and grouping variable?

I would like to count the number of unique drugs, defined as the number of unique drug_code dispensations each individual (denoted by idnr) has within 1 year prior to the index_date + time_from_index. The ...
ccalle
  • 53
0 votes
0 answers
55 views

fread() takes 60GB of RAM to load a 22GB CSV dataset [duplicate]

I am loading a CSV file into RStudio using fread(), and although the file is 22GB, my memory usage is at 60GB of my 64GB. Why is that? This becomes a problem right after as I need to join ...
Marti
  • 101
2 votes
4 answers
111 views

data.table vs dplyr: apply function returning changing column names over groups

I want to apply a function (ratefunc()) to a grouped data frame which returns changing column names dependent on the result: library(data.table) library(dplyr) dt <- data.table:::data.table( ...
Cevior
  • 119
0 votes
2 answers
59 views

calculating count and sum with a condition and on multiple category

This is my data. irisData<-as.data.table(iris) bins<-seq(4, length.out = 9, by = 0.5); aggtable <- data.table(bin1 = bins[-length(bins)], bin2 = bins[-1]) I would like to create a count and ...
Mohit
  • 389
1 vote
2 answers
23 views

How to select specific columns across multiple dataframes in R and then bind them into one data.frame?

I am trying to select or subset multiple data frames with different numbers of columns. They all contain the same columns of interest, so I am trying to make them all contain the same columns so I can ...
Victor Shin
0 votes
0 answers
36 views

Sum of previous years' observations for unstructured data in R [closed]

I have very unstructured data for the following variables:

Host Home Industry Value Year value_lag
A    X    I        1     2001 NA
B    X    I        2     2001 NA
C    X    I        3     2003 NA
A    X    I        ...
Neeraj
  • 1,176
1 vote
2 answers
44 views

Recode relationship matrices based on new subgrouping

Problem: I have a survey dataset which includes intra-household relationships. I had to subdivide households into tax units, which means I need to redefine the relationship matrices based on the new tax-...
ravinglooper
1 vote
3 answers
88 views

How to calculate conditional counts and sums?

I want to do something similar to COUNTIFS and SUMIFS in R using the data.table package. Here is my data. library(data.table) treesData<-as.data.table(trees) bins<-seq(63, length.out = 10, by = 3); ...
Mohit
  • 389
3 votes
2 answers
88 views

Improve processing time of applying a function over a vector and grouping by columns

I am trying to apply a function over data.table columns, grouping by column values. I am using the lapply function, but my script is quite slow. To give some context, I am working on probability ...
la_turz
  • 33
1 vote
2 answers
39 views

Iterate over rows until a specific value is reached

I have a data.table dt <- data.table( Date = c("20240701", "20240801", "20240901", "20241001"), Plan = c(85,17,50, 34), OpenPlan = c(...
Sven
  • 455
2 votes
5 answers
112 views

How to optimise an aggregation function with conditions?

I have an aggregation function that sums groups of data then creates a flag based on a set of conditions and assigns that to the group. The issue is that there is a large number of groups to aggregate ...
sebastian-c
  • 15.3k
