R: How to calculate lag for multiple columns by group for data table



By : user2949004
Date : November 15 2020, 06:54 AM
I would like to calculate the diff of variables in a data table, grouped by id. The data is recorded at a sample rate of 1 Hz, and I would like to estimate the first and second derivatives (speed, acceleration).
You can try:
code :
 # Summary form: returns a new table with one differenced column per variable
 setnames(dt[, lapply(.SD, function(x) c(NA, diff(x))), by = id],
          2:3, c('dx', 'dy'))[]
 #    id dx dy
  #1:  1 NA NA
  #2:  1  1  2
  #3:  1  1  1
  #4:  2 NA NA
  #5:  2  4 -6
  #6:  2  1  1
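 # dplyr alternative; across() (dplyr >= 1.0.0) replaces the deprecated mutate_each()/funs()
 library(dplyr)
 df %>%
     group_by(id) %>%
     mutate(across(c(x, y), ~ c(NA, diff(.x)))) %>%
     rename(dx = x, dy = y)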
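 # In-place form: add the first differences by reference, then difference those columns again
 dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))), by = id]
 dt[, c('dx2', 'dy2') := lapply(.SD, function(x) c(NA, diff(x))),
    by = id, .SDcols = 4:5]   # columns 4:5 are dx and dy
 dt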
 #   x y id dx dy dx2 dy2
 #1: 1 2  1 NA NA  NA  NA
 #2: 2 4  1  1  2  NA  NA
 #3: 3 5  1  1  1   0  -1
 #4: 1 8  2 NA NA  NA  NA
 #5: 5 2  2  4 -6  NA  NA
 #6: 6 3  2  1  1  -3   7
 # Same result with shift(); start from the original dt (columns x, y, id only)
 dt[, paste0("d", c("x", "y")) := .SD - shift(.SD), by = id
   ][, paste0("d", c("x2", "y2")) := .SD - shift(.SD), by = id, .SDcols = 4:5]
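The sample data itself is not shown, so the values below are read back from the printed output above and should be treated as a reconstruction. Under that assumption, here is a minimal self-contained sketch of the grouped first and second differences. At a 1 Hz sample rate the time step is 1 s, so each difference is already a per-second estimate of speed or acceleration. The helper name `cols` is only for illustration.

```r
library(data.table)

# Sample data reconstructed from the printed output (an assumption, not the asker's original object)
dt <- data.table(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

cols <- c("x", "y")  # hypothetical helper: the columns to difference

# First differences within each id (the first row of every group becomes NA)
dt[, paste0("d", cols) := lapply(.SD, function(v) v - shift(v)),
   by = id, .SDcols = cols]

# Second differences: difference the first differences, again within each id
dt[, paste0("d", cols, "2") := lapply(.SD, function(v) v - shift(v)),
   by = id, .SDcols = paste0("d", cols)]

print(dt)
```

Naming the columns in `.SDcols` instead of using positions such as `4:5` keeps the second step correct even if the column order changes.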


How to group data.table by multiple columns?



By : Reihane
Date : March 29 2020, 07:55 AM
Use by=list(adShown,url) instead of by=c("adShown","url").
Example:
code :
set.seed(007) 
DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))

library(data.table)
DT <- data.table(DF)
DT[, Mean:=mean(X), by=list(Y, Z)]


     X Y Z      Mean
 1:  1 1 3  1.000000
 2:  2 0 1  9.333333
 3:  3 0 5  7.400000
 4:  4 0 5  7.400000
 5:  5 0 5  7.400000
 6:  6 1 0  6.000000
 7:  7 0 3  7.000000
 8:  8 1 2 12.500000
 9:  9 0 5  7.400000
10: 10 0 2 15.000000
11: 11 0 4 14.500000
12: 12 0 1  9.333333
13: 13 1 1 13.000000
14: 14 0 1  9.333333
15: 15 0 2 15.000000
16: 16 0 5  7.400000
17: 17 1 2 12.500000
18: 18 0 4 14.500000
19: 19 1 5 19.000000
20: 20 0 2 15.000000
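For comparison, here is a small sketch of three ways to spell a multi-column `by`. To my knowledge, current data.table releases accept a list of unquoted names, the `.()` shorthand for `list()`, and a character vector of column names. The toy table is made up for illustration and is not the data printed above.

```r
library(data.table)

# Toy data (hypothetical), small enough to check by eye
DT <- data.table(X = 1:20, Y = rep(c(0, 1), 10), Z = rep(0:4, 4))

m_list <- DT[, .(Mean = mean(X)), by = list(Y, Z)]   # list of unquoted column names
m_dot  <- DT[, .(Mean = mean(X)), by = .(Y, Z)]      # .() is an alias for list()
m_chr  <- DT[, .(Mean = mean(X)), by = c("Y", "Z")]  # character vector of column names

print(m_list)
```

If a grouping call fails, a common cause is passing unquoted names inside `c()`, which concatenates the column values instead of naming the columns.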
How to use data.table to efficiently calculate allele frequencies (proportions) by group across multiple columns (loci)



By : Hendy Nugraha
Date : March 29 2020, 07:55 AM
It's probably wise to transform your data.table into long format first. This makes it easier to use for further calculations (or for visualisations with ggplot2, for example). With the melt function of data.table (which works the same as the melt function of the reshape2 package) you can transform from wide to long format, then count each value per group and locus and turn the counts into proportions:
code :
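# Wide to long: one row per (Group, loci, value) observation
DT2 <- melt(DT, id.vars = "Group", variable.name = "loci")
# Count each value per group and locus, then convert the counts to proportions
DT2 <- DT2[, .N, by = .(Group, loci, value)][, prop := N/sum(N), by = .(Group, loci)]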
> DT2
    Group loci value N      prop
 1:    G1 Loc1     G 3 1.0000000
 2:    G2 Loc1    NA 1 0.2500000
 3:    G2 Loc1     G 1 0.2500000
 4:    G2 Loc1     T 2 0.5000000
 5:    G3 Loc1     T 2 0.6666667
 6:    G3 Loc1    NA 1 0.3333333
 7:    G1 Loc2    NA 1 0.3333333
 8:    G1 Loc2     A 1 0.3333333
 9:    G1 Loc2     C 1 0.3333333
10:    G2 Loc2    NA 1 0.2500000
11:    G2 Loc2     C 2 0.5000000
12:    G2 Loc2     A 1 0.2500000
13:    G3 Loc2     A 2 0.6666667
14:    G3 Loc2     C 1 0.3333333
15:    G1 Loc3     C 1 0.3333333
16:    G1 Loc3     G 2 0.6666667
17:    G2 Loc3    NA 2 0.5000000
18:    G2 Loc3     G 2 0.5000000
19:    G3 Loc3     G 3 1.0000000
# Back to wide format: one count column and one proportion column per value
DT3 <- dcast(DT2, Group + loci ~ value, value.var = c("N", "prop"), fill = 0)
> DT3
   Group loci N_A N_C N_G N_T N_NA    prop_A    prop_C    prop_G    prop_T   prop_NA
1:    G1 Loc1   0   0   3   0    0 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000
2:    G1 Loc2   1   1   0   0    1 0.3333333 0.3333333 0.0000000 0.0000000 0.3333333
3:    G1 Loc3   0   1   2   0    0 0.0000000 0.3333333 0.6666667 0.0000000 0.0000000
4:    G2 Loc1   0   0   1   2    1 0.0000000 0.0000000 0.2500000 0.5000000 0.2500000
5:    G2 Loc2   1   2   0   0    1 0.2500000 0.5000000 0.0000000 0.0000000 0.2500000
6:    G2 Loc3   0   0   2   0    2 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
7:    G3 Loc1   0   0   0   2    1 0.0000000 0.0000000 0.0000000 0.6666667 0.3333333
8:    G3 Loc2   2   1   0   0    0 0.6666667 0.3333333 0.0000000 0.0000000 0.0000000
9:    G3 Loc3   0   0   3   0    0 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000
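# If only the counts are needed, melt and dcast in one step (cells are counted with length)
DT2 <- dcast(melt(DT, id.vars = "Group"), Group + variable ~ value, fun.aggregate = length)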
> DT2
   Group variable A C G T NA
1:    G1     Loc1 0 0 3 0  0
2:    G1     Loc2 1 1 0 0  1
3:    G1     Loc3 0 1 2 0  0
4:    G2     Loc1 0 0 1 2  1
5:    G2     Loc2 1 2 0 0  1
6:    G2     Loc3 0 0 2 0  2
7:    G3     Loc1 0 0 0 2  1
8:    G3     Loc2 2 1 0 0  0
9:    G3     Loc3 0 0 3 0  0
# Data used in the examples above
DT <- structure(list(Loc1 = c("G", "G", "G", NA, "G", "T", "T", "T", "T", NA), 
                     Loc2 = c(NA, "A", "C", NA, "C", "A", "C", "A", "C", "A"), 
                     Loc3 = c("C", "G", "G", NA, NA, "G", "G", "G", "G", "G"), 
                     Group = c("G1", "G1", "G1", "G2", "G2", "G2", "G2", "G3", "G3", "G3")), 
                .Names = c("Loc1", "Loc2", "Loc3", "Group"), row.names = c(NA, -10L), class = c("data.table", "data.frame"))
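Because the data definition appears after the code that uses it, here is a self-contained sketch of the melt-and-count steps that runs top to bottom. It rebuilds the same values with `data.table()` rather than `structure()`. The names `long`, `freq`, `check` and the `allele` column are illustrative choices, not part of the original answer.

```r
library(data.table)

# Same values as the structure() call above, built with the data.table() constructor
DT <- data.table(
  Loc1  = c("G", "G", "G", NA, "G", "T", "T", "T", "T", NA),
  Loc2  = c(NA, "A", "C", NA, "C", "A", "C", "A", "C", "A"),
  Loc3  = c("C", "G", "G", NA, NA, "G", "G", "G", "G", "G"),
  Group = c("G1", "G1", "G1", "G2", "G2", "G2", "G2", "G3", "G3", "G3")
)

# Wide to long: one row per group, locus and observed allele
long <- melt(DT, id.vars = "Group", variable.name = "loci", value.name = "allele")

# Count each allele (NA included) per group and locus, then convert to proportions
freq <- long[, .N, by = .(Group, loci, allele)]
freq[, prop := N / sum(N), by = .(Group, loci)]

# Sanity check: proportions within each group and locus should add up to 1
check <- freq[, .(total = sum(prop)), by = .(Group, loci)]
print(freq)
```

Missing values are counted as their own category here, matching the output shown earlier.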
r - data.table - Group data.table result by multiple columns with rounding



By : Aaron Solomon
Date : March 29 2020, 07:55 AM
When using the summary syntax in data.table (i.e., not using :=), you can include extra columns in your result by adding them to the list in the j position:
code :
mtcars2[,.(displace = round(disp / sum(disp), digits = 3), disp), by = cyl]

#    cyl displace  disp
# 1:   6    0.125 160.0
# 2:   6    0.125 160.0
# 3:   6    0.201 258.0
# 4:   6    0.175 225.0
# 5:   6    0.131 167.6
# 6:   6    0.131 167.6
# 7:   6    0.113 145.0
# ...
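The object `mtcars2` is not defined in the answer. A minimal runnable sketch, assuming it is simply the built-in `mtcars` data set converted to a data.table:

```r
library(data.table)

# Assumption: mtcars2 is the built-in mtcars data set as a data.table
mtcars2 <- as.data.table(mtcars)

# Summary syntax: the list in j holds the rounded share and the original disp column
res <- mtcars2[, .(displace = round(disp / sum(disp), digits = 3), disp), by = cyl]

print(head(res))
```

Because `disp` is listed in j next to the computed share, the result keeps one row per car rather than collapsing to one row per `cyl`.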
Calculate a group index across multiple columns of a data frame in R



By : Thandar Phru
Date : March 29 2020, 07:55 AM
What is the most efficient way to calculate a group index (group identifier) across multiple columns in a data frame or data.table in R?
How about this, using data.table:
code :
library(data.table)
setDT(df)[, group := .GRP, by = .(a, b)]
> df
    a b group
 1: 1 a     1
 2: 2 b     2
 3: 1 c     3
 4: 2 a     4
 5: 1 b     5
 6: 2 c     6
 7: 1 a     1
 8: 2 b     2
 9: 1 c     3
10: 2 a     4
11: 1 b     5
12: 2 c     6
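The input `df` is not shown. The sketch below rebuilds it from the printed output above (an assumption) so the `.GRP` call can be run as is.

```r
library(data.table)

# Input reconstructed from the printed result (an assumption): a alternates 1, 2 and b cycles a, b, c
df <- data.frame(a = rep(1:2, 6), b = rep(c("a", "b", "c"), 4))

# .GRP is a per-group counter, numbered in order of first appearance of each (a, b) pair
setDT(df)[, group := .GRP, by = .(a, b)]

print(df)
```

Note that `setDT()` converts `df` to a data.table by reference, so no reassignment is needed.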
Calculate multiple columns by operating on several other columns in the same data.table



By : namcmc
Date : March 29 2020, 07:55 AM
It's probably a matter of taste, but you could use Map and build some lists to feed it, either by naming the columns explicitly or by selecting them by name pattern:
code :
# Explicit lists of the columns to multiply pairwise
DT[, c("z1", "z2") := Map("*", list(x1, x2), list(y1, y2))]
# Or collect the columns by name pattern
DT[, c("z1", "z2") := Map("*", mget(ls(pattern="x")), mget(ls(pattern="y")))]
DT
   x1 y1 x2 y2 z1  z2
1:  1  6 11 16  6 176
2:  2  7 12 17 14 204
3:  3  8 13 18 24 234
4:  4  9 14 19 36 266
5:  5 10 15 20 50 300
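Here is a self-contained sketch of the explicit-list variant. The input columns are rebuilt from the printed output above, which is an assumption about the original data.

```r
library(data.table)

# Input reconstructed from the printed result (an assumption)
DT <- data.table(x1 = 1:5, y1 = 6:10, x2 = 11:15, y2 = 16:20)

# Map() pairs x1 with y1 and x2 with y2, multiplying each pair element-wise
DT[, c("z1", "z2") := Map("*", list(x1, x2), list(y1, y2))]

print(DT)
```

`Map()` returns a list with one element per pair, which is the shape `:=` expects when several columns are assigned at once.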