
# R: How to calculate lag for multiple columns by group for data table

By : user2949004
Date : November 15 2020, 06:54 AM
I would like to calculate the diff of variables in a data table, grouped by id. The data is recorded at a sample rate of 1 Hz, and I would like to estimate the first and second derivatives (speed, acceleration). You can try:
code :
``````
setnames(dt[, lapply(.SD, function(x) c(NA, diff(x))), by = id],
         2:3, c('dx', 'dy'))[]
#    id dx dy
#1:  1 NA NA
#2:  1  1  2
#3:  1  1  1
#4:  2 NA NA
#5:  2  4 -6
#6:  2  1  1
``````
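A dplyr equivalent, which replaces x and y with their differences and then renames them:
``````
library(dplyr)
df %>%
  group_by(id) %>%
  # mutate_each()/funs() are deprecated in current dplyr; across() replaces them.
  # across(everything()) skips the grouping column id.
  mutate(across(everything(), ~ c(NA, diff(.x)))) %>%
  rename(dx = x, dy = y)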
``````
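To add the differences by reference, and then the second differences computed from columns 4:5 (dx, dy):
``````
dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))), by = id]
dt[, c('dx2', 'dy2') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id, .SDcols = 4:5]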
dt
#   x y id dx dy dx2 dy2
#1: 1 2  1 NA NA  NA  NA
#2: 2 4  1  1  2  NA  NA
#3: 3 5  1  1  1   0  -1
#4: 1 8  2 NA NA  NA  NA
#5: 5 2  2  4 -6  NA  NA
#6: 6 3  2  1  1  -3   7
``````
The same using shift():
``````
dt[, paste0("d", c("x", "y")) := .SD - shift(.SD), by = id
   ][, paste0("d", c("x2", "y2")) := .SD - shift(.SD), by = id, .SDcols = 4:5]
``````
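The snippets above assume a `dt` that is not shown on this page. As a minimal self-contained sketch, the block below rebuilds the sample values from the printed output (an assumption, since the original question's data is not reproduced here) and computes both differences with `shift()`. The helper names `cols`, `d1` and `d2` are illustrative only. At a 1 Hz sample rate the time step is 1 s, so the differences can be read directly as per-second estimates.

```r
library(data.table)

# Sample values taken from the printed output above (assumed to match
# the original question's data, which is not shown on this page).
dt <- data.table(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

cols <- c("x", "y")
d1 <- paste0("d", cols)        # "dx", "dy"
d2 <- paste0("d", cols, "2")   # "dx2", "dy2"

# First difference per id: each value minus the previous value in its group.
dt[, (d1) := lapply(.SD, function(v) v - shift(v)), by = id, .SDcols = cols]

# Second difference: the same step applied to the first differences.
dt[, (d2) := lapply(.SD, function(v) v - shift(v)), by = id, .SDcols = d1]

print(dt)
```

Naming the columns through `.SDcols` rather than by position (4:5) keeps the second step correct if the column order changes.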


## How to group data.table by multiple columns?

By : Reihane
Date : March 29 2020, 07:55 AM
Pass the grouping columns to `by` as a list. Example:
code :
``````
set.seed(007)
DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))

library(data.table)
DT <- data.table(DF)
DT[, Mean:=mean(X), by=list(Y, Z)]

#      X Y Z      Mean
#  1:  1 1 3  1.000000
#  2:  2 0 1  9.333333
#  3:  3 0 5  7.400000
#  4:  4 0 5  7.400000
#  5:  5 0 5  7.400000
#  6:  6 1 0  6.000000
#  7:  7 0 3  7.000000
#  8:  8 1 2 12.500000
#  9:  9 0 5  7.400000
# 10: 10 0 2 15.000000
# 11: 11 0 4 14.500000
# 12: 12 0 1  9.333333
# 13: 13 1 1 13.000000
# 14: 14 0 1  9.333333
# 15: 15 0 2 15.000000
# 16: 16 0 5  7.400000
# 17: 17 1 2 12.500000
# 18: 18 0 4 14.500000
# 19: 19 1 5 19.000000
# 20: 20 0 2 15.000000
``````
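The example above uses `:=`, which keeps every row and adds the group mean as a new column. If one row per group is wanted instead, a summary in `j` does that. The following is a small sketch on made-up data (the table and its values are hypothetical, chosen so the result does not depend on the random number generator):

```r
library(data.table)

# Hypothetical toy data; not the DF from the example above.
DT <- data.table(X = 1:6,
                 Y = c(0, 0, 1, 1, 0, 1),
                 Z = c("a", "a", "a", "b", "b", "b"))

# One row per (Y, Z) combination; .() is an alias for list().
res <- DT[, .(Mean = mean(X), N = .N), by = .(Y, Z)]
print(res)

# The same grouping can be written with a character vector.
res2 <- DT[, .(Mean = mean(X), N = .N), by = c("Y", "Z")]
```

Groups appear in order of first occurrence; use `keyby` instead of `by` if a sorted result is preferred.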

## How to use data.table to efficiently calculate allele frequencies (proportions) by group across multiple columns (loci)

By : Hendy Nugraha
Date : March 29 2020, 07:55 AM
It's probably wise to transform your data.table into long format first. This will make it easier to use for further calculations (or for making visualisations with ggplot2, for example). With the melt function of data.table (which works the same as the melt function of the reshape2 package) you can transform from wide to long format:
code :
``````
DT2 <- melt(DT, id.vars = "Group", variable.name = "loci")
``````
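Then count each value per group and locus, and divide by the total for that group and locus to get the proportion:
``````
DT2 <- DT2[, .N, by = .(Group, loci, value)][, prop := N/sum(N), by = .(Group, loci)]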
``````
``````
> DT2
Group loci value N      prop
1:    G1 Loc1     G 3 1.0000000
2:    G2 Loc1    NA 1 0.2500000
3:    G2 Loc1     G 1 0.2500000
4:    G2 Loc1     T 2 0.5000000
5:    G3 Loc1     T 2 0.6666667
6:    G3 Loc1    NA 1 0.3333333
7:    G1 Loc2    NA 1 0.3333333
8:    G1 Loc2     A 1 0.3333333
9:    G1 Loc2     C 1 0.3333333
10:    G2 Loc2    NA 1 0.2500000
11:    G2 Loc2     C 2 0.5000000
12:    G2 Loc2     A 1 0.2500000
13:    G3 Loc2     A 2 0.6666667
14:    G3 Loc2     C 1 0.3333333
15:    G1 Loc3     C 1 0.3333333
16:    G1 Loc3     G 2 0.6666667
17:    G2 Loc3    NA 2 0.5000000
18:    G2 Loc3     G 2 0.5000000
19:    G3 Loc3     G 3 1.0000000
``````
To get one column per allele again, cast the result back to wide format:
``````
DT3 <- dcast(DT2, Group + loci ~ value, value.var = c("N", "prop"), fill = 0)
``````
``````
> DT3
Group loci N_A N_C N_G N_T N_NA    prop_A    prop_C    prop_G    prop_T   prop_NA
1:    G1 Loc1   0   0   3   0    0 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000
2:    G1 Loc2   1   1   0   0    1 0.3333333 0.3333333 0.0000000 0.0000000 0.3333333
3:    G1 Loc3   0   1   2   0    0 0.0000000 0.3333333 0.6666667 0.0000000 0.0000000
4:    G2 Loc1   0   0   1   2    1 0.0000000 0.0000000 0.2500000 0.5000000 0.2500000
5:    G2 Loc2   1   2   0   0    1 0.2500000 0.5000000 0.0000000 0.0000000 0.2500000
6:    G2 Loc3   0   0   2   0    2 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
7:    G3 Loc1   0   0   0   2    1 0.0000000 0.0000000 0.0000000 0.6666667 0.3333333
8:    G3 Loc2   2   1   0   0    0 0.6666667 0.3333333 0.0000000 0.0000000 0.0000000
9:    G3 Loc3   0   0   3   0    0 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000
``````
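If only the counts are needed, melt and dcast can be combined in one step:
``````
# fun.aggregate = length makes the default counting behaviour explicit
DT2 <- dcast(melt(DT, id.vars = "Group"), Group + variable ~ value, fun.aggregate = length)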
``````
``````
> DT2
Group variable A C G T NA
1:    G1     Loc1 0 0 3 0  0
2:    G1     Loc2 1 1 0 0  1
3:    G1     Loc3 0 1 2 0  0
4:    G2     Loc1 0 0 1 2  1
5:    G2     Loc2 1 2 0 0  1
6:    G2     Loc3 0 0 2 0  2
7:    G3     Loc1 0 0 0 2  1
8:    G3     Loc2 2 1 0 0  0
9:    G3     Loc3 0 0 3 0  0
``````
Used data:
``````
DT <- structure(list(Loc1 = c("G", "G", "G", NA, "G", "T", "T", "T", "T", NA),
Loc2 = c(NA, "A", "C", NA, "C", "A", "C", "A", "C", "A"),
Loc3 = c("C", "G", "G", NA, NA, "G", "G", "G", "G", "G"),
Group = c("G1", "G1", "G1", "G2", "G2", "G2", "G2", "G3", "G3", "G3")),
.Names = c("Loc1", "Loc2", "Loc3", "Group"), row.names = c(NA, -10L), class = c("data.table", "data.frame"))
``````
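For convenience, the long-format steps above can be run end to end as below. This is a sketch under a few assumptions: the data is rebuilt with `data.table()` instead of `structure()`, the intermediate names `long`, `freq` and `check` are illustrative, and `NA` is kept as its own category, as in the printed output.

```r
library(data.table)

# Same values as the data shown above, built with data.table() directly.
DT <- data.table(
  Loc1  = c("G", "G", "G", NA, "G", "T", "T", "T", "T", NA),
  Loc2  = c(NA, "A", "C", NA, "C", "A", "C", "A", "C", "A"),
  Loc3  = c("C", "G", "G", NA, NA, "G", "G", "G", "G", "G"),
  Group = c("G1", "G1", "G1", "G2", "G2", "G2", "G2", "G3", "G3", "G3")
)

# Wide to long: one row per (Group, locus, observed value).
long <- melt(DT, id.vars = "Group", variable.name = "loci")

# Count each value per group and locus, then convert counts to proportions.
freq <- long[, .N, by = .(Group, loci, value)]
freq[, prop := N / sum(N), by = .(Group, loci)]

# Sanity check: proportions within each Group/locus pair sum to 1.
check <- freq[, .(total = sum(prop)), by = .(Group, loci)]
stopifnot(all(abs(check$total - 1) < 1e-12))

print(freq)
```

To compute frequencies over observed alleles only, one option would be to filter with `long[!is.na(value)]` before counting; whether that is appropriate depends on how missing genotypes should be treated.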

## r - data.table - Group data.table result by multiple columns with rounding

By : Aaron Solomon
Date : March 29 2020, 07:55 AM
When using the summary syntax in data.table, i.e., not using :=, you can include columns in your result by adding them to the list in the j position:
code :
``````
mtcars2[,.(displace = round(disp / sum(disp), digits = 3), disp), by = cyl]

#    cyl displace  disp
# 1:   6    0.125 160.0
# 2:   6    0.125 160.0
# 3:   6    0.201 258.0
# 4:   6    0.175 225.0
# 5:   6    0.131 167.6
# 6:   6    0.131 167.6
# 7:   6    0.113 145.0
# ...
``````
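`mtcars2` is not defined on this page. The sketch below assumes it is simply a data.table copy of the built-in `mtcars` (the 6-cylinder `disp` values in the output above match that dataset); `res` is an illustrative name.

```r
library(data.table)

# Assumption: mtcars2 is a data.table version of the built-in mtcars.
mtcars2 <- as.data.table(mtcars)

# Summary syntax (no :=): every expression listed in j becomes a result column.
# Here disp is returned next to its rounded share of the group total.
res <- mtcars2[, .(displace = round(disp / sum(disp), digits = 3), disp),
               by = cyl]

head(res)
```

Because `j` returns one value per input row here, the result has as many rows as the input, ordered by first appearance of each `cyl` group.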

## Calculate a group index across multiple columns of a data frame in R

By : Thandar Phru
Date : March 29 2020, 07:55 AM
What is the most efficient way to calculate a group index (group identifier) across multiple columns in a data frame or data.table in R? How about this, using data.table:
code :
``````
library(data.table)
setDT(df)[,group :=.GRP,by = .(a,b)]
``````
``````
> df
a b group
1: 1 a     1
2: 2 b     2
3: 1 c     3
4: 2 a     4
5: 1 b     5
6: 2 c     6
7: 1 a     1
8: 2 b     2
9: 1 c     3
10: 2 a     4
11: 1 b     5
12: 2 c     6
``````
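The input `df` is not shown above. A runnable sketch, with the data frame rebuilt from the printed result (an assumption about the original input):

```r
library(data.table)

# Input reconstructed from the printed output above (assumed).
df <- data.frame(a = rep(1:2, 6),
                 b = rep(c("a", "b", "c"), 4))

# setDT() converts df to a data.table by reference;
# .GRP is an integer counter of the current group within by.
setDT(df)[, group := .GRP, by = .(a, b)]

print(df)
```

`.GRP` numbers groups in order of first appearance, so the ids depend on row order rather than on the sorted values of `a` and `b`.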

## Calculate multiple columns by operating on several other columns in the same data.table

By : namcmc
Date : March 29 2020, 07:55 AM
Probably a matter of taste, but you could use Map and build some lists to feed it:
code :
``````
DT[, c("z1", "z2") := Map("*", list(x1, x2), list(y1, y2))]
``````
Or, collecting the columns by name pattern instead of listing them:
``````
DT[, c("z1", "z2") := Map("*", mget(ls(pattern="x")), mget(ls(pattern="y")))]
``````
``````
DT
x1 y1 x2 y2 z1  z2
1:  1  6 11 16  6 176
2:  2  7 12 17 14 204
3:  3  8 13 18 24 234
4:  4  9 14 19 36 266
5:  5 10 15 20 50 300
``````
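A variant of the pattern-based approach that selects columns from `names(DT)` rather than `ls()` may be easier to reason about. This is a sketch, not the answer's own method: the input values are taken from the printed output above, and `xcols`, `ycols` and `zcols` are illustrative names.

```r
library(data.table)

# Input reconstructed from the printed output above (assumed).
DT <- data.table(x1 = 1:5, y1 = 6:10, x2 = 11:15, y2 = 16:20)

# Pick the paired columns by name prefix; both vectors come back in the
# same order as long as the x and y columns are laid out as matching pairs.
xcols <- grep("^x", names(DT), value = TRUE)   # "x1", "x2"
ycols <- grep("^y", names(DT), value = TRUE)   # "y1", "y2"
zcols <- sub("^x", "z", xcols)                 # "z1", "z2"

# Map multiplies the columns pairwise: x1 * y1, x2 * y2.
DT[, (zcols) := Map(`*`, mget(xcols), mget(ycols))]

print(DT)
```

Anchoring the patterns with `^` avoids accidentally matching other columns whose names merely contain an "x" or "y".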