
New behavior in data.table? .N / something with `by` (calculate proportion)



By : user2956037
Date : November 22 2020, 10:48 AM
I updated to the latest version of data.table (1.9.4) from a moderately recent prior version (1.8.x, I think), and now I'm getting some unexpected behavior. You could try counting the rows per type and category with .N, then dividing each count by the total for its type:
code :
my_dt[, .N, by=.(type,category)][, prop:=N/sum(N), by=type][]

    type category N      prop
 1:    a    small 4 0.5000000
 2:    a      med 2 0.2500000
 3:    a    large 2 0.2500000
 4:    b      med 3 0.4285714
 5:    b    large 3 0.4285714
 6:    b    small 1 0.1428571
 7:    c    large 3 0.3000000
 8:    c    small 6 0.6000000
 9:    c      med 1 0.1000000
10:    d      med 6 0.6666667
11:    d    large 2 0.2222222
12:    d    small 1 0.1111111
13:    e    small 2 0.2500000
14:    e      med 3 0.3750000
15:    e    large 3 0.3750000


Calculate mean of a proportion of the data.frame



By : Marcin Kuzański
Date : March 29 2020, 07:55 AM
I'm working with data that looks similar to this. You could try something like this, where data is your example data frame:
code :
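# Expand each value n times into one long vector.
# rep() with a times vector avoids apply(), which returns a matrix
# instead of a list when every n is equal.
longData <- rep(data$value, times = data$n)

# Cut the positions into 3 equal-width bins and take the mean of each bin
aggregate(longData, list(cut(seq_along(longData), breaks = 3, right = FALSE)), mean)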
R How to calculate a proportion of some value by column and by row in a data frame



By : sindhu kodoor
Date : March 29 2020, 07:55 AM
I would define the function as follows (you can play around with the settings):
code :
Propfunc <- function(x, dim = "col", equal = "ab", ignore = ".."){
  if(dim == "col") return(unname(colSums(x == equal)/colSums(x != ignore)))
  if(dim == "row") return(rowSums(x == equal)/rowSums(x != ignore))
  else stop("Unknown dim")
}

Propfunc(df)
## [1] 0.5 1.0 0.0
Propfunc(df, dim = "row")
## [1] 1.0 0.0 0.5
Propfunc(df, dim = "blabla")
## Error in Propfunc(df, dim = "blabla") : Unknown dim
calculate proportion in grouped data frame



By : JKL
Date : March 29 2020, 07:55 AM
Assuming N already holds your aggregated counts, you could get the proportions within each year using data.table:
code :
library(data.table)
setDT(df)[,prop:=N/sum(N),by=year]
df

   year  owngun    N       prop
1: 2000     Yes  603 0.32471729
2: 2000      No 1231 0.66289715
3: 2000 Refused   23 0.01238557
4: 2012     Yes  440 0.33716475
5: 2012      No  841 0.64444444
6: 2012 Refused   24 0.01839080
# Alternatively, the same proportions with plyr:
library(plyr)
df2 <- ddply(df, .(year), transform, prop = N / sum(N))
How to lookup multiple entries in table to calculate proportion?



By : data_chris
Date : March 29 2020, 07:55 AM
Here is one way. The idea is to compute a groupby sum and map it back onto the dataframe as part of your calculation.
code :
import pandas as pd, numpy as np

df = pd.DataFrame([['North West', 'blue city', 'city', 181],
                   ['North East', 'Black and white united', 'united', 130],
                   ['North West', 'blue and white city', 'city', 101],
                   ['North East', 'Purple United', 'united', 12],
                   ['North East', 'red city', 'city', 73],
                   ['North East', 'red and white', '', 112],
                   ['North West', 'Red city', 'city', 162],
                   ['North East', 'white shorts united', 'united', 93],
                   ['North East', 'orange and black city', 'city', 68],
                   ['North West', 'pink united', 'united', 4],
                   ['North West', 'red united', 'united', 192],
                   ['North West', 'orange united', 'united', 42]],
                  columns=['Region', 'Team', 'Suffix', 'Attending Fans'])

g = df.groupby(['Region', 'Suffix'])['Attending Fans'].sum()

df['Pct'] = 100 * df['Attending Fans'] / np.fromiter(map(g.get,
            map(tuple, df[['Region', 'Suffix']].values)), dtype=float)

#         Region                    Team  Suffix  Attending Fans         Pct
# 0   North West               blue city    city             181   40.765766
# 1   North East  Black and white united  united             130   55.319149
# 2   North West     blue and white city    city             101   22.747748
# 3   North East           Purple United  united              12    5.106383
# 4   North East                red city    city              73   51.773050
# 5   North East           red and white                     112  100.000000
# 6   North West                Red city    city             162   36.486486
# 7   North East     white shorts united  united              93   39.574468
# 8   North East   orange and black city    city              68   48.226950
# 9   North West             pink united  united               4    1.680672
# 10  North West              red united  united             192   80.672269
# 11  North West           orange united  united              42   17.647059
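
The same idea (group totals mapped back onto each row) can also be sketched with groupby(...).transform("sum"), which returns one total per row aligned to the original index. This is only an illustrative alternative, not the approach used above; the small frame and the group_total name are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, much smaller than the frame used above.
df = pd.DataFrame({
    "Region": ["North West", "North West", "North East", "North East"],
    "Suffix": ["city", "city", "united", "united"],
    "Attending Fans": [150, 50, 30, 90],
})

# transform("sum") broadcasts each (Region, Suffix) total back to its rows,
# so no explicit lookup of tuple keys is needed.
group_total = df.groupby(["Region", "Suffix"])["Attending Fans"].transform("sum")

# Each row's share of its group's total, as a percentage.
df["Pct"] = 100 * df["Attending Fans"] / group_total

print(df)
```

Because the result of transform shares the dataframe's index, the division lines up row by row without building an intermediate array.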
How to calculate the proportion of each combination in a data frame?



By : user3196380
Date : March 29 2020, 07:55 AM
Let's say we have a data.frame as follows. If M is your matrix, you can do:
code :
table(apply(M, 1, function(v) paste0(names(v[v==1]), collapse = ""))) / nrow(M)
> M <- cbind(A = c(1,1,1,0,0), B = c(1,0,0,1,0), C = c(1,1,1,0,1))
> table(apply(M, 1, function(v) paste0(names(v[v==1]), collapse = ""))) / nrow(M)

ABC  AC   B   C 
0.2 0.4 0.2 0.2 