How to check if each cell (list) of a column of a dataframe is unique in R?


By : Felipehamm
Date : November 18 2020, 01:01 AM
Here's an alternative approach, built on the idea of starting with a "long" dataset and proceeding from there.
This creates your long dataset, flags the duplicated values per id, and then rebuilds one row per id:
code :
library(splitstackshape)
x <- cSplit(df1, "book_id", ",", "long")[, book_id := gsub(
    "[][]", "", book_id)]
x[, duped := paste(unique(book_id[duplicated(book_id)], 
                   collapse = ", ")), by = id]
dupedX <- x[, list(book_id = sprintf("[%s]", paste(book_id, collapse = ", ")),
                   duped = paste(unique(duped), collapse = ", ")), by = id]
dupedX
#    id                                book_id      duped
# 1:  1 ["19167120", "237494310", "195166798"]         NA
# 2:  2  ["19167120", "237494310", "19167120"] "19167120"
# 3:  3                                     []         NA
uniqueX <- x[, list(book_id = sprintf(
  "[%s]", paste(unique(book_id), collapse = ", "))), by = id]
uniqueX
#    id                                book_id
# 1:  1 ["19167120", "237494310", "195166798"]
# 2:  2              ["19167120", "237494310"]
# 3:  3                                     []
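
For readers working in pandas rather than R, the same per-cell checks can be sketched as below. This is not part of the answer above; it assumes the cells already hold real Python lists, the data is made up to mirror the printed output, and `repeated_values` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical data mirroring the printed output above, with lists as cells.
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "book_id": [
        ["19167120", "237494310", "195166798"],
        ["19167120", "237494310", "19167120"],
        [],
    ],
})

# A cell is unique when deduplicating it does not shorten it.
df1["is_unique"] = df1["book_id"].map(lambda ids: len(set(ids)) == len(ids))


def repeated_values(ids):
    """Return the values that occur more than once, in first-seen order."""
    seen, dups = set(), []
    for x in ids:
        if x in seen and x not in dups:
            dups.append(x)
        seen.add(x)
    return dups


df1["duped"] = df1["book_id"].map(repeated_values)

# Order-preserving deduplication of each cell.
df1["unique_ids"] = df1["book_id"].map(lambda ids: list(dict.fromkeys(ids)))

print(df1[["id", "is_unique", "duped", "unique_ids"]])
```

Working cell by cell avoids reshaping to long format, which is convenient when the lists are short; for very large data, an exploded long table may be the better fit.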


Cells in a column of a pandas dataframe are individual lists; dtype list is not working



By : tvg
Date : March 29 2020, 07:55 AM
This is not the correct way, but I found a hack: use a graphlab SFrame to read the column with dtype list, and then convert that SFrame to a dataframe with the .to_dataframe command.
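
If graphlab is not available, one possible alternative (not the method from the answer above) is to parse the list literals while reading the file. The sketch below assumes the column stores valid Python list literals as text; the CSV content and the column name `tags` are made up for illustration.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV text: the "tags" column holds list literals as strings.
csv_text = '''id,tags
1,"[1, 2, 3]"
2,"[]"
3,"['a', 'b']"
'''

# converters runs ast.literal_eval on each raw cell of "tags",
# so the text "[1, 2, 3]" becomes the Python list [1, 2, 3].
df = pd.read_csv(io.StringIO(csv_text), converters={"tags": ast.literal_eval})

print(df["tags"].map(type).unique())  # every cell is a list
print(df["tags"].map(len).tolist())   # lengths of the parsed lists
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval`, but it raises an error on empty or malformed cells, which would need separate handling.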
python pandas: Check if dataframe's column value is in another dataframe's column, then count and list it



By : kalk
Date : March 29 2020, 07:55 AM
I'm learning Python, and any help on this is much appreciated. I have a two-part problem. I have created a solution to the first part, but there has to be a much more Pythonic way to accomplish the goal. For the second part, I'm not sure how to proceed.
You can use merge, and a set intersection to list the uid values present in both dataframes:
code :
df.merge(df1, on = 'uid', how = 'left').fillna('')

    uid value   value1
0   uid1    1   
1   uid2    2   5
2   uid3    3   
3   uid4    4   
list_val_in_both_df  = list(set(df.uid).intersection(set(df1.uid)))
['uid2', 'uid4']
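
The "check, then count and list" part can also be sketched with `Series.isin`. The data below is hypothetical (the original dataframes are not shown); only the column names `uid`, `value`, and `value1` are taken from the answer above.

```python
import pandas as pd

# Hypothetical data; column names follow the answer above.
df = pd.DataFrame({"uid": ["uid1", "uid2", "uid3", "uid4"], "value": [1, 2, 3, 4]})
df1 = pd.DataFrame({"uid": ["uid2", "uid4", "uid9"], "value1": [5, 6, 7]})

# Boolean mask: True where df's uid also appears in df1's uid column.
in_both = df["uid"].isin(df1["uid"])

count_in_both = int(in_both.sum())              # how many uids match
uids_in_both = df.loc[in_both, "uid"].tolist()  # which ones, in df's row order

print(count_in_both)
print(uids_in_both)
```

Unlike the set intersection, the mask keeps the row order of `df` and can be reused to filter or annotate the dataframe.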
R extract unique row values in a column in a dataframe in a list



By : user2931794
Date : March 29 2020, 07:55 AM
We could use lapply to find the duplicated indices in the X2 column of each data frame and return the unique duplicated values.
code :
lapply(list_df, function(x) {
   inds <- duplicated(x$X2)
   if(any(inds)) unique(x$X2[inds]) else "No duplicates"
})

#[[1]]
#[1] "No duplicates"

#[[2]]
#[1] 2

#[[3]]
#[1] "No duplicates"
List rows whose column value is not unique in a dataframe



By : hardtani
Date : March 29 2020, 07:55 AM
I have a dataframe where some of the Song IDs are repeated. I would like to extract the rows that have the repetition. Any idea how?
Try this:
code :
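import pandas as pd

df = pd.DataFrame({"Song ID": [0, 0, 1, 3, 1, 4, 5], "ArtistID": [12, 13, 34, 1, 21, 43, 22]})
# keep=False marks every occurrence of a repeated Song ID, not only the later ones
print(df[df.duplicated(subset=["Song ID"], keep=False)])
   Song ID  ArtistID
0        0        12
1        0        13
2        1        34
4        1        21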
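
An equivalent way to select the same rows, shown here only as a sketch, is to count the rows per Song ID and keep the groups with more than one row. The data is the same as in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Song ID": [0, 0, 1, 3, 1, 4, 5],
    "ArtistID": [12, 13, 34, 1, 21, 43, 22],
})

# Count rows per Song ID and broadcast that count back onto every row.
counts = df.groupby("Song ID")["Song ID"].transform("size")

# Keep only the rows whose Song ID occurs more than once.
repeated = df[counts > 1]
print(repeated)
```

The `counts` series is handy when the number of repetitions matters too, for example to keep only IDs that appear three or more times.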
pandas DataFrame: get cells in column that are NaN, None, empty string/list, etc



By : Irie Blue
Date : August 23 2020, 06:00 AM
One idea is to chain Series.isna with a length comparison via Series.str.len:
code :
import numpy as np
import pandas as pd

df = pd.DataFrame({
         'a':[None,np.nan,[],'','aa', 0],
})

m = df['a'].isna() | df['a'].str.len().eq(0)
print (m)
0     True
1     True
2     True
3     True
4    False
5    False
Name: a, dtype: bool
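
If the column may also hold other empty containers, an explicit per-cell helper is one alternative to the vectorized mask. This is a sketch under that assumption; `is_blank` is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd


def is_blank(value):
    """Return True for None, NaN, or an empty string/list/tuple/dict/set."""
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) == 0
    # For remaining scalars, pd.isna covers None and float NaN.
    return bool(pd.isna(value))


df = pd.DataFrame({"a": [None, np.nan, [], "", "aa", 0]})

# Apply the helper to every cell; 0 is a real value, so it is not blank.
mask = df["a"].map(is_blank)
print(mask.tolist())
```

The helper is slower than the vectorized version on large columns, but it makes the definition of "empty" explicit and easy to extend.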