Skip to content Skip to sidebar Skip to footer

Couple The Data In All Possible Combinations

I have data in column in two columns like this Id Value 1 a 2 f 1 c 1 h 2 a and I'd like couple the data of the 'Value' column in all possible combinations based on the

Solution 1:

To return a character matrix with these combinations using base R, try

do.call(rbind, t(sapply(split(df, df$Id), function(i) t(combn(i$Value, 2)))))
     [,1] [,2]
[1,] "a""c" 
[2,] "a""h" 
[3,] "c""h" 
[4,] "f""a"

Each row is a desired combination.

To break this down a bit, split splits the data.frame by Id into a list of two data.frames. Then sapply is fed this list and the combn function to find the pairwise combinations within these data.frames. The result from each data.frame (which is a matrix) is transposed to fit your desired structure using t. Finally, this list of matrices is fed to do.call which uses rbind to return the final matrix.

Note: There is an assumption that the value column is character (not the pesky factor variable type). This is easily accomplished in the read. family of functions, like read.csv and read.table by adding the as.is=TRUE argument to your read function (or the longer stringsAsFactors=FALSE). If the variable is already a factor, you can wrap the i$Value statement near the end in as.character: as.character(i$Value) and it will run as desired.

Solution 2:

Using R you could try:

library(purrr)

df %>%
  split(.$Id) %>%
  map(~ t(combn(.$Value, 2)))

Which gives:

#$`1`#     [,1] [,2]#[1,] "a""c"#[2,] "a""h"#[3,] "c""h"##$`2`#     [,1] [,2]#[1,] f    a#Levels: a c f h

Solution 3:

Just another way (possibly slightly faster as it exploits the fact that you're looking for all pairs, and avoids combn and t):

require(data.table)
dt[, .( c1 =rep(Value,(.N:1)-1L), c2 =rep(Value,(1:.N)-1L)), by=Id]#    Id c1 c2# 1:  1  a  c# 2:  1  a  h# 3:  1  c  h# 4:  2  f  a

.N contains the number of observations for each group.


where dt is:

dt = fread('Id  Value
1   a
2   f
1   c
1   h
2   a')

Post a Comment for "Couple The Data In All Possible Combinations"