R reshape data
3.8 years ago
Cindy ▴ 60

Hi, everyone,

I am trying to use haplo.stats to predict haplotypes from phased genotype data, and I need to generate a genotype matrix. My data looks like this:

    Pedigree        SNP1    SNP2
    Individual 1    C       G
    Individual 1    C       G
    Individual 2    C       T
    Individual 2    C       T

I need to reshape it to

                    SNP1.a1 SNP1.a2 SNP2.a1 SNP2.a2
    Individual 1    C       C       G       G
    Individual 2    C       C       T       T

Does anyone know how to reshape the data?

Many thanks in advance

R SNP • 707 views
3.8 years ago
library(dplyr)
library(tidyr)
> df
      Pedigree SNP1 SNP2
1 Individual 1    C    G
2 Individual 1    C    G
3 Individual 2    C    T
4 Individual 2    C    T

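df %>% 
    # label the two rows of each individual as alleles a1 and a2
    # (assumes exactly two consecutive rows per individual)
    mutate(type = rep(c("a1", "a2"), n() / 2)) %>% 
    pivot_longer(cols = -c(Pedigree, type), names_to = "SNP") %>%
    pivot_wider(names_from = c(SNP, type), names_sep = ".", names_sort = TRUE)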

# A tibble: 2 x 5
  Pedigree     SNP1.a1 SNP1.a2 SNP2.a1 SNP2.a2
  <chr>        <chr>   <chr>   <chr>   <chr>  
1 Individual 1 C       C       G       G      
2 Individual 2 C       C       T       T
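
The rep() labelling in the pipeline above relies on every individual having exactly two rows, one after the other. As a hedged variation (not part of the original answer), the allele label can instead be derived per individual with group_by() and row_number(), which should not depend on the overall row order. The data frame is rebuilt here from the question's example; the column name `type` follows the answer above.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- tibble(
  Pedigree = c("Individual 1", "Individual 1", "Individual 2", "Individual 2"),
  SNP1 = c("C", "C", "C", "C"),
  SNP2 = c("G", "G", "T", "T")
)

wide <- df %>%
  group_by(Pedigree) %>%
  # number the rows within each individual: a1, a2, ...
  mutate(type = paste0("a", row_number())) %>%
  ungroup() %>%
  # one row per individual / allele / SNP
  pivot_longer(cols = -c(Pedigree, type), names_to = "SNP") %>%
  # spread back out to one column per SNP-allele combination
  pivot_wider(names_from = c(SNP, type), names_sep = ".", names_sort = TRUE)

wide
```

If an individual had more than two rows, this sketch would create extra a3, a4, ... columns rather than fail, so it may be worth checking the row counts per individual first.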

I knew that you would have a solution

3.8 years ago

This looks more like a job for reshape2::dcast(); however, you could also try this in base R:

df <- data.frame(
  SAM = c('Individual 1', 'Individual 1', 'Individual 2', 'Individual 2'),
  SNP1 = c('C', 'C', 'C', 'C'),
  SNP2 = c('G', 'G', 'T', 'T'))
df
           SAM SNP1 SNP2
1 Individual 1    C    G
2 Individual 1    C    G
3 Individual 2    C    T
4 Individual 2    C    T

samids <- unique(df$SAM)
new <- do.call(rbind, lapply(samids, function(x) unlist(df[df$SAM == x, -1])))
rownames(new) <- samids
new

             SNP11 SNP12 SNP21 SNP22
Individual 1 "C"   "C"   "G"   "G"  
Individual 2 "C"   "C"   "T"   "T"

Kevin
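
The base R approach above returns a character matrix with column names such as SNP11 and SNP12 rather than the SNP1.a1 / SNP1.a2 names asked for in the question. As a rough sketch (an alternative, not Kevin's method), base R's reshape() can produce those names directly. It assumes the rows of each individual appear in allele order; the helper column `allele` is introduced here purely for illustration.

```r
# Example data copied from the answer above
df <- data.frame(
  SAM = c("Individual 1", "Individual 1", "Individual 2", "Individual 2"),
  SNP1 = c("C", "C", "C", "C"),
  SNP2 = c("G", "G", "T", "T")
)

# Label each row within an individual as a1, a2, ...
df$allele <- paste0("a", ave(seq_len(nrow(df)), df$SAM, FUN = seq_along))

# Long-to-wide: one row per individual, one column per SNP/allele pair
wide <- reshape(df, idvar = "SAM", timevar = "allele",
                direction = "wide", sep = ".")

# reshape() groups the new columns by allele; sort them so that both
# alleles of a SNP sit next to each other (plain alphabetical sort, so
# SNP10 would sort before SNP2 in a larger data set)
wide <- wide[, c("SAM", sort(setdiff(names(wide), "SAM")))]
rownames(wide) <- NULL

wide
```

The result is a data frame rather than a matrix; as.matrix(wide[, -1]) would convert the genotype columns if a matrix is needed downstream.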
