Convert genotypes from '12' format to ACGT format
4 weeks ago
api • 0

Hello,

I need help converting my genotypes from '12' format to ACGT format, based on a snp_info file.

Example from df (row 1)

sample1 1/2 1/2 2/1 2/1

The first SNP of sample1 is heterozygous (1/2). 1 is ref allele (A) and 2 is alt allele (G) (see snp_info).

I would like to automate the process to convert all genotypes in my real data (900 samples / 30 genotypes).

# df
df = t(data.frame( 
sample1 =c('1/2','1/2','2/1','2/1'), 
sample2 =c('2/1','1/1','1/2','2/1'), 
sample3 =c('2/1','2/1','1/1','1/2'), 
sample4 =c('1/1','2/2','2/2','2/2')))

# snp_info

snp_info = data.frame(
snp =c('11_524568','12_542656','12_558659','13_8457658'), 
position =c('524568','542656','558659','8457658'), 
ref =c('A','T','T','G'), 
alt=c('G','C','C','A'))

# desired output

desired_output = t(data.frame(
sample1 =c('A/G','T/C','C/T','A/G'), 
sample2 =c('G/A','T/T','T/C','A/G'), 
sample3 =c('G/A','C/T','T/T','G/A'), 
sample4 =c('A/A','C/C','C/C','A/A')
))

I tried a for loop for the first SNP, but I didn't get the desired output:

desired_output = t(data.frame(
sample1 =c('','','',''), 
sample2 =c('','','',''), 
sample3 =c('','','',''), 
sample4 =c('','','','')))
desired_output = as.data.frame(desired_output)

geno = list()
for (i in 1:nrow(df)) {
    geno[i] = paste(snp_info[i,3],'/',snp_info[i,4])
    desired_output[i,1] = geno[i]
}

Thanks for your help!

r dplyr • 148 views
> df
        [,1]  [,2]  [,3]  [,4] 
sample1 "1/2" "1/2" "2/1" "2/1"
sample2 "2/1" "1/1" "1/2" "2/1"
sample3 "2/1" "2/1" "1/1" "1/2"
sample4 "1/1" "2/2" "2/2" "2/2"

> snp_info
         snp position ref alt
1  11_524568   524568   A   G
2  12_542656   542656   T   C
3  12_558659   558659   T   C
4 13_8457658  8457658   G   A

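> library(stringr)  # provides str_replace_all()

> # empty data frame with the same dimensions and row names as df
> nf=data.frame(matrix(NA, nrow=nrow(df), ncol = ncol(df)), row.names = row.names(df))

> # column i of df holds SNP i: replace "1" with its ref allele and "2" with its alt allele
> for (i in seq_len(ncol(df))) {
+     nf[,i]=str_replace_all(df[, i], c("1" = snp_info$ref[i], "2" = snp_info$alt[i]))
+ }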

> nf
         X1  X2  X3  X4
sample1 A/G T/C C/T A/G
sample2 G/A T/T T/C A/G
sample3 G/A C/T T/T G/A
sample4 A/A C/C C/C A/A
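
If you would rather avoid the stringr dependency, the same per-SNP recoding can probably be done in base R with chartr(). This is only a sketch: recode_genotypes is a hypothetical helper name, and it assumes every allele is a single character (plain SNPs, no indels) and that column j of df corresponds to row j of snp_info.

```r
# Example data copied from the question
df <- t(data.frame(
  sample1 = c('1/2', '1/2', '2/1', '2/1'),
  sample2 = c('2/1', '1/1', '1/2', '2/1'),
  sample3 = c('2/1', '2/1', '1/1', '1/2'),
  sample4 = c('1/1', '2/2', '2/2', '2/2')))

snp_info <- data.frame(
  snp = c('11_524568', '12_542656', '12_558659', '13_8457658'),
  ref = c('A', 'T', 'T', 'G'),
  alt = c('G', 'C', 'C', 'A'),
  stringsAsFactors = FALSE)

# Hypothetical helper (not from the thread): recode a samples x SNPs
# character matrix of '1/2'-style genotypes into allele letters.
# Assumes single-character alleles and that column j of geno
# matches row j of info.
recode_genotypes <- function(geno, info) {
  stopifnot(ncol(geno) == nrow(info))
  out <- geno
  for (j in seq_len(ncol(geno))) {
    # chartr() swaps characters one-for-one: "1" -> ref, "2" -> alt
    out[, j] <- chartr("12", paste0(info$ref[j], info$alt[j]), geno[, j])
  }
  colnames(out) <- info$snp  # label columns with the SNP ids
  out
}

acgt <- recode_genotypes(df, snp_info)
```

Printing acgt should give the same genotypes as nf above, with the SNP ids as column names instead of X1 to X4. Because the loop runs over SNPs and each column is recoded in one vectorised call, it should scale to the 900-sample data without changes.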