4.4 years ago
api
Hello,
I need help converting my genotypes from '12' format to ACGT format, using the alleles listed in a snp_info file.
Example from df (row 1)
sample1 1/2 1/2 2/1 2/1
The first SNP of sample1 is heterozygous (1/2), where 1 is the ref allele (A) and 2 is the alt allele (G) for that SNP (see snp_info), so it should become A/G.
I would like to automate the process to convert all genotypes in my real data (900 samples / 30 genotypes).
# df
df = t(data.frame(
sample1 =c('1/2','1/2','2/1','2/1'),
sample2 =c('2/1','1/1','1/2','2/1'),
sample3 =c('2/1','2/1','1/1','1/2'),
sample4 =c('1/1','2/2','2/2','2/2')))
# snp_info
snp_info = data.frame(
snp =c('11_524568','12_542656','12_558659','13_8457658'),
position =c('524568','542656','558659','8457658'),
ref =c('A','T','T','G'),
alt=c('G','C','C','A'))
# desired output
desired_output = t(data.frame(
sample1 =c('A/G','T/C','C/T','A/G'),
sample2 =c('G/A','T/T','T/C','A/G'),
sample3 =c('G/A','C/T','T/T','G/A'),
sample4 =c('A/A','C/C','C/C','A/A')
))
I tried a for loop for the first SNP, but I didn't get the desired output:
desired_output = t(data.frame(
sample1 =c('','','',''),
sample2 =c('','','',''),
sample3 =c('','','',''),
sample4 =c('','','','')))
desired_output = as.data.frame(desired_output)
geno = list()
for (i in 1:nrow(df)) {
geno[i] = paste(snp_info[i,3],'/',snp_info[i,4])
desired_output[i,1] = geno[i]
}
Thanks for your help!
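For reference, here is a minimal sketch of one way the conversion described above could be done in base R. It rests on a few assumptions that are not stated in the post: the columns of df are in the same order as the rows of snp_info, every genotype is a pair of 1s and 2s separated by '/', and there are no missing calls. The helper name recode_snp and the result name acgt are hypothetical, chosen only for this illustration. Under this rule, a 2/2 call at the second SNP (ref T, alt C) maps to C/C.

```r
# Input in '12' coding: rows are samples, columns are SNPs (copied from the question)
df <- t(data.frame(
  sample1 = c('1/2', '1/2', '2/1', '2/1'),
  sample2 = c('2/1', '1/1', '1/2', '2/1'),
  sample3 = c('2/1', '2/1', '1/1', '1/2'),
  sample4 = c('1/1', '2/2', '2/2', '2/2')))

snp_info <- data.frame(
  snp = c('11_524568', '12_542656', '12_558659', '13_8457658'),
  ref = c('A', 'T', 'T', 'G'),
  alt = c('G', 'C', 'C', 'A'),
  stringsAsFactors = FALSE)

# Hypothetical helper: recode all genotypes of ONE SNP.
# Allele code 1 is looked up as ref, allele code 2 as alt.
recode_snp <- function(geno, ref, alt) {
  alleles <- c(ref, alt)
  parts <- strsplit(geno, '/', fixed = TRUE)
  vapply(parts,
         function(p) paste(alleles[as.integer(p)], collapse = '/'),
         character(1))
}

# Assumption: column j of df corresponds to row j of snp_info
acgt <- df
for (j in seq_len(ncol(df))) {
  acgt[, j] <- recode_snp(df[, j], snp_info$ref[j], snp_info$alt[j])
}
colnames(acgt) <- snp_info$snp
```

The loop runs over SNPs (columns) rather than samples, because the ref/alt alleles are a property of the SNP. Missing or otherwise coded genotypes would need extra handling before using something like this on real data.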
4.4 years ago by
cpad0112
21k