Question: Recode SNPs dataset to Number
3.3 years ago, lailan.sahrina.hsb wrote:

Dear All,

How do I recode a SNP dataset based on the major allele, minor allele, and heterozygote when the data also contain nucleotide codes such as W, N, and Y? Usually, SNP genotypes like AA, AG, AA, AA, AT, AA, GG, GG, AA, GG, AA, AA are recoded to 0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 0, 0 because A is the major allele and G is the minor allele. What if I have SNP codes like T, T, T, W, N, A, T, T, W? Previously, I used the recodeSNPs function from the scrime package in R for this. Unfortunately, it does not work for this data.
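
For readers following along, the usual two-letter recoding described above can be sketched in Python. This is only an illustration, not the scrime implementation: the function name `recode_by_major_allele` is hypothetical, and it assumes every genotype is a two-character string and that the major allele is simply the most frequent base at this SNP.

```python
from collections import Counter

def recode_by_major_allele(genotypes):
    """For each two-letter genotype, count the alleles that differ from the major allele."""
    # Assumption: the major allele is the most frequent base across all calls.
    allele_counts = Counter("".join(genotypes))
    major = allele_counts.most_common(1)[0][0]
    # 0 = homozygous major, 1 = heterozygous, 2 = no copy of the major allele.
    return [sum(allele != major for allele in genotype) for genotype in genotypes]

calls = ["AA", "AG", "AA", "AA", "AT", "AA", "GG", "GG", "AA", "GG", "AA", "AA"]
print(recode_by_major_allele(calls))  # [0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 0, 0]
```

Counting alleles that differ from the major allele, rather than counting one named minor allele, is what lets the AT call in the example come out as 1.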

3.3 years ago, pfs wrote:

If I understand the question correctly, you should be able to use 'sed' to do what you want. The command below is untested but should work.

sed -e 's/W/0/g' -e 's/N/1/g' -e 's/Y/2/g' file.txt > new_file.txt


Thank you very much for your answer. I cannot simply encode W as 0, or the other codes in the same way, because 0 is for homozygous reference, 1 for heterozygous, and 2 for homozygous variant. I'm sorry, my question was not really clear. The encoding is not based on the major or minor allele but on the reference and variant alleles. My problem is how to determine homozygous reference and the other classes from genotype data that consists of nucleotide symbols beyond just A, T, G, and C. The symbols in my data are IUPAC nucleotide symbols.

Reply written 3.3 years ago by lailan.sahrina.hsb
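
One possible way to approach the reference/variant encoding described in this reply is sketched below in Python. It rests on assumptions that are not confirmed anywhere in the thread: each single-letter call is taken to be a diploid genotype, so a plain base (A, C, G, T) means homozygous and a two-base IUPAC code (R, Y, S, W, K, M) means heterozygous; N and anything else is treated as missing; and the reference allele for the SNP is known and passed in. The names `IUPAC_HET` and `recode_iupac` are hypothetical.

```python
# IUPAC ambiguity codes that stand for exactly two bases.
IUPAC_HET = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}
BASES = {"A", "C", "G", "T"}

def recode_iupac(calls, ref):
    """Count non-reference alleles per call: 0, 1, 2, or None if missing."""
    codes = []
    for call in calls:
        call = call.upper()
        if call in BASES:
            alleles = (call, call)      # assumed homozygous
        elif call in IUPAC_HET:
            alleles = IUPAC_HET[call]   # assumed heterozygous
        else:
            codes.append(None)          # N, gaps, three-base codes: missing
            continue
        codes.append(sum(allele != ref for allele in alleles))
    return codes

calls = ["T", "T", "T", "W", "N", "A", "T", "T", "W"]
print(recode_iupac(calls, ref="T"))  # [0, 0, 0, 1, None, 2, 0, 0, 1]
```

Note that a heterozygous code containing no reference base (for example R when the reference is T) is counted as 2 here; whether that is the desired behaviour depends on the downstream analysis.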