Normalizing array variants to reference-standard REF/ALT
11 months ago

Dear community members,

I have an Illumina array, and after conversion to VCF it looks like this (one line as an example):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NAME    
1   752721  rs3131972   C   T   .   .   PR  GT  0/1

Now I need to extract information about these variants from a large cohort of WGS samples.

The problem is that C is not actually the REF allele for this variant ( https://www.ncbi.nlm.nih.gov/snp/rs3131972?horizontal_tab=true ). For some variants the REF in the VCF really is the reference allele, but for half of them REF and ALT are switched.

When I look up this variant in the array specs, I see this line:

rs3131972-138_T_R_2263598533,rs3131972,TOP,[A/G],0060710106,AACGTTCACTTTCTGTCTGTGTTCACGTCACCAAGAGAATAGAAAGGAAA,,,37,1,752721,diploid,Homo sapiens,dbSNP,138,BOT,GCCTGGACTGGAGGGCTGTCTCAAGGAGGGTGACGTGTCTTTGACTTTTGCATTCTTCCC[T/C]TTTCCTTTCTATTCTCTTGGTGACGTGAACACAGACAGAAAGTGAACGTTTTTTGCATAA,TTATGCAAAAAACGTTCACTTTCTGTCTGTGTTCACGTCACCAAGAGAATAGAAAGGAAA[A/G]GGGAAGAATGCAAAAGTCAAAGACACGTCACCCTCCTTGAGACAGCCCTCCAGTCCAGGC,1897,3,0,+

So the variant here is even listed as A/G.
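To make the mismatch concrete, here is a minimal Python sketch comparing the alleles in the VCF line (C/T) with the `[A/G]` field from the manifest line. It only shows that the two allele pairs are base complements of each other, which would be consistent with the VCF alleles being reported on the opposite strand; that interpretation is an assumption, and the helper names `parse_snp_field` and `complement_alleles` are hypothetical.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_snp_field(field):
    """Parse a manifest allele field such as '[A/G]' into ('A', 'G')."""
    return tuple(field.strip("[]").split("/"))


def complement_alleles(alleles):
    """Complement every allele, e.g. ('C', 'T') -> ('G', 'A')."""
    return tuple(a.translate(COMPLEMENT) for a in alleles)


manifest_alleles = parse_snp_field("[A/G]")  # SNP field of the manifest line
vcf_alleles = ("C", "T")                     # REF and ALT from the VCF line

flipped = complement_alleles(vcf_alleles)
same_pair = set(flipped) == set(manifest_alleles)
print(flipped, same_pair)
```

For this line, the complemented VCF alleles form the same pair as the manifest's A/G, so the two records describe the same two alleles on different strands.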

Is there a way to normalize a VCF against the reference genome to fix REF/ALT? I am absolutely lost: I expected this to be a very simple procedure, but it seems very complex. I can't even rely on rs-IDs, since they are missing for many array variants.

genotyping • 227 views