Question: Replace REF and ALT columns based on the GT of one sample in a VCF file
Asked 3.4 years ago by charlesberkn (United States):

Hi, 
I have a VCF file with three samples that looks something like this:

#CHROM POS     ID        REF    ALT     QUAL FILTER INFO                              FORMAT      NA00001        NA00002        NA00003
20     14370   rs6054257 G      A       29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20     17330   .         T      A       3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
20     1110696 rs6040355 A      G,T     67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4

I want to replace the REF and ALT columns based on the genotype (GT) of one sample. For example, replacing REF and ALT based on the GT of the third sample (NA00003) should give:

#CHROM POS     ID        REF    ALT     QUAL FILTER INFO                              FORMAT      NA00001        NA00002        NA00003
20     14370   rs6054257 A      A       29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20     17330   .         T      T       3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
20     1110696 rs6040355 T      T       67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4

Can anybody help me get this done? Thank you.
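
The transformation requested above can be sketched in plain Python. This is only a minimal sketch under a few assumptions: the file is tab-delimited (as real VCF files are), the chosen sample is diploid, and records with a missing or non-diploid GT are left unchanged. The function names `rewrite_ref_alt` and `rewrite_vcf_lines` are made up for illustration. Note that it reproduces the layout shown in the question and does not update the GT indices of any sample.

```python
import re

def rewrite_ref_alt(fields, sample_col):
    """Return a copy of one VCF record (a list of columns) in which REF and ALT
    are replaced by the two alleles called for the sample in `sample_col`."""
    # Allele index 0 is REF, indices 1.. are the comma-separated ALT alleles.
    alleles = [fields[3]] + fields[4].split(",")
    # Locate GT through the FORMAT column instead of assuming it comes first.
    gt_pos = fields[8].split(":").index("GT")
    gt = fields[sample_col].split(":")[gt_pos]
    calls = re.split(r"[/|]", gt)
    # Assumption: skip missing ("./.") and non-diploid genotypes.
    if len(calls) != 2 or "." in calls:
        return list(fields)
    out = list(fields)
    out[3] = alleles[int(calls[0])]  # first allele of the sample  -> REF
    out[4] = alleles[int(calls[1])]  # second allele of the sample -> ALT
    return out

def rewrite_vcf_lines(lines, sample_name):
    """Yield VCF lines with REF/ALT rewritten from the GT of `sample_name`."""
    sample_col = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            sample_col = line.split("\t").index(sample_name)
            yield line
        else:
            yield "\t".join(rewrite_ref_alt(line.split("\t"), sample_col))
```

It could be used along the lines of `for line in rewrite_vcf_lines(open("input.vcf"), "NA00003"): print(line)`, where `input.vcf` is a placeholder file name.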

Tags: next-gen, vcf

Why would you want to do that?

Written 3.4 years ago by Pierre Lindenbaum

We are developing a statistical model to infer the true genotypes based on several samples.  

Written 3.4 years ago by charlesberkn

Then you'll have to work with the genotype information, not with the variant definition. If you change the REF and ALT columns, the genotype information of all the samples will no longer make sense, at least in the context of the VCF format.

Written 3.4 years ago by Jorge Amigo
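
To illustrate the point about genotype indices: GT values are indices into the allele list `[REF, ALT1, ALT2, ...]`, so the called bases can be read off without touching REF and ALT. The sketch below is an assumption about how one might do that in plain Python, not a method proposed in the thread, and the helper name `genotype_bases` is hypothetical. It also shows that once REF and ALT are overwritten, the other samples' indices may no longer resolve.

```python
def genotype_bases(ref, alt, gt):
    """Translate a GT string such as "1|2" into the called bases ("G|T"),
    leaving the variant definition (REF/ALT) untouched."""
    # Allele index 0 is REF, indices 1.. are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Keep the original separator: "|" marks phased, "/" unphased genotypes.
    sep = "|" if "|" in gt else "/"
    return sep.join("." if call == "." else alleles[int(call)]
                    for call in gt.split(sep))

# Third record of the example: REF=A, ALT=G,T
na00001 = genotype_bases("A", "G,T", "1|2")  # sample NA00001
na00003 = genotype_bases("A", "G,T", "2/2")  # sample NA00003

# After REF/ALT are rewritten to T/T, NA00001's "1|2" points at an allele
# index that no longer exists in the record.
try:
    genotype_bases("T", "T", "1|2")
    still_valid = True
except IndexError:
    still_valid = False
```

Keeping the decoded bases in a separate structure (or a separate output file) leaves the original VCF valid for the remaining samples.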