How to add reference alleles to VCF?
3.6 years ago
dec986 ▴ 370

I’m converting gVCFs to VCF, but the reference alleles are missing. An example is below:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  180525_FD02929177
1   97547947    .   T   .   .   .   DP=31   GT:DP:RGQ   0/0:31:81
1   97915614    .   C   .   .   .   DP=40   GT:DP:RGQ   0/0:40:99
1   97981343    .   A   .   .   .   DP=43   GT:DP:RGQ   0/0:43:99
2   234668570   .   C   T   539.64  .   AC=1;AF=0.500;AN=2;ClippingRankSum=0.340;DP=32;ExcessHet=3.0103;FS=5.748;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=16.86;RAW_MQ=115200.00;SOR=0.150    GT:AD:DP:GQ:PL   0/1:17,15:32:99:547,0,586
2   234669144   .   G   .   .   .   DP=36   GT:DP:RGQ   0/0:36:99

which was made by break_blocks:

break_blocks --region-file /illumina/runs/con/concordance/fluidigm/fluidigm_positions.tab.bed --ref human_g1k_v37.fasta --exclude-off-target

I’m using GATK thus:

gatk --java-options "-Xmx4g" GenotypeGVCFs \
     -R /illumina/runs/con/g1k_v37/human_g1k_v37.fasta \
     -V fluidigm.gvcf.202009/HG00099.fluidigm.202009.g.vcf \
     -O fluidigm.vcf.202009/HG00099.fluidigm.202009.vcf \
     --allow-old-rms-mapping-quality-annotation-data \
     --include-non-variant-sites

But none of the GATK options seem to add reference alleles to the REF column; every entry is just ".". When I try to do this manually with a Perl script, data are missing, so scripting it myself doesn’t work either.

Do you know how I can add the reference alleles to VCF/gVCF?


Please do not delete a question after it has been addressed in some way. Eyeballing columns wrong is a common problem and someone else could benefit from your experience.

Please accept my answer below using the green check mark on the left.


Hey! I'm having the exact same problem! Did you solve it? I would really appreciate any help!

3.6 years ago
Ram 43k

I don't see any entry with a missing REF. Could it be that you're visually matching the ID column in the header to the REF column in the data?

See below:

#CHROM  POS        ID  REF  ALT  QUAL    FILTER  INFO               FORMAT          180525_FD02929177
1       97547947   .   T    .    .       .       DP=31              GT:DP:RGQ       0/0:31:81
1       97915614   .   C    .    .       .       DP=40              GT:DP:RGQ       0/0:40:99
1       97981343   .   A    .    .       .       DP=43              GT:DP:RGQ       0/0:43:99
2       234668570  .   C    T    539.64  .       AC=1;AF=0.500;...  GT:AD:DP:GQ:PL  0/1:17,15:32:99:547,0,586
2       234669144  .   G    .    .       .       DP=36              GT:DP:RGQ       0/0:36:99
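
One way to rule out this kind of eyeballing error is to split each line on tabs and address the fields by header name rather than by visual position. Below is a minimal Python sketch, assuming tab-delimited VCF text. parse_vcf and missing_ref are illustrative helper names, not part of any library, and the records are the ones from the question with the long INFO field abbreviated.

```python
# Sketch: read VCF text by column name instead of by eye.
# Records are copied from the question; the variant line's INFO is abbreviated.
VCF_TEXT = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t180525_FD02929177",
    "1\t97547947\t.\tT\t.\t.\t.\tDP=31\tGT:DP:RGQ\t0/0:31:81",
    "1\t97915614\t.\tC\t.\t.\t.\tDP=40\tGT:DP:RGQ\t0/0:40:99",
    "1\t97981343\t.\tA\t.\t.\t.\tDP=43\tGT:DP:RGQ\t0/0:43:99",
    "2\t234668570\t.\tC\tT\t539.64\t.\tAC=1;AF=0.500;...\tGT:AD:DP:GQ:PL\t0/1:17,15:32:99:547,0,586",
    "2\t234669144\t.\tG\t.\t.\t.\tDP=36\tGT:DP:RGQ\t0/0:36:99",
])


def parse_vcf(text):
    """Return one dict per data line, keyed by the #CHROM header names."""
    header = None
    records = []
    for line in text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None:
            raise ValueError("data line seen before the #CHROM header")
        records.append(dict(zip(header, fields)))
    return records


def missing_ref(records):
    """List (CHROM, POS) for records whose REF is empty or '.'."""
    return [(r["CHROM"], r["POS"]) for r in records if r["REF"] in ("", ".")]


records = parse_vcf(VCF_TEXT)
for r in records:
    print(r["CHROM"], r["POS"], "ID=" + r["ID"], "REF=" + r["REF"], "ALT=" + r["ALT"])
print("records with missing REF:", missing_ref(records))
```

For these five records the "." values sit in the ID and ALT fields, and REF is populated on every line.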

How is it that the quality column is empty although there is coverage at those sites? How could that information be added, Ram? I would really appreciate some help!


That's a different question - please search the forum and if you don't find a satisfactory answer, open a new question.

Given that this data is from a different person, odds are only they'll know why the QUAL doesn't have data - it's probably an accepted norm in gVCF. (I did a quick Google search and found something pertinent at this link: https://support.illumina.com/help/BS_App_TSA_help/Content/Vault/Informatics/Sequencing_Analysis/BS/swSEQ_mBS_gVCF.htm)
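
In the records posted in this thread, QUAL is "." exactly where ALT is ".", that is, at the non-variant sites. The one variant record has a QUAL value. A minimal Python sketch that makes this pattern visible follows. The tuple layout and the function name qual_by_site_type are illustrative only, and the sketch says nothing about why the caller leaves QUAL empty at those sites.

```python
# Sketch: group QUAL values by whether a record has an ALT allele.
# Records are copied from the question as (CHROM, POS, REF, ALT, QUAL).
RECORDS = [
    ("1", "97547947", "T", ".", "."),
    ("1", "97915614", "C", ".", "."),
    ("1", "97981343", "A", ".", "."),
    ("2", "234668570", "C", "T", "539.64"),
    ("2", "234669144", "G", ".", "."),
]


def qual_by_site_type(records):
    """Collect QUAL strings under 'non-variant' (ALT is '.') or 'variant'."""
    summary = {"non-variant": [], "variant": []}
    for chrom, pos, ref, alt, qual in records:
        key = "non-variant" if alt == "." else "variant"
        summary[key].append(qual)
    return summary


summary = qual_by_site_type(RECORDS)
for site_type, quals in summary.items():
    print(site_type, quals)
```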
