Question: Identify recessive conditions from VCF files
3.2 years ago, andrewl wrote:

Assume I have the following two lines in a VCF, each of which describes a pathogenic variant in the MASP1 gene:

chr3 187235874 . C T 338.10 PASS AB=0.611111;ABP=4.9405;AC=2;ADP=16;AF=0.5;AN=4;AO=11;CIGAR=1X;DP=18;DPB=18;DPRA=0;EPP=3.20771;EPPR=3.32051;GTI=0;HET=1;HOM=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NC=0;NS=1;NUMALT=1;ODDS=47.5715;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=437;QR=294;RO=7;RPP=4.78696;RPPR=3.32051;RUN=1;SAF=6;SAP=3.20771;SAR=5;SF=0,1;SRF=6;SRP=10.7656;SRR=1;TYPE=snp;WT=0 GT:ABQ:DP:ADR:GQ:ADF:RO:RDF:AD:GL:RDR:QA:RD:SDP:AO:PVAL:QR:RBQ:FREQ 0/1:.:18:.:.:.:7:.:.:-10,0,-10:.:437:.:.:11:.:294


chr3 187236382 . G A 351.81 PASS AB=0.611111;ABP=4.9405;AC=2;ADP=15;AF=0.5;AN=4;AO=11;CIGAR=1X;DP=18;DPB=18;DPRA=0;EPP=3.20771;EPPR=10.7656;GTI=0;HET=1;HOM=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NC=0;NS=1;NUMALT=1;ODDS=45.4512;PAIRED=0.909091;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=457;QR=294;RO=7;RPP=4.78696;RPPR=10.7656;RUN=1;SAF=8;SAP=7.94546;SAR=3;SF=0,1;SRF=5;SRP=5.80219;SRR=2;TYPE=snp;WT=0 GT:FREQ:QR:RBQ:AO:PVAL:QA:SDP:RD:RDR:GL:ADF:RDF:RO:AD:GQ:ADR:ABQ:DP 0/1:.:294:.:11:.:457:.:.:.:-10,0,-10:.:.:7:.:.:.:.:18

This is bad news if the two mutations sit on different copies of the patient's chromosome, because then both copies of the gene are damaged. Is there a way to tell this from the VCF file, or, on finding multiple variants like these, is it necessary to go back to the BAM files to work out whether one copy of the gene is damaged twice or both copies are damaged?


So, essentially, you want to phase your variants to find which are on the same chromosome?

There are tools for that. Now you know the right terminology and you can try some googling ;-)
But obviously, with short reads you are quite limited. If reads don't span from SNP A to SNP B it gets hard.

written 3.2 years ago by WouterDeCoster
3.2 years ago, Vincent Laufer (United States) wrote:

Andrew, first a bit of terminology: you are essentially asking whether this individual is a compound heterozygote. You can also look up related terms such as double heterozygote; they will probably prove useful in identifying other key concepts.

At any rate, unless there is a single read containing both variants, you will not be able to tell from the BAM file directly without doing further analysis. The type of analysis necessary is called phasing; in that terminology, you are asking whether these two variants are in phase (in cis, on the same copy of the chromosome) or out of phase (in trans, on opposite copies, which makes the individual a compound heterozygote).

There are well-written programs for phasing genetic data, but depending on factors such as read depth, repetitive sequence content, and the distance between the two variants, it is sometimes not possible to phase them.

In your case, since these variants are not very far apart (~500 bases), it is very likely that you will be able to phase them, and it is even possible that a single read contains both variants. For the latter possibility, check the BAM file as you suggested; for the former, look up phasing software, decide which tool is right for you, and run it.
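
To put rough numbers on the distance argument, here is a minimal sketch. The two positions come from the VCF records in the question; the 150 bp read length and the fragment lengths are assumptions for illustration only (they are not known for this dataset), and the helper names are hypothetical. It checks whether one read, or the two mates of a paired-end fragment, could cover both positions.

```python
# Positions taken from the two chr3 records in the question.
POS_A = 187235874
POS_B = 187236382


def variant_distance(pos_a: int, pos_b: int) -> int:
    """Distance in bases between two positions on the same chromosome."""
    return abs(pos_b - pos_a)


def single_read_can_link(distance: int, read_length: int) -> bool:
    """A contiguous read covers both positions only if it is longer than the distance."""
    return read_length >= distance + 1


def read_pair_can_link(distance: int, read_length: int, fragment_length: int) -> bool:
    """Simplified paired-end model: one mate at each end of the fragment.

    The left mate must cover the first variant and the right mate the second,
    which requires distance + 1 <= fragment_length <= distance + 2 * read_length - 1.
    """
    return distance + 1 <= fragment_length <= distance + 2 * read_length - 1


distance = variant_distance(POS_A, POS_B)
print("distance between variants:", distance)  # 508

ASSUMED_READ_LENGTH = 150  # assumption, not taken from the data
print("single read can link:", single_read_can_link(distance, ASSUMED_READ_LENGTH))
for assumed_fragment in (400, 600):  # assumed fragment lengths
    print(assumed_fragment, read_pair_can_link(distance, ASSUMED_READ_LENGTH, assumed_fragment))
```

Under these assumed lengths a single short read cannot reach across 508 bases, but the two mates of a sufficiently long fragment could, which is the kind of evidence read-based phasing relies on.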


Thanks - this is great. I only have access to the VCF, so, as I understand your answer, it is not possible to determine phase from that file alone. Correct? I guess I was curious whether anything in the metadata of a VCF file helps with this...

written 3.2 years ago by andrewl

It IS possible to determine phase for some variants using a VCF file.
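
On the question of what the VCF itself records: in the VCF format the GT field separates alleles with "/" when the genotype is unphased and with "|" when it is phased. A minimal sketch of that check follows. The two records are abbreviated versions of the lines in the question (INFO replaced by "." and FORMAT cut down to GT:DP, an assumption made purely for readability), and the function names are hypothetical.

```python
# Abbreviated copies of the two records from the question.
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
RECORD_A = "chr3 187235874 . C T 338.10 PASS . GT:DP 0/1:18"
RECORD_B = "chr3 187236382 . G A 351.81 PASS . GT:DP 0/1:18"


def genotype_field(vcf_line: str, sample_index: int = 0) -> str:
    """Return the GT value of one sample from a single VCF data line."""
    fields = vcf_line.split()  # whitespace split; real VCF files are tab-delimited
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    return sample_values[format_keys.index("GT")]


def is_phased(gt: str) -> bool:
    """'|' marks a phased genotype, '/' an unphased one."""
    return "|" in gt


for record in (RECORD_A, RECORD_B):
    position = record.split()[1]
    gt = genotype_field(record)
    print(position, gt, "phased" if is_phased(gt) else "unphased")
```

Both genotypes in the question are written 0/1, so this particular file carries no phase information yet; a phasing tool would have to add it.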

Many phasing algorithms, for example the BEAGLE suite, will actually take a VCF as the input file without modification.

All of the major phasing options are pretty good, although some are preferred over others for particular applications. If you are only interested in these two variants, which are <1 kb apart, you can probably use any of them; if your interest could conceivably extend beyond that, it could be worth the time to investigate a couple of them.
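
Once a phasing tool has written phased genotypes for the two variants, reading off the answer is simple. The sketch below is an illustration, not the output handling of any particular tool: the function names are hypothetical, and the optional phase-set arguments assume the tool writes the standard PS FORMAT field (genotypes are only comparable within the same phase set).

```python
def alt_haplotypes(gt: str):
    """Indices of haplotypes carrying a non-reference allele, or None if unphased."""
    if "|" not in gt:
        return None
    return {i for i, allele in enumerate(gt.split("|")) if allele not in ("0", ".")}


def classify_pair(gt_a: str, gt_b: str, ps_a=None, ps_b=None) -> str:
    """Classify two phased genotypes as 'cis', 'trans', or 'unknown'."""
    haps_a, haps_b = alt_haplotypes(gt_a), alt_haplotypes(gt_b)
    if haps_a is None or haps_b is None:
        return "unknown"  # at least one genotype is unphased
    if ps_a != ps_b:
        return "unknown"  # different phase sets cannot be compared
    if not haps_a or not haps_b:
        return "unknown"  # one site carries no variant allele
    # trans: between them, the variants hit both copies of the chromosome
    return "trans" if (haps_a | haps_b) == {0, 1} else "cis"


print(classify_pair("0|1", "0|1"))  # cis: both variants on the same copy
print(classify_pair("0|1", "1|0"))  # trans: compound heterozygote
print(classify_pair("0/1", "0/1"))  # unknown: unphased, as in the question
```

A "trans" result corresponds to the compound heterozygote case described above; "cis" means one copy of the gene carries both variants and the other copy carries neither.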

written 3.2 years ago by Vincent Laufer