Question: Strand Information in VCF
8.3 years ago by
Chiang Mai, Thailand
Jirapong20 wrote:

My mpileup output looks like this.

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    xyzzy.W1g7rI9gs2.bam
X    533    .    C    G    25    .    DP=42;VDB=0.0033;AF1=0.5;AC1=1;DP4=20,0,18,0;MQ=20;FQ=26.8;PV4=1,7.1e-22,1,1    GT:PL:GQ    0/1:55,0,60:57
X    537    .    C    T    25    .    DP=44;VDB=0.0042;AF1=0.5;AC1=1;DP4=23,0,20,0;MQ=20;FQ=26.6;PV4=1,4.3e-20,1,0.28    GT:PL:GQ    0/1:55,0,59:57

Is it possible to get strand information from this output? Do the VCF/BCF formats provide strand information at all?


I don't think strand information is relevant in a variant format, as the alleles for a variant should be given for the leading (forward, 5'-3') strand, the same direction as the reference sequence. The opposite strand sequence follows base pairing. I don't think any variant which results in imperfect base pairing is viable.
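To illustrate the base-pairing point, here is a minimal Python sketch that derives the reverse-strand alleles from the forward-strand REF/ALT of the two records in the question. The helper name `reverse_complement` is made up for this example and does not come from any VCF library.

```python
# Hypothetical helper for illustration; not part of any VCF toolkit.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(allele: str) -> str:
    """Return the allele as it would read 5'-3' on the opposite strand."""
    return allele.translate(_COMPLEMENT)[::-1]


# REF/ALT pairs taken from the two records in the question (forward strand).
for ref, alt in [("C", "G"), ("C", "T")]:
    print(f"forward {ref}>{alt}   reverse {reverse_complement(ref)}>{reverse_complement(alt)}")
```

So if the alleles are reported on the forward strand, the reverse-strand representation can always be computed and does not need to be stored separately.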

modified 8.3 years ago • written 8.3 years ago by Michael Dondrup

"Should be given": I am not sure what level of generality you are referring to, but sometimes alleles are given on both the forward and the reverse strand (e.g., Comadran et al. 2012). In the VCF format it seems that nothing is really specified according to

In the case of GATK you are right: "Note that REF and ALT are always given on the forward strand." From

However, in my opinion, this doesn't mean that it holds for all data from all sources.

written 6.2 years ago by cpcantalapiedra
8.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

I don't think you can get the strand information. However, the VCF spec says that the GT field can be used to specify the phasing:

GT genotype, encoded as alleles values separated by either of "/" or "|". The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, "." must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:

    / : genotype unphased
    | : genotype phased

However, I don't know of any tools that handle this 'phasing' property.
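For what it's worth, reading the phasing flag out of a GT string is simple to do by hand. Below is a minimal Python sketch, assuming the sample column is split according to the FORMAT column as in the question's records. The function name `parse_gt` is hypothetical, and the sketch ignores any edge cases beyond the "/" and "|" separators and the "." missing value quoted above.

```python
import re


def parse_gt(gt: str):
    """Split a GT string such as '0/1' or '1|0' into (allele indices, phased flag).

    Missing alleles ('.') are returned as None.
    """
    alleles = [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]
    phased = "|" in gt
    return alleles, phased


# FORMAT and sample columns copied from the first record in the question.
fmt = "GT:PL:GQ"
sample = "0/1:55,0,60:57"
fields = dict(zip(fmt.split(":"), sample.split(":")))

alleles, phased = parse_gt(fields["GT"])
print(alleles, "phased" if phased else "unphased")
```

For the records in the question the separator is "/", so the genotypes are unphased.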


Thank you so much @Pierre. I will check whether any tool handles it.

written 8.3 years ago by Jirapong