[Edit 15 November 2017: it became apparent that the question related to the reverse and forward strand, i.e., coding/non-coding, sense/non-sense, etc. My initial answer (below) assumed that the question pertained to forward and reverse read orientation].
Strand information is initially recorded in BAMs when the reads are aligned to the chosen reference genome. For further information on filtering forward and reverse reads from BAMs, take a look around Biostars, particularly the thread "Samtools View: Only Forward Or Reverse Strand".
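As a rough illustration of how read strand is stored, the sketch below decodes the SAM FLAG field, where bit 0x10 marks a read aligned to the reverse strand. This is the bit that `samtools view -f 16` (keep reverse) and `samtools view -F 16` (keep forward) select on. The function names and the example records are made up for illustration.

```python
# Bits of the SAM FLAG field (column 2 of each alignment record).
UNMAPPED_FLAG = 0x4   # read is unmapped, so strand is meaningless
REVERSE_FLAG = 0x10   # read aligned to the reverse strand


def is_reverse(flag: int) -> bool:
    """Return True if the FLAG marks a reverse-strand alignment."""
    return bool(flag & REVERSE_FLAG)


def split_by_strand(sam_lines):
    """Split SAM text lines into (forward, reverse) lists of read names."""
    forward, reverse = [], []
    for line in sam_lines:
        if line.startswith("@"):       # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:       # skip unmapped reads
            continue
        (reverse if is_reverse(flag) else forward).append(fields[0])
    return forward, reverse


# Hypothetical records: FLAG 0 and 99 are forward, 16 and 147 are reverse.
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t16\tchr1\t102\t60\t5M\t*\t0\t0\tGTACC\tIIIII",
    "read3\t99\tchr1\t110\t60\t5M\t=\t150\t45\tTTGCA\tIIIII",
    "read4\t147\tchr1\t150\t60\t5M\t=\t110\t-45\tCCATG\tIIIII",
]

forward_reads, reverse_reads = split_by_strand(example)
print(forward_reads)  # ['read1', 'read3']
print(reverse_reads)  # ['read2', 'read4']
```

For real BAM files, SAMtools itself is the practical route; the point here is only to show where the strand bit lives.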
Regarding the VCF, the information is not always recorded and, if it is, it may be recorded differently based on the variant caller used. Nothing new here as there are no concrete rules in bioinformatics. A good variant caller will take strand biases into account when calling variants, though, even if it may not report forward and reverse read numbers from which the variants are called.
SAMtools mpileup / BCFtools call
If you use samtools mpileup piped into BCFtools call, then strand orientation information is encoded with the DP4 INFO tag:
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
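To show what can be done with DP4, here is a minimal sketch that pulls the four counts out of an INFO column and reports the fraction of alt-supporting bases that came from forward reads. The INFO string and the helper names are hypothetical; only the order of the four values follows the header description above.

```python
def parse_dp4(info: str):
    """Return (ref_fwd, ref_rev, alt_fwd, alt_rev) from an INFO string, or None."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP4":
            ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in value.split(","))
            return ref_fwd, ref_rev, alt_fwd, alt_rev
    return None  # the caller did not emit DP4 for this record


def alt_forward_fraction(dp4):
    """Fraction of alt-supporting bases seen on forward reads (None if no alt bases)."""
    _, _, alt_fwd, alt_rev = dp4
    total = alt_fwd + alt_rev
    return alt_fwd / total if total else None


# Hypothetical INFO column from a single VCF record.
info = "DP=40;DP4=10,12,9,9;MQ=60"
dp4 = parse_dp4(info)
print(dp4)                        # (10, 12, 9, 9)
print(alt_forward_fraction(dp4))  # 0.5
```

A fraction close to 0 or 1 means the alt allele is supported almost entirely by reads from one strand, which is the pattern strand-bias filters look for.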
GATK HaplotypeCaller
If you use the GATK, using the default settings with HaplotypeCaller, you'll see an INFO tag for strand orientation in the form of an odds ratio to detect strand bias, but there's nothing on exact read numbers:
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
I cannot comment on other variant callers, but they most likely record strand orientation in some other tags. I checked the current VCF format specification and it does not actually mention anything specific about strand orientation. It has the following for the INFO tags:
INFO - additional information: (String, no white-space, semi-colons, or equals-signs permitted; commas are permitted only as delimiters for lists of values) INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>[,data]. Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):
- AA : ancestral allele
- AC : allele count in genotypes, for each ALT allele, in the same order as listed
- AF : allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
- AN : total number of alleles in called genotypes
- BQ : RMS base quality at this position
- CIGAR : cigar string describing how to align an alternate allele to the reference allele
- DB : dbSNP membership
- DP : combined depth across samples, e.g. DP=154
- END : end position of the variant described in this record (for use with symbolic alleles)
- H2 : membership in hapmap2
- H3 : membership in hapmap3
- MQ : RMS mapping quality, e.g. MQ=52
- MQ0 : Number of MAPQ == 0 reads covering this record
- NS : Number of samples with data
- SB : strand bias at this position
- SOMATIC : indicates that the record is a somatic mutation, for cancer genomics
- VALIDATED : validated by follow-up experiment
- 1000G : membership in 1000 Genomes
The important entry here is SB (strand bias at this position).
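Because different callers use different tags, it can help to look at everything in the INFO column before deciding which key carries strand information. The sketch below splits an INFO string according to the format quoted above (semicolon-separated keys, optional comma-separated values). The example record is invented, and values are left as strings since their types are declared in the VCF header rather than in the record.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column into a dict.

    Keys with values map to a list of strings; flag keys (no '=') map to True.
    """
    parsed = {}
    if info == ".":            # '.' denotes an empty INFO column
        return parsed
    for entry in info.split(";"):
        if not entry:          # tolerate a stray trailing semicolon
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value.split(",") if sep else True
    return parsed


# Hypothetical INFO column mixing reserved keys, a flag, and a list value.
record_info = "DP=154;MQ=52;SB=0.3;DB;AF=0.25,0.5"
fields = parse_info(record_info)

# Report whichever strand-related keys this record happens to carry.
for key in ("SB", "DP4", "SOR"):
    if key in fields:
        print(key, fields[key])
```

With the example record only `SB` is present, so a single line is printed for it.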
There is no such thing as strand orientation during variant calling. By default, variants are reported on the forward strand of the reference only. That also implies a complementary change on the reverse strand at the same position. Variant callers have no information about transcript orientation either, because they use only the reads that mapped at a particular position, irrespective of which transcript those reads came from.
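To make the forward-strand convention concrete, the sketch below converts the REF and ALT alleles of a VCF record to the strand of a gene. The gene strand ("+" or "-") is assumed to come from a separate annotation source, since the VCF itself does not carry it; the function names are illustrative only.

```python
# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def alleles_on_gene_strand(ref: str, alt: str, gene_strand: str):
    """Express forward-strand VCF alleles on the strand of a gene.

    gene_strand is an assumption supplied by the user from an annotation
    file; it is not something the variant caller knows about.
    """
    if gene_strand == "+":
        return ref, alt
    if gene_strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError("gene_strand must be '+' or '-'")


# An A>G change reported in the VCF reads as T>C for a reverse-strand gene.
print(alleles_on_gene_strand("A", "G", "+"))  # ('A', 'G')
print(alleles_on_gene_strand("A", "G", "-"))  # ('T', 'C')
```

The same record can therefore describe two different changes when genes on opposite strands overlap that position.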
What you can do is annotate your VCF file using programs such as SnpEff, ANNOVAR or Ensembl VEP, and associate each variant record with a gene and its associated transcript. Note that this is a prediction: if your gene of interest is on the forward strand and another gene is present on the opposite (reverse) strand at the same position, you cannot be sure which transcript is being affected by the variation.
This is my understanding. Anyone, please correct me if I am wrong.