What Does Genotype ("0/0", "0/1" Or "1/1") In A *.Vcf File Represent?
5
3
8.5 years ago
lv06025158 ▴ 30

Dear everyone,

Recently I have been analyzing some NGS data for genome polymorphism. Using Pindel, I called the insertions and deletions in the NGS reads against the reference genome, and then ran Pindel2vcf to get VCF files. Here is my question: what does each genotype ("0/0", "0/1" or "1/1") in the VCF file represent?

For example,

chr10_irgsp5    2279161    .    CA    C    .    PASS    END=2279162;HOMLEN=9;HOMSEQ=AAAAAAAAA;SVLEN=-1;SVTYPE=DEL    GT:AD    0/0:3,3    0/0:0,0

chr10_irgsp5    2313030    .    CA    C    .    PASS    END=2313031;HOMLEN=10;HOMSEQ=AAAAAAAAAA;SVLEN=-1;SVTYPE=DEL    GT:AD    0/0:1,2    0/0:2,2

chr10_irgsp5    2588340    .    GTA    G    .    PASS    END=2588342;HOMLEN=3;HOMSEQ=TAT;SVLEN=-2;SVTYPE=DEL    GT:AD    0/0:0,1    0/1:8,8


Thanks very much!

Yang

pindel vcf • 26k views
9
8.5 years ago
Peixe ▴ 650

Just to point out: if the separator is |, the genotype has been phased, that is, each allele has been assigned to the chromosome copy it comes from. If the separator is /, the genotype has not been phased.

8
8.5 years ago
matted 7.6k

From the VCF (Variant Call Format) version 4.1 page:

GT : genotype, encoded as allele values separated by either of "/" or "|". The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT and so on. For diploid calls examples could be 0/1, 1|0, or 1/2, etc.
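To make the encoding above concrete, here is a minimal Python sketch that turns a GT string into the actual allele sequences and a zygosity label. The function names `decode_gt` and `zygosity` are made up for illustration and are not part of any VCF library; the sketch assumes a well-formed GT value.

```python
import re

def decode_gt(gt, ref, alts):
    """Map a GT string such as '0/1' onto allele sequences.

    Index 0 is REF, index 1 is the first ALT, index 2 the second ALT, etc.
    Returns (alleles, phased); a missing allele ('.') becomes None.
    """
    phased = "|" in gt                 # '|' means phased, '/' means unphased
    indices = re.split(r"[/|]", gt)    # e.g. '1|0' -> ['1', '0']
    choices = [ref] + list(alts)
    alleles = [None if i == "." else choices[int(i)] for i in indices]
    return alleles, phased

def zygosity(gt):
    """Label a genotype by comparing its allele indices."""
    indices = re.split(r"[/|]", gt)
    if "." in indices:
        return "missing"
    if len(set(indices)) > 1:
        return "heterozygous"
    if indices[0] == "0":
        return "homozygous reference"
    return "homozygous alternate"

# Third record from the question: REF=GTA, ALT=G, second sample is 0/1
print(decode_gt("0/1", "GTA", ["G"]), zygosity("0/1"))
```

So for a diploid sample, 0/0 means both copies match REF, 0/1 means one REF copy and one ALT copy, and 1/1 means both copies carry the first ALT allele.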

1

Good call - I was about to clarify that this is what the genotype means in general, but I am actually a bit confused in this specific case: I don't know why Pindel would give 0/0 indel calls.

Nevertheless, you can confirm whether the indel is heterozygous (0/1) or homozygous (1/1) by checking the alignment (in addition to checking the reliability of the call).

4
4.7 years ago
ATCG ▴ 370

GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on. The number of alleles suggests ploidy of the sample and the separator indicates whether the alleles are phased (‘|’) or unphased (‘/’) with respect to other data lines (Fig. 1).

For details, see:

The variant call format and VCFtools. Bioinformatics. 2011 Aug 1;27(15):2156-8. doi: 10.1093/bioinformatics/btr330. Epub 2011 Jun 7.

1
4.7 years ago
ATCG ▴ 370

Relevant paper:

0
8.5 years ago

I don't remember off the top of my head, but I think you should be able to figure it out by looking at the Pindel output files (prior to creation of the .vcf file).

There are alignments for every indel that show every read supporting the indel (which you can compare to the total number of reads).
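As a rough illustration of comparing supporting reads to total reads, here is a small Python sketch that works from the VCF record itself rather than the raw Pindel output. It assumes the AD value is "reference-supporting reads, indel-supporting reads", which is my reading of the example records and should be checked against the header of your own file. The helper names `sample_fields` and `alt_fraction` are hypothetical.

```python
def sample_fields(vcf_line):
    """Split one VCF data line into a list of per-sample dicts.

    Column 9 (FORMAT) gives the keys, e.g. 'GT:AD'; columns 10+ hold
    one colon-separated value string per sample.
    """
    cols = vcf_line.split()
    keys = cols[8].split(":")
    return [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]

def alt_fraction(ad):
    """Fraction of reads supporting the indel, or None if there are no reads.

    ASSUMPTION: ad is 'ref_count,alt_count'.
    """
    ref_reads, alt_reads = (int(x) for x in ad.split(","))
    total = ref_reads + alt_reads
    return alt_reads / total if total else None

# Third record from the question
line = ("chr10_irgsp5 2588340 . GTA G . PASS "
        "END=2588342;HOMLEN=3;HOMSEQ=TAT;SVLEN=-2;SVTYPE=DEL "
        "GT:AD 0/0:0,1 0/1:8,8")

for sample in sample_fields(line):
    print(sample["GT"], sample["AD"], alt_fraction(sample["AD"]))
```

With counts this low, the fraction is only a hint about which calls deserve a closer look; it does not replace inspecting the alignment.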

Alternatively, you can always check your alignment using an indexed .bam file in IGV. This is probably a good idea no matter what: unless you are working with a small indel (which would have been identified using an ordinary variant calling tool like VarScan anyways), structural variant calls are hard to make and will often include false positives.