Question: Understanding DiscoSNP++ output VCF file
2.6 years ago by
achyR10
Paris
achyR10 wrote:

Hello

I have a small query related to the output of discoSNP++. While analyzing the vcf file generated by vcfcreator, I found multiple "genotypes", which are as follows:

.|. ./. 0|0 0/0 0|1 0/1 1|1 1/1

I was wondering if someone could help me understand what "./.", ".|.", "0/0" and "0|0" mean.

Thank you for your help.

snp genotype discosnp++ vcf • 828 views
2.6 years ago by
France
pierre.peterlongo860 wrote:

Hi Achal, thanks for your question.

Here is an explanation (not limited to discoSnp, and assuming a diploid species).

A genotype tells you, for each variant and each sample, whether the sample carries the reference allele, the alternative allele, or both.

  • with a / : the genotype is unphased.
    • with a reference genome: the value 0 corresponds to the allele of the reference genome.
    • without a reference genome (discoSnp only): the choice of which allele is called reference and which is called alternative is arbitrary.
  • with a | : the variant is phased with the previous one. The first value lies on the same haplotype as the first value of the previous genotype. This explains why the 1|0 genotype exists.

About the values:

  • ./. : the variant is not seen in the sample (missing data)
  • 0/0: homozygous, only the reference allele is present
  • 1/1: homozygous, only the alternative allele is present
  • 0/1: heterozygous, both alleles are present
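
As a rough illustration of the rules above, here is a minimal Python sketch that turns a diploid GT string into a plain description. The function name `describe_genotype` is hypothetical and not part of discoSnp or any VCF library; it only assumes the two separators (/ and |) and the values (., 0, 1) discussed in this thread.

```python
def describe_genotype(gt: str) -> str:
    """Describe a diploid VCF GT string such as '0/1' or '1|0'."""
    # '|' marks a phased genotype, '/' an unphased one.
    phased = "|" in gt
    alleles = gt.split("|") if phased else gt.split("/")
    phase_note = "phased" if phased else "unphased"

    # '.' stands for a missing allele call.
    if all(allele == "." for allele in alleles):
        return f"missing ({phase_note})"
    if "." in alleles:
        return f"partially missing ({phase_note})"

    if len(set(alleles)) == 1:
        # Both values identical: homozygous for that allele.
        kind = ("homozygous reference" if alleles[0] == "0"
                else "homozygous alternative")
    else:
        kind = "heterozygous"
    return f"{kind} ({phase_note})"


for gt in [".|.", "./.", "0|0", "0/0", "0|1", "0/1", "1|1", "1/1"]:
    print(gt, "->", describe_genotype(gt))
```

The loop runs over the eight genotype strings listed in the question, so the printed lines map directly onto the values found in the VCF file.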

Hope this helps, Pierre

2.6 years ago by
achyR10
Paris
achyR10 wrote:

Hello Pierre

Thank you for your reply. It was helpful. However, I am still confused about how to interpret "./."

I have 50 samples listed in the .fof file. Upon completion, discoSNP++ (followed by vcfcreator) outputs a contig fasta file and a vcf file. The vcf file contains numerous rows, each corresponding to a single variant, and 9 + 50 columns. The last 50 columns hold the variant information for each of the 50 samples used. Now take an example row from the output vcf file:

SNP_higher_path_9480770 56 9480770 C T . . Ty=SNP;Rk=1;UL=6;UR=20;CL=.;CR=.;Genome=.;Sd=. GT:DP:PL:AD:HQ ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 0/0:11:5,37,224:11,0:66,0 ./.:1:.,.,.:1,0:68,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:5:.,.,.:0,5:0,63 ./.:0:.,.,.:0,0:0,0 1/1:1259:25184,3794,59:0,1259:0,66 ./.:0:.,.,.:0,0:0,0 1/1:43:864,134,6:0,43:0,65 1/1:38:764,119,6:0,38:0,64 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 1/1:34:684,107,6:0,34:0,66 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0

Here you see that most columns have "./." and some have "1/1".
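
To make this count explicit, here is a small Python sketch that tallies the genotypes in one VCF row. The function name `tally_genotypes` is hypothetical, and the row used here is a shortened excerpt of the one above (four sample columns instead of 50); it assumes whitespace-separated columns with the FORMAT string in column 9.

```python
from collections import Counter


def tally_genotypes(vcf_row: str) -> Counter:
    """Count the GT values across all sample columns of one VCF row."""
    fields = vcf_row.split()
    # Column 9 (index 8) is FORMAT, e.g. GT:DP:PL:AD:HQ.
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    # Every column after FORMAT is one sample (one read set).
    return Counter(sample.split(":")[gt_index] for sample in fields[9:])


# Shortened excerpt of the row above: only four of the 50 sample columns.
row = (
    "SNP_higher_path_9480770 56 9480770 C T . . "
    "Ty=SNP;Rk=1;UL=6;UR=20;CL=.;CR=.;Genome=.;Sd=. GT:DP:PL:AD:HQ "
    "./.:0:.,.,.:0,0:0,0 0/0:11:5,37,224:11,0:66,0 "
    "./.:5:.,.,.:0,5:0,63 1/1:43:864,134,6:0,43:0,65"
)

counts = tally_genotypes(row)
for genotype, n in sorted(counts.items()):
    print(genotype, n)
```

Running the same function on the full row would give the number of samples per genotype for this variant.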

Now my question is: how should I interpret samples with genotype "./."? Should I interpret it as the contig "SNP_higher_path_9480770" being missing in this particular sample, OR as the contig being present but without any variation?

Hope you get my query. Thanks


./. (for read set i): the variant whose id is SNP_higher_path_9480770 does not have enough corresponding reads in read set i.

"Not enough" means that neither allele is read-coherent (cf. the definition of read coherence in the publication).

Pierre

written 2.6 years ago by pierre.peterlongo