Question: Haplotype annotation in the VCF files of the 1000 Genomes Project, phase 3
caggtaagtat wrote, 11 weeks ago:

Hi there,

I'm new to VCF file analysis and would like to download a large database of human SNPs with information about each variant's location, its sequence variation, and whether it can occur in a homozygous state.

So far I have found this directory with files from the 1000 Genomes Project, where I think I can download the relevant data. However, I'm not sure whether I'm looking at the right columns.

The data looks like this:

22      16050654        esv3647175;esv3647176;esv3647177;esv3647178     A       <CN0>,<CN2>,<CN3>,<CN4> 100     PASS    AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV       GT      3|0     0|0     0|0     0|0     0|0     0|0     0|4     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0         0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     3|0     0|0     3|0     0|0     0|0     3|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|3     0|0     0|4     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|3     3|0     0|3     2|0     0|0     0|0     ...

Other entries only show 0|0, 0|1, or 1|0, so I initially thought the numbers indicated the haplotype of the SNP in different individuals. In that case, however, I don't understand the difference between 0|2 and 3|0.

Edit: I should add that these columns are not documented in the VCF file header.

finswimmer wrote, 11 weeks ago:

Hello,

The numbers describe which of the REF and ALT alleles are present in the sample. 0 means the REF allele, and any value greater than 0 gives the position of the allele in the comma-separated ALT column, counting from 1.

So a sample with the genotype 0|0 is homozygous for the reference allele. A sample with 0|2 has one reference allele, and its second allele corresponds to the second value in the ALT column. A sample with 0|3 has one reference allele, and its second allele corresponds to the third value in the ALT column.
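
The index-to-allele lookup described above can be sketched in a few lines of Python. This is only an illustration: `decode_genotype` is a hypothetical helper name, and the REF and ALT values are copied from the record shown in the question.

```python
def decode_genotype(ref, alt_field, gt):
    """Translate a GT string such as '0|2' into the alleles it refers to."""
    # Index 0 is REF; indices 1..n follow the order of the ALT column.
    alleles = [ref] + alt_field.split(",")
    # Accept both the phased ('|') and the unphased ('/') separator.
    indices = gt.replace("/", "|").split("|")
    # '.' marks a missing call and is passed through as None.
    return tuple(None if i == "." else alleles[int(i)] for i in indices)


# REF and ALT taken from the record at 22:16050654 in the question.
ref = "A"
alt = "<CN0>,<CN2>,<CN3>,<CN4>"
for gt in ("0|0", "0|2", "3|0", "0|4"):
    print(gt, decode_genotype(ref, alt, gt))
```

For this record, 0|2 decodes to A on one copy and <CN2> on the other, while 3|0 decodes to <CN3> and A.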

The | indicates that the genotypes are phased. Across all variants on the same chromosome, the alleles written in front of the | lie on the same chromosome copy (haplotype), and those written behind it lie on the other copy. If the phase were unknown, the delimiter would be / instead.
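
To make the phasing point concrete, here is a minimal sketch that splits one sample's genotypes at several sites into its two haplotypes. The function name `split_haplotypes` and the example calls are made up for illustration, and the sketch assumes diploid calls with no missing values.

```python
def split_haplotypes(genotypes):
    """Split one sample's phased GT strings (one per site) into two haplotypes."""
    hap_a, hap_b = [], []
    for gt in genotypes:
        if "|" not in gt:
            # '/' (or a haploid call) carries no phase information.
            raise ValueError(f"genotype {gt!r} is not phased")
        first, second = gt.split("|")
        hap_a.append(int(first))   # allele index on the first chromosome copy
        hap_b.append(int(second))  # allele index on the second chromosome copy
    return hap_a, hap_b


# Hypothetical calls for one sample at three sites on the same chromosome.
calls = ["0|1", "1|0", "0|1"]
hap_a, hap_b = split_haplotypes(calls)
print(hap_a)  # [0, 1, 0]
print(hap_b)  # [1, 0, 1]
```

With unphased calls such as 0/1 the same split would be meaningless, which is why the sketch refuses them.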

fin swimmer


Thank you very much! That helps a lot. So when I'm looking for SNPs that can occur in a homozygous state, I would check for at least one entry with n|n or n/n with n > 0?

written 11 weeks ago by caggtaagtat

"So when I'm looking for SNPs that can occur in a homozygous state, I would check for at least one entry with n|n or n/n with n > 0?"

Yes.
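
A rough sketch of that check for a single VCF data line, in plain Python. The helper name `homozygous_alt_samples` and the example record are invented for illustration, and the sketch assumes tab-separated fields with the samples starting in column 10.

```python
def homozygous_alt_samples(vcf_line):
    """Return 0-based sample offsets whose genotype is n|n or n/n with n > 0."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (FORMAT) lists the per-sample keys; locate GT among them.
    gt_pos = fields[8].split(":").index("GT")
    hits = []
    for offset, sample in enumerate(fields[9:]):
        gt = sample.split(":")[gt_pos]
        alleles = gt.replace("/", "|").split("|")
        # Two identical allele indices that are neither REF ('0') nor missing ('.').
        if len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", "."):
            hits.append(offset)
    return hits


# Made-up record with five samples; only the third and fourth are homozygous ALT.
line = "\t".join(
    ["22", "16050654", "example_id", "A", "<CN0>,<CN2>", "100", "PASS", ".", "GT",
     "0|0", "0|2", "2|2", "1/1", "2|1"]
)
print(homozygous_alt_samples(line))  # [2, 3]
```

Note that a genotype such as 2|1 carries two different ALT alleles, so it is heterozygous and is not reported.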

written 11 weeks ago by finswimmer