Question: Haplotype annotation in VCF file of phase 3 1000 Genome project
13 days ago, caggtaagtat wrote:

Hi there,

I'm new to VCF file analysis and would like to download a large database of human SNPs with information about their location, their sequence variation, and whether they can occur in the homozygous state.

So far I have found this directory of files from the 1000 Genomes Project, where I think I can download the relevant data. However, I'm not sure whether I'm looking at the right columns.

The data looks like this:

22      16050654        esv3647175;esv3647176;esv3647177;esv3647178     A       <CN0>,<CN2>,<CN3>,<CN4> 100     PASS    AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV       GT      3|0     0|0     0|0     0|0     0|0     0|0     0|4     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0         0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     3|0     0|0     3|0     0|0     0|0     3|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|3     0|0     0|4     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|3     3|0     0|3     2|0     0|0     0|0     ...

Other entries only show 0|0, 0|1, or 1|0, so I initially thought the numbers indicated the haplotype of the SNP in different individuals. However, in that case I don't understand the difference between 0|2 and 3|0.

Edit: I should add that these columns are not documented in the VCF file header.

13 days ago, finswimmer (Germany) wrote:

Hello,

The numbers describe which of the REF or ALT alleles are present in the sample. 0 means the REF allele, and a value greater than 0 gives the position of the allele in the ALT column, counting from 1.

So a sample with the genotype 0|0 is homozygous for the reference allele. A sample with 0|2 has one reference allele, and its second allele corresponds to the second value in the ALT column. A sample with 0|3 has one reference allele, and its second allele corresponds to the third value in the ALT column.

The | indicates that the genotypes are phased. All alleles on the same chromosome that are listed before the | are located on the same haplotype, and those listed after it are on the other. If the phase is unknown, the delimiter is / instead.
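As an illustration of the index lookup described above, here is a minimal Python sketch that translates a GT string into allele labels. The function name decode_genotype is made up for this example, and the REF and ALT values are copied from the record in the question; treat it as a sketch of the idea, not as a replacement for a proper VCF parser.

```python
def decode_genotype(gt, ref, alt):
    """Translate a VCF GT string such as '3|0' into allele labels.

    Hypothetical helper for illustration only.
    Index 0 is REF; index i > 0 is the i-th entry of the ALT column.
    """
    alleles = [ref] + alt.split(",")
    phased = "|" in gt            # '|' = phased, '/' = unphased
    sep = "|" if phased else "/"
    decoded = []
    for idx in gt.split(sep):
        # '.' marks a missing call
        decoded.append(None if idx == "." else alleles[int(idx)])
    return decoded, phased


# REF and ALT taken from the record shown in the question
ref = "A"
alt = "<CN0>,<CN2>,<CN3>,<CN4>"
for gt in ["0|0", "3|0", "0|4", "2|0"]:
    print(gt, decode_genotype(gt, ref, alt))
```

For the record in the question, 3|0 therefore decodes to <CN3> on the first haplotype and the reference allele A on the second.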

fin swimmer

Thank you very much! That helps a lot. So when I'm looking for SNPs which can occur homozygous, I would check for at least one entry with n|n or n/n with n > 0 ?

Reply written 13 days ago by caggtaagtat

So when I'm looking for SNPs which can occur homozygous, I would check for at least one entry with n|n or n/n with n > 0

Yes.
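A rough Python sketch of that check might look like the following. It assumes tab-delimited VCF data lines with the FORMAT field in column 9 and the samples from column 10 onwards; the function name has_hom_alt and the two toy records are invented for illustration and are not taken from the 1000 Genomes files.

```python
import re


def has_hom_alt(vcf_line):
    """Return True if any sample is homozygous for a non-reference allele.

    Hypothetical helper: expects one tab-delimited VCF data line
    (not a header line starting with '#').
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_pos = fields[8].split(":").index("GT")   # position of GT within FORMAT
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        alleles = re.split(r"[|/]", gt)         # accept phased and unphased calls
        if (len(alleles) == 2
                and alleles[0] == alleles[1]
                and alleles[0] not in ("0", ".")):
            return True
    return False


# Toy records made up for this example
with_hom = "22\t100\t.\tA\tG,T\t100\tPASS\t.\tGT\t0|0\t0|2\t2|2"
without_hom = "22\t200\t.\tC\tT\t100\tPASS\t.\tGT\t0|0\t0|1\t1|0"
print(has_hom_alt(with_hom), has_hom_alt(without_hom))
```

Note that this only tells you a homozygous non-reference genotype was observed among the listed samples; a variant with no such sample in the file could still occur in the homozygous state in other individuals.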

Reply written 13 days ago by finswimmer