Question: Where can I find the reference files for the 1000 Genomes Project VCF data?
10 weeks ago by
caggtaagtat470 wrote:


I just started working with VCF files and would like to use the data of the 1000 Genomes Project. I found that the most recent version can be downloaded at . For my functional analysis, I need the positions of all exon borders. Therefore I am looking for 1) the FASTA file that was used as the reference during SNP calling and 2) a corresponding GTF file for annotation.

1) For the FASTA file, the VCF files state that it comes from here:

However, other posts on Biostar propose using the following, which is a little larger:

2) For the GTF files, I'm not sure if there even is one to download.

10 weeks ago by
finswimmer11k wrote:

Hello caggtaagtat,

I would take the reference sequence stated in the VCF file; then you can be sure not to run into problems such as different naming conventions for the chromosomes.
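To see which reference a VCF file names, you can read its header meta lines. The following is a minimal Python sketch, assuming the file carries a `##reference=` line and `##contig=` lines in its header (not every VCF does). The function name `read_vcf_header` is made up for illustration.

```python
import gzip
import re


def read_vcf_header(path):
    """Return (reference, contig_ids) taken from the '##' meta lines of a VCF.

    `reference` is None if the header has no '##reference=' line.
    """
    # Plain gzip can read the bgzip-compressed .vcf.gz files as well.
    opener = gzip.open if path.endswith(".gz") else open
    reference = None
    contigs = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta lines end at the '#CHROM' column header
            if line.startswith("##reference="):
                reference = line.rstrip("\n").split("=", 1)[1]
            elif line.startswith("##contig="):
                match = re.search(r"ID=([^,>]+)", line)
                if match:
                    contigs.append(match.group(1))
    return reference, contigs
```

The returned contig IDs show directly how the chromosomes are named in the VCF, which is what matters when choosing a matching FASTA and annotation.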

The reference genome is GRCh37 (hg19), so you can take any annotation file for this reference genome, e.g. GENCODE. But before annotating, check how the chromosomes are named. It might be necessary to rename them.
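As a rough sketch of the renaming step in Python: the code below assumes the VCF uses unprefixed names ("1", "MT") while the GTF uses "chr"-prefixed names ("chr1", "chrM"). Verify both against your actual files first. The helper names `to_vcf_style` and `exon_borders` are hypothetical.

```python
def to_vcf_style(chrom):
    """Map a 'chr'-prefixed name ('chr1', 'chrM') to the unprefixed style ('1', 'MT')."""
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return "MT" if chrom == "M" else chrom


def exon_borders(gtf_lines, rename=to_vcf_style):
    """Yield (chrom, start, end, strand) for every exon feature in GTF lines.

    GTF coordinates are 1-based and inclusive, like the POS column of a VCF.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        yield rename(fields[0]), int(fields[3]), int(fields[4]), fields[6]
```

Because `exon_borders` accepts any iterable of lines, an open file handle for the GTF can be passed in directly. Pass `rename=lambda name: name` if the names already agree.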

fin swimmer


Ok thank you. So I will download the GRCh37 annotation then.

written 10 weeks ago by caggtaagtat470