Question: Where can I find the reference files for the 1000 Genomes Project VCF data?
18 months ago by
caggtaagtat (1.1k) wrote:


I just started working with VCF files and would like to use the data of the 1000 Genomes Project. I found that the most recent version can be downloaded at . For my functional analysis, I need the positions of all exon borders. Therefore, I am looking for 1) the FASTA file that was used as the reference during SNP calling and 2) a corresponding GTF file for annotation.

1) For the FASTA file, the VCF files state that it comes from here:

However, other posts on Biostars propose using the following, which is a little larger:

2) For the GTF file, I'm not sure whether there even is one to download.

18 months ago by
finswimmer (13k) wrote:

Hello caggtaagtat,

I would use the reference sequence stated in the VCF file. Then you can be sure not to run into problems such as different naming conventions for the chromosomes.
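
To see which reference a VCF file names, you can read the `##reference=` meta-line from its header. Here is a minimal Python sketch of that idea. The function name `reference_from_header` and the example URL are made up for illustration, and it assumes the header actually contains a `##reference=` line.

```python
def reference_from_header(lines):
    """Return the value of the first '##reference=' meta-line, or None.

    `lines` is any iterable of text lines from a VCF file.
    """
    for line in lines:
        if not line.startswith("##"):
            # Meta-lines are over once the '#CHROM' header or data begins.
            break
        if line.startswith("##reference="):
            return line.rstrip("\r\n").split("=", 1)[1]
    return None


# Hypothetical header, only to show the call; a real file will differ.
example_header = [
    "##fileformat=VCFv4.1\n",
    "##reference=ftp://example.org/hypothetical_reference.fasta.gz\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print(reference_from_header(example_header))
```

For a compressed `.vcf.gz` file, the lines can come from `gzip.open(path, "rt")` instead of a list.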

The reference genome is GRCh37 (hg19), so you can take any annotation file for this reference genome, e.g. GENCODE. But before annotating, check how the chromosomes are named in both files. It might be necessary to rename them.
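
A quick way to check the naming is to collect the first column of both files and compare. The Python sketch below does that and renames the GTF side. It assumes the VCF uses names like `1` and `MT` while the annotation uses `chr1` and `chrM`; please verify this against your own files. All function names and the example rows are made up for illustration.

```python
def chromosome_names(lines):
    """Collect first-column values of tab-separated, non-comment lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def strip_chr_prefix(name):
    """Map 'chr1' to '1' and 'chrM' to 'MT'; leave other names unchanged."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    return name


def rename_gtf_line(line):
    """Rename the chromosome column of one GTF line; keep comments as-is."""
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return strip_chr_prefix(chrom) + sep + rest


# Made-up rows, only to show the comparison.
vcf_lines = ["#CHROM\tPOS\tID\tREF\tALT\n", "1\t100\t.\tA\tG\n", "MT\t5\t.\tC\tT\n"]
gtf_lines = [
    "##comment line\n",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n',
    'chrM\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g2";\n',
]

renamed = [rename_gtf_line(line) for line in gtf_lines]
# Names present in only one of the two files; empty if they agree.
print(chromosome_names(vcf_lines) ^ chromosome_names(renamed))
```

If the printed set is not empty after renaming, the remaining names (for example unplaced contigs) need a closer look before annotating.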

fin swimmer


OK, thank you. So I will download the GRCh37 annotation then.

Reply written 18 months ago by caggtaagtat (1.1k)