Question: Where can I find the reference files for the 1000 Genomes Project VCF data?
caggtaagtat wrote (7 days ago):

Hello,

I just started working with VCF files and would like to use the data of the 1000 Genomes Project. I found that the most recent version can be downloaded at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ . For my functional analysis, I need the positions of all exon borders. Therefore I am looking for 1) the FASTA file that was used as the reference during SNP calling and 2) a corresponding GTF file for annotation.

1) For the FASTA file, the VCF headers state that the reference comes from here: ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

However, other posts on Biostars propose using the following file, which is a little larger: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

2) For the GTF files, I'm not sure if there even is one to download.

reference 1000 genome vcf • 73 views
finswimmer (Germany) wrote (7 days ago):

Hello caggtaagtat ,

I would take the reference sequence stated in the VCF file. Then you can be sure not to run into problems such as different naming conventions for the chromosomes.
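To see which reference and chromosome names a VCF file declares, the header can be read directly. The following is only a minimal sketch: the helper names `open_text` and `read_vcf_header_info` are made up for this example, and the header lines in the demo are illustrative rather than copied from a specific release file.

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_vcf_header_info(handle):
    """Collect the ##reference value and the ##contig IDs from a VCF header."""
    reference = None
    contigs = []
    for line in handle:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM column header
        line = line.rstrip("\n")
        if line.startswith("##reference="):
            reference = line.split("=", 1)[1]
        elif line.startswith("##contig=<") and line.endswith(">"):
            # e.g. ##contig=<ID=1,assembly=b37,length=249250621>
            for field in line[len("##contig=<"):-1].split(","):
                if field.startswith("ID="):
                    contigs.append(field[3:])
    return reference, contigs


# Illustrative header (assumed layout), parsed from memory instead of a real file.
example_header = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/"
    "phase2_reference_assembly_sequence/hs37d5.fa.gz\n"
    "##contig=<ID=1,assembly=b37,length=249250621>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
reference, contigs = read_vcf_header_info(example_header)
print(reference)
print(contigs)
```

For a downloaded file, the same function could be called as `read_vcf_header_info(open_text(path))`, where `path` points to the local copy of the VCF.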

The reference genome is GRCh37 (hg19), so you can use any annotation file for this assembly, e.g. GENCODE. But before annotating, check how the chromosomes are named. It might be necessary to rename them, for example from "chr1" to "1".
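As a rough sketch of the renaming step together with the exon borders asked about in the question, the code below reads GTF lines, strips a "chr" prefix, and yields the start and end of every exon. The function names are made up for this example, the mapping of "chrM" to "MT" is an assumption about the two naming styles, and the GTF lines in the demo are invented for illustration.

```python
import io


def strip_chr_prefix(name):
    """Convert 'chr1'-style names to '1'-style names.

    Assumption: the annotation names the mitochondrion 'chrM' while the
    reference names it 'MT'. Only the name is changed here.
    """
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def exon_borders(gtf_handle, rename=strip_chr_prefix):
    """Yield (chromosome, start, end, strand) for every exon line of a GTF.

    GTF coordinates are 1-based and inclusive, like the POS column of a VCF.
    """
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and empty lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep only well-formed exon features
        yield rename(cols[0]), int(cols[3]), int(cols[4]), cols[6]


# Invented GTF lines, for illustration only.
example_gtf = io.StringIO(
    "##description: example\n"
    "chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id \"G1\";\n"
    "chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tgene_id \"G1\";\n"
    "chrM\tENSEMBL\texon\t50\t120\t.\t-\t.\tgene_id \"G2\";\n"
)
borders = list(exon_borders(example_gtf))
for border in borders:
    print(border)
```

Renaming on the annotation side keeps the VCF and the FASTA untouched, which seems the less invasive choice. If the annotation already uses the same names as the VCF, `rename` can be replaced by a function that returns its input unchanged.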

fin swimmer


OK, thank you. So I will download the GRCh37 annotation then.

written 7 days ago by caggtaagtat