Question: Download latest reference genome assembly for exome sequencing alignment and variant calling
svlachavas (Greece) wrote, 3 months ago:

Dear Community,

I would like to find and download the latest human reference genome assembly (hg38/GRCh38) to use both for aligning raw reads and for variant calling in an exome sequencing project. However, I'm a bit confused by the available options and the different sources, such as UCSC and NCBI. In detail:

1) If I want to download the latest available human reference genome assembly, would this be the right option: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12 ?

And specifically the file GRCh38.p12_genomic.fna.gz?

2) The "relatively equivalent" alternative from UCSC is at the following link:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

Does this contain only the original hg38 assembly from 2013, rather than the latest release as at NCBI above, or does it also include the subsequent updates?
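For the NCBI path in option 1, a minimal Python sketch of how the download URL could be assembled and fetched is shown here. It assumes NCBI's usual layout, in which the accession digits are split into three-character directories and the file name is prefixed with the full assembly directory name (so the file would be `GCF_000001405.38_GRCh38.p12_genomic.fna.gz`). It also assumes the same host is reachable over HTTPS. The function names are hypothetical.

```python
import shutil
import urllib.request

# Assumption: the NCBI genomes area is also served over HTTPS on the same host.
NCBI_BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def ncbi_genomic_fasta_url(accession: str, assembly_name: str) -> str:
    """Build the URL of the *_genomic.fna.gz file for an NCBI assembly.

    accession     e.g. "GCF_000001405.38"
    assembly_name e.g. "GRCh38.p12"
    """
    prefix, rest = accession.split("_", 1)      # "GCF", "000001405.38"
    digits = rest.split(".", 1)[0]              # "000001405"
    # The digits are split into 3-character directory levels: 000/001/405
    levels = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    directory = f"{accession}_{assembly_name}"
    filename = f"{directory}_genomic.fna.gz"    # assumed naming convention
    return "/".join([NCBI_BASE, prefix, *levels, directory, filename])


def download(url: str, destination: str) -> None:
    """Stream a remote file to disk without loading it into memory."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    url = ncbi_genomic_fasta_url("GCF_000001405.38", "GRCh38.p12")
    print(url)
    # The file is large, so the actual download is left commented out:
    # download(url, "GRCh38.p12_genomic.fna.gz")
```

Only the URL is printed by default; uncommenting the last line would start the download.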

Thank you in advance,

Efstathios-Iason

written 3 months ago by svlachavas • modified 8 weeks ago by Biostar

See this blog post from Heng Li: Which human reference genome to use?

written 3 months ago by WouterDeCoster

Thank you very much for the link.

written 3 months ago by svlachavas

Take a look at GENCODE, which is the official source of human genome data.

written 3 months ago by genomax

Dear genomax,

Thank you for the alternative suggestion. So, for my purpose, you would recommend the GENCODE reference assembly? Or does each source have particular strengths that I should take into account?

written 3 months ago by svlachavas

The GRCh38 reference assembly is identical everywhere; the original release was in December 2013. Patch releases have occurred since then, but they don't affect chromosomal coordinates. Depending on where you get your annotations, they may differ slightly. Is this targeted or whole exome sequencing?

written 3 months ago by genomax • modified 3 months ago

Dear genomax,

Thank you for the information and comments. Whole exome sequencing was performed (genomic DNA captured using the Agilent in-solution enrichment methodology; paired-end 75-base massively parallel sequencing on an Illumina HiSeq 4000), and I already have the FASTQ files. So my next step is to align the reads and then call variants, as mentioned.
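As a rough illustration of those next steps, the sketch below lays out one possible paired-end alignment and variant-calling sequence in Python. The choice of tools (bwa, samtools, bcftools) is an assumption, since no aligner or caller is named in this thread, and all file names are hypothetical placeholders. It is a minimal outline, not a tuned exome pipeline.

```python
import shlex
import subprocess


def build_pipeline(ref, fq1, fq2, sample, threads=4):
    """Return pipeline steps; each step is a list of commands piped together.

    Tool choice (bwa / samtools / bcftools) is an assumption for illustration.
    """
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # 1. Index the reference FASTA for the aligner
        [["bwa", "index", ref]],
        # 2. Align paired-end reads and coordinate-sort the output
        [["bwa", "mem", "-t", str(threads), ref, fq1, fq2],
         ["samtools", "sort", "-o", bam, "-"]],
        # 3. Index the sorted BAM
        [["samtools", "index", bam]],
        # 4. Pile up reads against the reference and call variants
        [["bcftools", "mpileup", "-Ou", "-f", ref, bam],
         ["bcftools", "call", "-mv", "-Oz", "-o", vcf]],
    ]


def run_step(step):
    """Run the commands of one step, connecting them with pipes."""
    procs = []
    upstream = None
    for i, cmd in enumerate(step):
        is_last = i == len(step) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=upstream,
            stdout=None if is_last else subprocess.PIPE,
        )
        if upstream is not None:
            upstream.close()  # let the upstream process see a closed pipe
        upstream = proc.stdout
        procs.append(proc)
    for cmd, proc in zip(step, procs):
        if proc.wait() != 0:
            raise RuntimeError(f"command failed: {shlex.join(cmd)}")


if __name__ == "__main__":
    # Hypothetical file names; only print the commands rather than run them.
    steps = build_pipeline("GRCh38.fa", "sample_R1.fastq.gz",
                           "sample_R2.fastq.gz", "sample")
    for step in steps:
        print(" | ".join(shlex.join(cmd) for cmd in step))
```

Calling `run_step(step)` for each step in order would execute the pipeline, provided the three tools are installed and on the PATH.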

written 3 months ago by svlachavas • modified 3 months ago