Question: Download latest reference genome assembly for exome sequencing alignment and variant calling
Written 10 months ago by svlachavas (560), Greece:

Dear Community,

I would like to find and download the latest human reference genome assembly, hg38/GRCh38, to use both for aligning raw reads and for variant calling in exome sequencing. However, I am a bit confused about the available options and the different sources, such as UCSC and NCBI. In detail:

1) If I want to download the latest available human reference assembly, would this be the right option: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12 ?

And specifically the file GRCh38.p12_genomic.fna.gz?

2) Moreover, the "roughly equivalent" alternative from UCSC is at the following link:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

However, does this contain the original hg38 assembly from 2013 rather than the latest release, as with the NCBI link above? Or does it also include the corresponding updates?

Thank you in advance,

Efstathios-Iason

modified 9 months ago by Biostar ♦♦ (20) • written 10 months ago by svlachavas (560)
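
One way to check what a given download actually contains is to tally its FASTA header names. The sketch below is a minimal, unofficial example. It assumes UCSC-style names in which alternate loci end in `_alt`, fix patches end in `_fix`, unlocalized sequences end in `_random`, and unplaced sequences start with `chrUn_`. The function names are made up for illustration, and names that do not start with `chr` (for example NCBI accessions) are simply reported as `other`.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def classify_name(name):
    """Bucket a sequence name, assuming UCSC-style naming conventions."""
    if not name.startswith("chr"):
        return "other"          # e.g. accession-style names from other sources
    if name.endswith("_alt"):
        return "alt"            # alternate locus scaffold
    if name.endswith("_fix"):
        return "fix_patch"      # fix patch added by a patch release
    if name.endswith("_random"):
        return "unlocalized"    # chromosome known, position unknown
    if name.startswith("chrUn_"):
        return "unplaced"       # chromosome unknown
    return "primary"


def summarize_headers(lines):
    """Count sequences per category from an iterable of FASTA lines."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue            # skip sequence lines, only headers matter here
        fields = line[1:].split()
        if fields:
            counts[classify_name(fields[0])] += 1
    return counts
```

With a local copy of a reference (the path here is a placeholder), `with open_text("reference.fa.gz") as fh: print(summarize_headers(fh))` prints how many sequences fall into each bucket. A file with no `alt` or `fix_patch` entries contains only the primary assembly plus any unlocalized or unplaced scaffolds.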

See this blog post from Heng Li: "Which human reference genome to use?"

written 10 months ago by WouterDeCoster (38k)

Thank you very much for the link.

written 10 months ago by svlachavas (560)

Take a look at GENCODE, which is the official source of human genome data.

written 10 months ago by genomax (65k)

Dear genomax,

thank you for the alternative suggestion. So, for my purpose, would you suggest the GENCODE reference assembly? Or does each source have particular strengths that I should take into account?

written 10 months ago by svlachavas (560)

The GRCh38 reference assembly is identical everywhere, and the original release occurred in December 2013. Since then, patch releases have occurred (but they do not affect chromosomal coordinates). Depending on where you get your annotations, they may be slightly different. Is this targeted or whole exome sequencing?

modified 10 months ago • written 10 months ago by genomax (65k)
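
The claim that the assembly is the same across sources can be checked directly on downloaded files. The following is a rough sketch, not an official procedure: it fingerprints every sequence in a FASTA file by length and by MD5 of the upper-cased bases (so soft-masking in lower case is ignored), then pairs up sequences from two sources whose fingerprints match. This matters because sources label the same chromosome differently, for example `chr1` at UCSC versus an accession at NCBI. The function names are hypothetical.

```python
import hashlib


def fingerprints(lines):
    """Map each sequence name to (length, MD5 of the upper-cased sequence).

    `lines` is any iterable of FASTA text lines, e.g. an open file handle.
    Sequences are hashed line by line, so whole chromosomes are never
    held in memory.
    """
    result = {}
    name, length, digest = None, 0, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                result[name] = (length, digest.hexdigest())
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length, digest = 0, hashlib.md5()
        elif name is not None:
            seq = line.upper()  # ignore soft-masking differences
            length += len(seq)
            digest.update(seq.encode("ascii"))
    if name is not None:
        result[name] = (length, digest.hexdigest())
    return result


def shared_sequences(fp_a, fp_b):
    """Pair names from two sources whose length and checksum both match."""
    by_print = {value: name for name, value in fp_b.items()}
    return {name: by_print[value] for name, value in fp_a.items() if value in by_print}
```

Running `fingerprints` on each downloaded file (opened with `gzip.open(path, "rt")` if compressed) and passing both results to `shared_sequences` gives a name-to-name mapping for the sequences that are byte-for-byte the same after upper-casing. Any primary chromosome missing from that mapping would deserve a closer look.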

Dear genomax,

thank you for the information and comments. Whole exome sequencing has in fact been performed (genomic DNA captured using Agilent in-solution enrichment methodology; paired-end 75-base massively parallel sequencing on an Illumina HiSeq4000), and I already have the FASTQ files. So my next step is to align the reads and then perform variant calling, as mentioned.

modified 10 months ago • written 10 months ago by svlachavas (560)