Dear Community,
I would like to find and download the latest human reference genome assembly, hg38/GRCh38, to use both for aligning raw reads and for variant calling in exome sequencing. However, I'm a bit confused about the available options and the different sources, such as UCSC and NCBI. In detail:
1) If I want to download the latest available human reference assembly, would this be the right location: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12 ?
and specifically the file GRCh38.p12_genomic.fna.gz ?
2) Moreover, the roughly equivalent alternative from UCSC is at the following link:
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/
However, does this contain the original hg38 assembly from 2013 rather than the latest release, as at NCBI above? Or does it also include the subsequent updates?
Thank you in advance,
Efstathios-Iason
See this blog post from Heng Li: Which human reference genome to use?
Thank you very much for the link.
Take a look at GENCODE, which provides the reference gene annotation for the human genome along with matching GRCh38 sequence files.
Dear genomax,
thank you for your alternative suggestion. So, for my purpose, would you suggest the GENCODE reference assembly? Or does each source have strengths that I should take into account?
The GRCh38 reference assembly is identical everywhere; the original release was in December 2013. Since then there have been patch releases, but they do not change the chromosomal coordinates. The annotations may differ slightly depending on where you get them. Is this targeted or whole exome sequencing?
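To make the patch naming concrete, here is a minimal Python sketch of how the NCBI path quoted in the question is put together from the accession (GCF_000001405.38) and the assembly name (GRCh38.p12). The function names are made up for illustration, and the "<directory name>_genomic.fna.gz" file name is assumed from the usual layout of the NCBI genomes FTP area, so list the directory to confirm before downloading.

```python
# Sketch only: helper names are hypothetical, and the "_genomic.fna.gz"
# suffix is an assumption based on the usual NCBI genomes FTP layout.
NCBI_GENOMES_ROOT = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all"


def split_patch(assembly_name):
    """Split 'GRCh38.p12' into ('GRCh38', 12); no '.pN' suffix means patch 0."""
    base, _, patch = assembly_name.partition(".p")
    return base, int(patch) if patch else 0


def ncbi_assembly_dir(accession, assembly_name):
    """Build the FTP directory for an accession such as 'GCF_000001405.38'."""
    prefix, rest = accession.split("_", 1)   # 'GCF', '000001405.38'
    digits = rest.split(".", 1)[0]           # '000001405'
    # The nine digits are split into three-character path components.
    triplets = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    leaf = f"{accession}_{assembly_name}"
    return "/".join([NCBI_GENOMES_ROOT, prefix, *triplets, leaf])


def genomic_fasta_url(accession, assembly_name):
    """Assumed location of the genomic FASTA inside the assembly directory."""
    directory = ncbi_assembly_dir(accession, assembly_name)
    return f"{directory}/{accession}_{assembly_name}_genomic.fna.gz"


if __name__ == "__main__":
    print(split_patch("GRCh38.p12"))
    print(genomic_fasta_url("GCF_000001405.38", "GRCh38.p12"))
```

The patch number only tells you which patch release the files belong to; as said above, the chromosomal coordinates are the same across patches.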
Dear genomax,
thank you for the information and comments. Whole exome sequencing has actually been performed (genomic DNA captured using the Agilent in-solution enrichment methodology, followed by paired-end 75-base massively parallel sequencing on an Illumina HiSeq4000), and I already have the FASTQ files. So my next step is to align the reads and then call variants, as mentioned.
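For what it's worth, the alignment step could be sketched roughly as below. This assumes BWA-MEM and samtools are the tools used (nothing in this thread fixes that choice), that the reference FASTA has already been indexed with "bwa index", and that the file and sample names are placeholders. The helper functions are hypothetical and only wrap the command lines; variant calling with a caller of your choice would follow on the sorted BAM.

```python
import subprocess


def bwa_mem_command(reference, fastq_r1, fastq_r2, sample, threads=4):
    """Command line for paired-end alignment with BWA-MEM.

    The read group is passed with literal backslash-t separators, which
    bwa converts to tabs; most variant callers expect a read group.
    """
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group,
            reference, fastq_r1, fastq_r2]


def samtools_sort_command(output_bam, threads=4):
    """Command line that coordinate-sorts alignments read from stdin ('-')."""
    return ["samtools", "sort", "-@", str(threads), "-o", output_bam, "-"]


def align_and_sort(reference, fastq_r1, fastq_r2, sample, output_bam, threads=4):
    """Pipe 'bwa mem' into 'samtools sort'; both tools must be on PATH."""
    bwa = subprocess.Popen(
        bwa_mem_command(reference, fastq_r1, fastq_r2, sample, threads),
        stdout=subprocess.PIPE,
    )
    sort = subprocess.Popen(samtools_sort_command(output_bam, threads),
                            stdin=bwa.stdout)
    bwa.stdout.close()  # let bwa see a broken pipe if samtools exits early
    sort_rc = sort.wait()
    bwa_rc = bwa.wait()
    if bwa_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"alignment failed (bwa={bwa_rc}, samtools={sort_rc})")


if __name__ == "__main__":
    # Placeholder file names; replace with the real reference and FASTQ files.
    align_and_sort("GRCh38.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                   "sample", "sample.sorted.bam")
```

Whichever source the FASTA comes from, use the same file for indexing, alignment and variant calling, so that the contig names stay consistent throughout.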