Question: hg19 exome reference and index
2.1 years ago, genomics Newbie wrote:

What is the best source for obtaining the hg19 exome reference and its index? This reference will be used with BWA. Do I need to build the index myself if I'm using a specific version of BWA? Thank you!

index reference hg19 exome • 2.1k views
modified 2.1 years ago by Kevin Blighe • written 2.1 years ago by genomics Newbie

2.1 years ago, Kevin Blighe wrote:

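You should always align DNA-seq data to the entire genome, even for exome data: reads that originate outside the targeted regions would otherwise be forced onto the exome sequence and produce false alignments. For hg19, download the hg19.2bit file from here: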

Then, convert it to FASTA format with twoBitToFa:
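As a rough sketch of that conversion step: the twoBitToFa call below is the same invocation the asker reports running further down the thread. The wrapper function name two_bit_to_fasta and the sequence count at the end are my own additions for illustration, and the sketch assumes twoBitToFa is on your PATH.

```shell
# Sketch: convert a .2bit file to FASTA, then count the sequences written.
# two_bit_to_fasta is a hypothetical helper name; twoBitToFa must be on PATH.
two_bit_to_fasta() {
    local twobit="$1" fasta="$2"
    command -v twoBitToFa >/dev/null 2>&1 || {
        echo "twoBitToFa not found on PATH" >&2
        return 127
    }
    twoBitToFa "$twobit" "$fasta" || return 1
    # Each FASTA record starts with a '>' header line.
    grep -c '^>' "$fasta"
}

# Usage (file names as in this thread):
#   two_bit_to_fasta hg19.2bit hg19.fa
```

The printed number is simply how many sequences (chromosomes, plus any additional contigs in the file) ended up in the FASTA, which is a cheap way to spot a truncated conversion.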

Be aware, also, that GRCh38 / hg38 is the latest release of the human genome reference. hg19 has 'issues' (see the Biostars thread "Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy."), as does hg38.

modified 12 months ago • written 2.1 years ago by Kevin Blighe

Request for clarification.

I was able to run ./twoBitToFa hg19.2bit hg19.fa successfully and confirmed that hg19.fa was generated. I need both the index and the actual reference sequence. What is the best way to proceed so that both the reference sequence and its associated index are generated?

Thank you.

written 2.1 years ago by genomics Newbie

Hello. To index the FASTA genome reference with bwa, you should use the bwa index command, for example:

bwa index hg19.fa

It will produce several index files alongside the FASTA (hg19.fa.amb, .ann, .bwt, .pac and .sa). You never have to reference these directly: bwa finds them from the path of the FASTA file, provided they are kept in the same directory as it.
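If you want to verify that indexing finished before starting a long alignment, something like the following would do. It is only a sketch: the function name bwa_index_complete is made up for this example, and the list of extensions assumes the classic bwa (0.7.x) index layout, not bwa-mem2.

```shell
# Sketch: confirm that all five BWA index files sit next to the FASTA.
# bwa_index_complete is a hypothetical helper name.
bwa_index_complete() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and is not empty.
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing or empty: ${ref}.${ext}" >&2
            return 1
        fi
    done
    return 0
}

# Usage: build the index only when it is not already there.
#   bwa_index_complete hg19.fa || bwa index hg19.fa
```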

Then, I would use bwa mem for the alignment if your reads are >70 bp in length. For shorter reads, you should use one of the earlier bwa algorithms (bwa aln, like we used to do...) or something like bowtie, which is better tailored to short reads. For example, with bwa mem:

bwa mem ReferenceGenomes/hg19/hg19.fasta M1.fastq M2.fastq > Aligned.sam
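If you are not sure which side of the 70 bp mark your reads fall on, a quick estimate of the mean read length is enough to decide. This is a minimal sketch under a few assumptions: the FASTQ is uncompressed with four lines per record, only the first 10000 reads are sampled by default, and mean_read_length is a name invented for this example.

```shell
# Sketch: mean read length (integer, truncated) over the first N reads.
# Assumes an uncompressed FASTQ with exactly four lines per record.
mean_read_length() {
    local fastq="$1" max_reads="${2:-10000}"
    head -n "$((max_reads * 4))" "$fastq" |
        awk 'NR % 4 == 2 { sum += length($0); n++ }
             END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# Usage:
#   len=$(mean_read_length M1.fastq)
#   [ "$len" -gt 70 ] && echo "bwa mem is a reasonable choice"
```

Whichever aligner you choose, the reference path given on the alignment command line has to be the same FASTA path that was indexed.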

Prior to alignment, you may consider performing some QC of your reads and 'trimming' them, in order to eliminate junk that would not have aligned anyway or that, because of low-quality bases, could result in false variant calls further down the line. For a full idea of a pipeline involving trimming, alignment (bwa), generation of QC metrics, and then variant calling (mostly using tools from the Wellcome Trust Sanger Institute in the UK rather than the Broad Institute), take a look at my GitHub pipeline: (in particular, you may look at for the code).


modified 2.1 years ago • written 2.1 years ago by Kevin Blighe