Question: hg19 exome reference and index
14 months ago, genomics Newbie wrote:

What is the best source for obtaining the hg19 exome reference and its index? This reference will be used with BWA. Do I need to build the index myself if I'm using a specific version of BWA? Thank you!

index reference hg19 exome • 1.2k views
14 months ago, Kevin Blighe wrote:

You should always align DNA-seq data, exome data included, to the entire genome rather than to an exome-only reference. For hg19, download the hg19.2bit file from here: http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/bigZips/

Then, convert it to FASTA format with twoBitToFa: http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/bigZips/
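
As a rough sketch of these two steps, assuming wget and the UCSC twoBitToFa binary are both on your PATH. The helper names `run` and `fetch_hg19_fasta`, the `DRY_RUN` switch and the output directory are made up for illustration and are not part of any tool:

```shell
# Hypothetical helpers: download hg19.2bit and convert it to FASTA.
# Assumes wget and twoBitToFa are on PATH.
# Set DRY_RUN=1 to print the commands instead of running them.
BASE_URL="http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/bigZips"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_hg19_fasta() {
  # $1: output directory (defaults to the current directory)
  outdir="${1:-.}"
  run mkdir -p "$outdir"
  run wget -O "$outdir/hg19.2bit" "$BASE_URL/hg19.2bit"
  run twoBitToFa "$outdir/hg19.2bit" "$outdir/hg19.fa"
}
```

Calling `DRY_RUN=1 fetch_hg19_fasta refs` only prints the three commands, which is a cheap way to check the paths before starting a download of this size.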

Be aware, also, that GRCh38 / hg38 is the latest release of the human genome reference. hg19 has 'issues', as does hg38; see the Biostars answer "Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy."


Request for clarification.

I was able to successfully run ./twoBitToFa hg19.2bit hg19.fa and ensured hg19.fa was generated. I need both an index as well as the actual reference sequence. What is the best way to proceed forward so that both the reference sequence and associated index are generated?

Thank you.

written 14 months ago by genomics Newbie

Hello. To index the FASTA genome reference with bwa, you should use the bwa index command, for example:

bwa index hg19.fa

It will produce several index files alongside the FASTA (with bwa 0.7.x these end in .amb, .ann, .bwt, .pac and .sa). You will not have to reference them directly again, provided they are kept in the same directory as your FASTA reference file.
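
If you want to confirm that indexing finished, a minimal check could look like the sketch below. It assumes the five file extensions written by the bwa 0.7.x series; other versions or other aligners may differ. The function name `bwa_index_complete` is hypothetical:

```shell
# Hypothetical helper: succeed only if all five bwa index files
# exist, and are non-empty, next to the given FASTA file.
# Assumes the extensions used by bwa 0.7.x (amb, ann, bwt, pac, sa).
bwa_index_complete() {
  ref="$1"
  for ext in amb ann bwt pac sa; do
    if [ ! -s "$ref.$ext" ]; then
      echo "missing: $ref.$ext" >&2
      return 1
    fi
  done
  return 0
}
```

It can then guard the indexing step, so the index is only built when it is missing: `bwa_index_complete hg19.fa || bwa index hg19.fa`.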

Then, I would use bwa mem for the alignment if your reads are >70bp in length. For shorter reads, you should use one of the earlier bwa algorithms (bwa aln, as we used to do...) or something like bowtie, both of which are better tailored to short reads. For example:

bwa mem ReferenceGenomes/hg19/hg19.fasta M1.fastq M2.fastq > Aligned.sam
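
Note that the reference path given to bwa mem must be the same FASTA file that was indexed. If you are unsure which side of the 70bp rule of thumb your data falls on, the sketch below estimates the mean read length from the start of a FASTQ file. It assumes an uncompressed FASTQ with exactly four lines per read; the function names `mean_read_length` and `suggest_bwa_algorithm` are hypothetical:

```shell
# Hypothetical helper: mean read length (rounded down) over the first
# N reads of an uncompressed, 4-lines-per-record FASTQ file.
mean_read_length() {
  fastq="$1"
  n="${2:-10000}"
  head -n "$((n * 4))" "$fastq" |
    awk 'NR % 4 == 2 { total += length($0); reads++ }
         END { if (reads > 0) printf "%d\n", total / reads; else print 0 }'
}

# Hypothetical helper: apply the >70bp rule of thumb from this thread.
# Prints "mem" for longer reads and "aln" (the older algorithm) otherwise.
suggest_bwa_algorithm() {
  len="$(mean_read_length "$1")"
  if [ "$len" -gt 70 ]; then
    echo "mem"
  else
    echo "aln"
  fi
}
```

For example, `suggest_bwa_algorithm M1.fastq` prints the suggested bwa subcommand for that file. This is only a rough guide, since read lengths can vary a lot within one file after trimming.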

Prior to alignment, consider performing some QC of your reads and 'trimming' them, in order to remove low-quality bases and other junk that would either fail to align or could produce false variant calls further down the line. For a complete pipeline covering trimming, alignment (bwa), generation of QC metrics, and variant calling (mostly using tools from the Wellcome Trust Sanger Institute in the UK rather than the Broad Institute), take a look at my GitHub pipeline: https://github.com/kevinblighe/ClinicalGradeDNAseq (in particular, see AnalysisMasterVersion1.sh for the code).

Kevin

written 14 months ago by Kevin Blighe