Question: Mapping to a Different Genome Build
Mpiro40 (United States) wrote, 9.2 years ago:

I have an Illumina NGS dataset that I would like to align to the human reference genome with BWA for variant calling. The dataset was generated with the SureSelect hg18 capture kit.

My question: is it possible to align this dataset to hg19, or can it only be aligned to hg18?

I'd greatly appreciate your comments.

Istvan Albert (University Park, USA) wrote, 9.2 years ago:

If you align to a different build than the one the capture probes were designed for, you will run into the problem of connecting the annotation information that came with the probe set to your mapped locations. Sooner or later you will need to make heavy use of that information. As mentioned before, you can get around this by lifting the probe information over to the new build.

In my opinion you should either keep your probe information on hg18 and map against hg18, or lift your probe annotations over to hg19 and then map against hg19. Keep it simple, since there will always be plenty of complexity to deal with anyway.
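If you take the lift-over route, the coordinate conversion boils down to walking a UCSC chain file. The sketch below is a simplified, hypothetical stand-in for liftOver, not its actual implementation: it parses forward-strand chains and lifts a BED-style interval only when the interval falls entirely inside one ungapped aligned block. The names `parse_chain` and `lift_interval` are made up for illustration; for real probe sets, use liftOver itself with UCSC's hg18-to-hg19 chain file.

```python
"""Minimal sketch of chain-file coordinate lifting (forward strand only).

Assumes UCSC chain format: a header line
  chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
followed by alignment lines "size dt dq" and a final line "size".
Here "t" is the old build (e.g. hg18) and "q" the new one (e.g. hg19).
"""
from dataclasses import dataclass


@dataclass
class Block:
    t_chrom: str
    t_start: int  # 0-based, half-open, old build
    q_chrom: str
    q_start: int  # 0-based, half-open, new build
    size: int


def parse_chain(text):
    """Return ungapped aligned blocks from chain text (+/+ chains only)."""
    blocks = []
    t_chrom = q_chrom = None
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line separates chains
        if fields[0] == "chain":
            if fields[4] != "+" or fields[9] != "+":
                raise ValueError("this sketch only handles +/+ chains")
            t_chrom, t_pos = fields[2], int(fields[5])
            q_chrom, q_pos = fields[7], int(fields[10])
            continue
        size = int(fields[0])
        blocks.append(Block(t_chrom, t_pos, q_chrom, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # gap after this block: dt in old build, dq in new
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return blocks


def lift_interval(blocks, chrom, start, end):
    """Lift [start, end) to the new build, or return None if unmappable.

    Simplification: the interval must sit inside a single block. Real
    liftOver can map intervals that span gaps (see its -minMatch option).
    """
    for b in blocks:
        if b.t_chrom == chrom and b.t_start <= start and end <= b.t_start + b.size:
            offset = b.q_start - b.t_start
            return (b.q_chrom, start + offset, end + offset)
    return None
```

With a toy chain such as `"chain 1000 chr1 1000 + 100 450 chr1 1200 + 150 530 1\n100 50 80\n200\n"`, an interval inside the first block is shifted by 50 bp, one inside the second block by 80 bp, and one straddling the gap comes back as `None` in this sketch.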

ADD COMMENTlink written 9.2 years ago by Istvan Albert ♦♦ 84k
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9.2 years ago:

Yes, you can either

  • use the UCSC liftOver tool (also available as a standalone program) to convert your hg18 coordinates to hg19
  • or re-align your reads from scratch against hg19 with a short-read aligner (BWA, MAQ...)
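The two options above can be spelled out as command lines. This sketch only builds the commands as strings and does not run them. It assumes the UCSC `liftOver` binary and `bwa` are installed, uses the current `bwa mem` algorithm rather than the older `aln`/`sampe` workflow, and all file names in the example are placeholders.

```python
import shlex


def liftover_cmd(old_bed, chain_file, new_bed, unmapped_bed):
    """Option 1: lift hg18 intervals (BED) to hg19 with UCSC liftOver.

    liftOver's usage is: liftOver oldFile map.chain newFile unMapped
    """
    return shlex.join(["liftOver", old_bed, chain_file, new_bed, unmapped_bed])


def realign_cmds(ref_fasta, reads_1, reads_2, out_sam):
    """Option 2: re-align the raw paired reads against the new reference."""
    index = shlex.join(["bwa", "index", ref_fasta])  # one-time reference index
    align = shlex.join(["bwa", "mem", ref_fasta, reads_1, reads_2])
    # bwa mem writes SAM to stdout, so redirect it into a file.
    return [index, align + " > " + shlex.quote(out_sam)]


if __name__ == "__main__":
    print(liftover_cmd("probes.hg18.bed", "hg18ToHg19.over.chain.gz",
                       "probes.hg19.bed", "probes.unmapped.bed"))
    for cmd in realign_cmds("hg19.fa", "sample_R1.fq.gz",
                            "sample_R2.fq.gz", "sample.hg19.sam"):
        print(cmd)
```

Note that the first option only moves coordinates (probes, or existing alignments), while the second gives you alignments produced natively against hg19.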

The other thing to recognize is that you'll see more "off-target" reads, since the hg18 capture regions will not always correspond to transcripts (or other targets) in hg19, as our understanding of the genome has improved. Keep this in mind in downstream analyses that assess how effective the capture experiment was.

(Reply written 9.2 years ago by Brad Chapman.)
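One simple way to quantify the off-target effect described in this reply is the fraction of aligned reads that overlap the (lifted) capture targets. The sketch below is an illustrative pure-Python version with hypothetical function names; it assumes reads and targets are given as `(chrom, start, end)` tuples in 0-based, half-open BED coordinates. In practice you would compute this from the BAM and the target BED with a dedicated tool such as bedtools.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(targets):
    """Group targets by chromosome, merge overlapping ones, keep them sorted."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)  # extend current target
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if [start, end) overlaps any merged target by at least 1 bp."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged target starting before `end`; it also has the largest end
    # among such targets, because merged targets are disjoint and sorted.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def on_target_fraction(reads, targets):
    """Fraction of reads overlapping at least one target (0.0 if no reads)."""
    reads = list(reads)
    if not reads:
        return 0.0
    index = build_index(targets)
    hits = sum(overlaps(index, c, s, e) for c, s, e in reads)
    return hits / len(reads)
```

Comparing this fraction between an hg18 alignment with the original targets and an hg19 alignment with lifted targets is one way to see how much the build change actually costs.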
Stefano Berri (Cambridge, UK) wrote, 9.2 years ago:

As far as I understand, "SureSelect hg18" simply means the capture probes were designed using build hg18. You then capture a real genome (which has no version), and you can align the reads you obtain to any reference genome you like, such as hg19. The only problem is that some regions of hg19 may appear to be missing, because the probes (designed on hg18) didn't include them.

But given that hg19 is better than hg18, and that you cannot change the capture, I would align against hg19. Just stay alert in case something strange turns up.
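The "missing regions" mentioned above can be made concrete by subtracting the probe footprint from the hg19 regions you care about. This is a hypothetical helper, assuming both inputs are already in hg19 coordinates (that is, the probes were lifted over first) and given as 0-based, half-open `(chrom, start, end)` tuples; `regions` might be, for example, hg19 exons. It does a linear scan per region, which is fine for a sketch; for genome-wide inputs a tool such as bedtools subtract is the usual choice.

```python
from collections import defaultdict


def uncovered(regions, probes):
    """Return the parts of each region that no probe overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in probes:
        by_chrom[chrom].append((start, end))
    for ivs in by_chrom.values():
        ivs.sort()  # sorted by start; ends may still be out of order

    gaps = []
    for chrom, start, end in regions:
        pos = start  # first base of the region not yet accounted for
        for p_start, p_end in by_chrom.get(chrom, []):
            if p_end <= pos:
                continue  # probe ends before the uncovered remainder
            if p_start >= end:
                break  # sorted by start, so no later probe can overlap
            if p_start > pos:
                gaps.append((chrom, pos, p_start))  # stretch before this probe
            pos = max(pos, p_end)
            if pos >= end:
                break
        if pos < end:
            gaps.append((chrom, pos, end))  # tail of the region left uncovered
    return gaps
```

Each returned tuple is a stretch of an hg19 region of interest that the hg18-designed probes never targeted, so low coverage there is expected rather than a sign of a failed experiment.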

Hope this helps.
