Question: BWA mapping to multiple reference genomes
18 months ago, gerrybio2010 (0) wrote:

If the reference genome is very large (as for many plant species), we'd like to first split the reference into smaller chunks, for example chromosome by chromosome. We could then map the FASTQ reads to each chromosome reference separately and merge the results.

My worry is that this would completely change the alignment compared with running against the complete genome, because one read could map MANY TIMES. For example, a read that truly comes from chr1 should get its highest mapping score on chr1 when the complete genome is used as the reference. But when we map against each chromosome separately, the same read could also align to similar sequences on other chromosomes, which introduces many false positives.

So after mapping to the separate chromosome references and merging the results, are there any tools to recalculate the mapping score? Maybe dedup tools?

But to me, dedup usually means finding mappings with the same sequence + start + end + orientation and removing potential PCR duplicates. So is there another type of "dedup" that retains only the best mapping for each read and removes the lower-scoring mappings?


modified 18 months ago by swbarnes2 (9.2k) • written 18 months ago by gerrybio2010 (0)

Mapping against a reduced reference is always going to cause problems. bwa should be able to handle large genomes without having to split them.

written 18 months ago by genomax (92k)
18 months ago, swbarnes2 (9.2k), United States, wrote:

"we'd like to first split the ref into smaller chunks, for example, chromosome by chromosome."

Sorry, but this is a terrible idea. You need to let every read find its best mapping position from the entire genome, which means each read needs to be aligned to the entire genome.

You can split your fastq into chunks to do the alignments in parallel, but do not split the reference into chunks.
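As an illustration of splitting the reads rather than the reference, here is a minimal sketch in Python. It assumes an uncompressed FASTQ file with strict 4-line records (no wrapped sequences); the function names `fastq_records` and `split_fastq`, the chunk file naming, and the chunk size are all made up for this example and are not part of bwa or any other tool.

```python
import itertools
from pathlib import Path
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield record


def split_fastq(path: str, out_dir: str, reads_per_chunk: int) -> List[Path]:
    """Write consecutive groups of reads_per_chunk reads to separate files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk_paths: List[Path] = []
    out_handle = None
    try:
        with open(path) as handle:
            for i, record in enumerate(fastq_records(handle)):
                if i % reads_per_chunk == 0:
                    # Start a new chunk file every reads_per_chunk reads.
                    if out_handle is not None:
                        out_handle.close()
                    chunk_path = out / f"chunk_{len(chunk_paths):04d}.fastq"
                    chunk_paths.append(chunk_path)
                    out_handle = open(chunk_path, "w")
                out_handle.writelines(record)
    finally:
        if out_handle is not None:
            out_handle.close()
    return chunk_paths
```

Each chunk would then be aligned against the same full-genome index and the outputs merged afterwards (for example with `samtools merge`). For paired-end data, both mate files would need to be split with the same `reads_per_chunk` so that mates stay in matching chunks. If the goal is only speed on one machine, the `-t` threads option of `bwa mem` may already be enough without any splitting.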

written 18 months ago by swbarnes2 (9.2k)

Just to make sure you don't go down that path: I did that years ago, when bowtie2 wasn't able to handle huge references.

You will face the very problem you are trying to avoid, probably raised to a higher power: reads with suboptimal alignments that would not be reported against the full reference may well be reported against the split references. You need to sort that kind of stuff out.
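To give a sense of what sorting this out involves, here is a rough sketch in Python, not a recommendation. It assumes the per-chromosome SAM outputs have been concatenated, that every mapped record carries an `AS:i` alignment score tag (as `bwa mem` output does), and that comparing raw `AS` values across runs is acceptable. The MAPQ column is deliberately ignored, since each per-chromosome run computed it without knowing about hits on the other chromosomes. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def alignment_score(fields: List[str]) -> int:
    """Return the AS:i alignment score from the optional SAM fields."""
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    raise ValueError(f"no AS:i tag on record for read {fields[0]}")


def best_hits(
    sam_lines: Iterable[str],
) -> Dict[Tuple[str, int], Tuple[int, bool, str]]:
    """Map (read name, mate number) -> (best score, tied?, SAM line).

    Mate number is 1 or 2 for paired reads and 0 for single-end reads.
    """
    best: Dict[Tuple[str, int], Tuple[int, bool, str]] = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        line = line.rstrip("\n")
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped in this run
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (fields[0], mate)
        score = alignment_score(fields)
        current = best.get(key)
        if current is None or score > current[0]:
            best[key] = (score, False, line)
        elif score == current[0]:
            # Equal best score on another reference: placement is ambiguous.
            best[key] = (score, True, current[2])
    return best
```

Even with a filter like this, the surviving records still carry MAPQ values and mate information computed against a single chromosome, and reads flagged as tied have no defensible placement, which is part of why aligning once against the full reference is the cleaner route.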

When I did it, it was an ugly mess and loads of pointless work, and I was so glad when the bowtie2 developers published an update that handled the problem. Check the manual; bwa has options to restrict reporting.

modified 18 months ago • written 18 months ago by Carambakaracho (2.2k)