Question: BWA mapping to multiple reference genomes
24 days ago, gerrybio2010 wrote:

If the reference genome is very big (as for many plant species), we'd like to first split the reference into smaller chunks, for example chromosome by chromosome. Then we could map the fastq against each chromosome reference separately and merge the results.

My worry is that this would completely change the alignment outcome compared to running against the complete genome. One read could potentially map MANY TIMES. For example, a unique read coming from chr1 will map to chr1 with the highest mapping score when the complete genome is used as the reference. But when we map against each chromosome reference separately, the same read could also map to similar sequences on other chromosomes, which brings many false positives.

So after we map to the separate chromosome references and merge the results, are there any tools to re-calculate the mapping score? Maybe dedup tools?

But to me, dedup usually means finding mappings with the same sequence + start + end + orientation and removing potential PCR duplicates. So is there another type of "dedup" that retains only the best mapping for each read and removes the other, lower-scoring mappings?

thx


Mapping against a reduced reference is always going to cause problems. bwa should be able to handle large genomes without having to split them.

Comment written 24 days ago by genomax
24 days ago, swbarnes2 (United States) wrote:

"we'd like to first split the ref into smaller chunks, for example, chromosome by chromosome."

Sorry, but this is a terrible idea. You need to let every read find its best mapping position from the entire genome, which means each read needs to be aligned to the entire genome.

You can split your fastq into chunks to do the alignments in parallel, but do not split the reference into chunks.
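Splitting the fastq could look something like the sketch below. It is only an illustration, not a recommended tool: it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names `read_fastq_records` and `split_fastq` are made up for this example. Records are dealt out round-robin, so applying the same split to both files of a paired-end run should keep mates in matching chunks.

```python
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as tuples of four lines (assumes unwrapped FASTQ)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def split_fastq(handle, outputs):
    """Write records round-robin to the writable handles in `outputs`.

    Returns the number of records written to each output.
    """
    counts = [0] * len(outputs)
    for i, record in enumerate(read_fastq_records(handle)):
        k = i % len(outputs)
        outputs[k].writelines(record)
        counts[k] += 1
    return counts
```

Each chunk is then aligned against the same full-genome index, and the resulting alignment files are merged afterwards. If the goal is only to use more cores on one machine, the `-t` threads option of `bwa mem` may already be enough without any splitting.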


Just to make sure you don't go down that path: I did exactly that years ago, when bowtie2 wasn't able to handle huge references.

You will face exactly the problem you are trying to avoid, probably raised to a higher power: reads with suboptimal alignments that would not be reported against the full reference may well be reported against the split references. You need to sort that kind of stuff out.
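To give an idea of what "sorting it out" involves, here is a rough sketch that keeps only the highest-scoring hit per read from merged per-chromosome SAM output. It is an illustration under assumptions, not an existing tool: it assumes the aligner writes an `AS:i` alignment-score tag (as `bwa mem` does), the function names are hypothetical, and ties simply keep the first hit seen.

```python
def alignment_score(sam_fields):
    """Return the integer value of the AS:i tag, or None if the tag is absent."""
    for tag in sam_fields[11:]:  # optional tags start at column 12
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    return None


def best_hit_per_read(sam_lines):
    """Map (read name, mate bits) to the SAM line with the highest AS score."""
    best = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        score = alignment_score(fields)
        if score is None:
            continue
        # 0x40 / 0x80 mark first / second mate, so mates are kept apart
        key = (fields[0], flag & 0xC0)
        if key not in best or score > best[key][0]:
            best[key] = (score, line)
    return {key: line for key, (score, line) in best.items()}
```

Even with a filter like this, the MAPQ values are left as each separate run computed them, without knowledge of hits on the other chromosomes, and mates of a pair can end up on different chromosomes with stale pairing flags. None of that is fixed here.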

When I did it, it was an ugly mess and loads of pointless work, and I was so glad when the bowtie2 developers published an update that handled the problem. Check the manual; bwa has options to restrict reporting.

Comment written 23 days ago (modified 23 days ago) by Carambakaracho