How to avoid building millions of BWA indices?
3.3 years ago

I have a large number of episomes (I don't know what they all are, and some might be present with human sequence) that I would like to align against, along with the human genome. Is there a way to merge BAM files so that I don't have to build a BWA index for the genome + episome each time? How do people currently handle large eukaryotic genomes with multiple small additions or changes (genome + multiple different plasmids)?

genome alignment next-gen Assembly • 779 views

"Is there a way to somehow merge BAM files so I can save time from building BWA indices each time for the genome + episome"

It's not clear to me whether you're asking about the FASTA reference(s) or about the BAM file(s).


I have the FASTA references for the human genome and for multiple episomes/plasmids, which would be added for each experiment. I would like to know if there's a way to align against just the episomes/plasmids and then against the genome, and merge the BAM files together. This would save me the time of indexing the episomes/plasmids + genome, since I would then only need to index the new episomes/plasmids and not re-index the genome.


It's not clear to me what you mean by "and then against the genome and merge the BAM files together". What would happen if a read maps to an episome AND the human reference?

Anyway, you could first map to the human reference with bwa, extract the unmapped reads with samtools, map those unmapped reads against episome1, extract the unmapped reads again, map them against episome2, and so on. But this strategy would lead to many false positives.
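For illustration only, here is a minimal sketch of that sequential strategy. It just builds the shell commands rather than running them. The file names (`step1.bam`, etc.) and the helper name are made up for the example, and it assumes single-end reads and references that were already indexed with `bwa index`.

```python
import shlex


def sequential_mapping_commands(reads, references):
    """Build shell commands that map reads to each reference in turn.

    Only the reads left unmapped by one step are passed to the next step.
    Assumes single-end FASTQ input; output names are hypothetical.
    """
    commands = []
    current = reads
    for i, ref in enumerate(references, start=1):
        bam = f"step{i}.bam"
        unmapped = f"step{i}.unmapped.fq"
        # Align against this reference and write a sorted BAM.
        commands.append(
            f"bwa mem {shlex.quote(ref)} {shlex.quote(current)}"
            f" | samtools sort -o {bam} -"
        )
        # SAM flag 4 means "read unmapped"; export those reads as FASTQ.
        commands.append(f"samtools fastq -f 4 {bam} > {unmapped}")
        current = unmapped
    return commands
```

Calling `sequential_mapping_commands("reads.fq", ["human.fa", "episome1.fa", "episome2.fa"])` gives two commands per reference. Note that a read which would align better to an episome than to human never gets the chance once it has been placed on the human reference, which is one source of the false positives mentioned above.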


"large number of episomes"

People on this forum are mainly informaticians so it may help to provide some clarification of how this experiment is done. Perhaps you could build one index with human genome + all episomes (plasmids?)? Not sure if that is feasible or if that would cause problems with multi-mapping.
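If a single combined reference turns out to be workable, something like the sketch below could build it. This is only an illustration: the function name and file names are hypothetical, and the combined file would still need to be indexed with `bwa index`, which is the cost the question is trying to avoid.

```python
def combine_fastas(fasta_paths, out_path):
    """Concatenate FASTA files into one reference file.

    Raises ValueError if the same sequence name appears twice, since
    duplicate names would make the alignments ambiguous.
    Returns the sequence names in the order they were written.
    """
    names = []
    seen = set()
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after ">".
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        names.append(name)
                    # Make sure a file lacking a final newline does not
                    # run into the next file's header.
                    out.write(line if line.endswith("\n") else line + "\n")
    return names
```

For example, `combine_fastas(["human.fa", "episome1.fa"], "combined.fa")` would write both sets of sequences to `combined.fa` and return their names.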


The episomes/plasmids are constantly being added into the analysis. I would like to know if there's a way to merge BAM files (align vs genome and align vs episome/plasmids separately and somehow merge the BAMs together) so that I don't have to index each time a new episome/plasmid is added.


If this is about merging BAM files, then samtools has a merge subcommand:

http://www.htslib.org/doc/samtools-merge.html
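As a rough sketch, the basic call could be wrapped like this. The function name and file names are hypothetical, and it assumes the input BAMs are sorted the same way. How the merge behaves when the inputs were aligned to different reference sets is something to check in the manual linked above.

```python
import subprocess


def merge_bams(out_bam, in_bams, run=False):
    """Build (and optionally run) a `samtools merge` command.

    Uses the positional form: samtools merge <out.bam> <in1.bam> <in2.bam> ...
    The command is only executed when run=True.
    """
    cmd = ["samtools", "merge", out_bam, *in_bams]
    if run:
        # check=True raises CalledProcessError if samtools exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `merge_bams("merged.bam", ["human.bam", "episomes.bam"])` returns the argument list without executing anything.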


If the episome references are non-overlapping, then I don't think you can merge the BAM files, since the human + episome reference is not going to be identical between BAM files: the episome is different for each alignment.

"align vs genome and align vs episome/plasmids separately and somehow merge the BAMs together"

Aligning to a reduced reference is never a good idea with short reads when you know the data comes from the entire genome.
