Question: How to avoid building millions of BWA indices?
7 weeks ago, hunterhybrid999 (0) wrote:

I have a large number of episomes (I don't know what they all are, and some might be present with human sequence) that I would like to align reads against, along with the human genome. Is there a way to merge BAM files so that I don't have to rebuild the BWA index for genome + episome each time? How do people currently handle large eukaryotic genomes with multiple small additions/changes (genome + multiple different plasmids)?

written 7 weeks ago by hunterhybrid999 (0)

"Is there a way to somehow merge BAM files so I can save time from building BWA indices each time for the genome + episome"

It's not clear to me whether you're asking about the FASTA reference(s) or about the BAM file(s).

written 7 weeks ago by Pierre Lindenbaum (134k)

I have the FASTA references for the human genome and for multiple episomes/plasmids, with new ones added for each experiment. I would like to know if there's a way to align against just the episomes/plasmids, then against the genome, and merge the BAM files together. This would save me from re-indexing episomes/plasmids + genome every time, since I would only need to index the new episomes/plasmids and not the genome.

written 7 weeks ago by hunterhybrid999 (0)

It's not clear to me what you mean by "and then against the genome and merge the BAM files together". What would happen if a read maps to an episome AND the human reference?

Anyway, you could first map to the human reference with bwa, extract the unmapped reads with samtools, map those unmapped reads against episome1, extract the unmapped reads again, map them against episome2, and so on. But this strategy would lead to many false positives.
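
A rough sketch of that sequential strategy, written as a Python helper that only builds the shell command lines (it does not run them). It assumes single-end reads, that `bwa index` has already been run once on each reference, and that `bwa` and `samtools` are on the PATH. The function name and the `stepN` file names are made up for illustration.

```python
import shlex


def sequential_mapping_commands(reads_fastq, references, threads=4):
    """Build shell commands that map reads to each reference in turn,
    passing only the still-unmapped reads on to the next reference."""
    commands = []
    current_reads = reads_fastq
    for step, ref in enumerate(references, start=1):
        bam = f"step{step}.bam"                    # hypothetical output name
        unmapped = f"step{step}.unmapped.fastq"    # hypothetical output name
        # Align against this reference and write a sorted BAM.
        commands.append(
            f"bwa mem -t {threads} {shlex.quote(ref)} {shlex.quote(current_reads)}"
            f" | samtools sort -o {bam} -"
        )
        # -f 4 keeps only reads flagged as unmapped; samtools fastq
        # converts them back to FASTQ for the next round.
        commands.append(f"samtools fastq -f 4 {bam} > {unmapped}")
        current_reads = unmapped
    return commands


if __name__ == "__main__":
    for cmd in sequential_mapping_commands(
        "reads.fastq", ["human.fa", "episome1.fa", "episome2.fa"]
    ):
        print(cmd)
```

With this layout, only the small episome FASTA needs a fresh `bwa index` run when a new episome is added; the human index is reused. The false-positive caveat above still applies.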

written 7 weeks ago by Pierre Lindenbaum (134k)

"large number of episomes"

People on this forum are mainly informaticians so it may help to provide some clarification of how this experiment is done. Perhaps you could build one index with human genome + all episomes (plasmids?)? Not sure if that is feasible or if that would cause problems with multi-mapping.
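
If the single-index route is feasible, one way to assemble the combined reference is to concatenate the FASTA files while checking that no sequence name appears twice. This is only a sketch: `combine_fastas` and the file names are hypothetical, and the resulting file would still need to be indexed afterwards with `bwa index`.

```python
def combine_fastas(fasta_paths, out_path):
    """Concatenate FASTA files into one reference file.

    Raises ValueError if the same sequence name occurs more than once,
    since duplicate names would make the alignments ambiguous.
    Returns the sequence names in the order they were written.
    """
    names = []
    seen = set()
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after '>'.
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        names.append(name)
                    # Make sure a file without a trailing newline does not
                    # run into the next file's header.
                    out.write(line if line.endswith("\n") else line + "\n")
    return names


# Example (hypothetical file names):
# combine_fastas(["human.fa", "episome1.fa", "episome2.fa"], "combined.fa")
# then, on the command line: bwa index combined.fa
```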

modified 7 weeks ago • written 7 weeks ago by GenoMax (96k)

New episomes/plasmids are constantly being added to the analysis. I would like to know if there's a way to merge BAM files (align against the genome and against the episomes/plasmids separately, then somehow merge the BAMs together) so that I don't have to re-index each time a new episome/plasmid is added.

written 7 weeks ago by hunterhybrid999 (0)

If this is about merging BAM files, then samtools has the merge subcommand:

http://www.htslib.org/doc/samtools-merge.html
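
As a minimal illustration, the basic call has the form `samtools merge out.bam in1.bam in2.bam ...`. The helper below only assembles that command line as a string; the function name and file names are hypothetical, and it assumes the input BAMs are already sorted the same way.

```python
import shlex


def merge_command(output_bam, input_bams):
    """Build a 'samtools merge' command line for two or more sorted BAMs."""
    if len(input_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    # shlex.join quotes any file name that contains spaces or shell characters.
    return shlex.join(["samtools", "merge", output_bam, *input_bams])


if __name__ == "__main__":
    print(merge_command("merged.bam", ["human.bam", "episome1.bam"]))
```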

written 7 weeks ago by Mark900

If the episome references are non-overlapping, then I don't think you can merge the BAM files, since the human + episome reference is not going to be identical between BAM files: the episome is different for each alignment.
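
One way to check this before attempting a merge is to compare the `@SQ` lines of the two headers (as printed by `samtools view -H file.bam`). The sketch below assumes the header text has already been read into a string; the function names are made up for illustration.

```python
def reference_sequences(sam_header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM header."""
    sequences = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(field.split(":", 1) for field in line.split("\t")[1:])
        sequences.append((fields["SN"], int(fields["LN"])))
    return sequences


def same_reference(header_a, header_b):
    """True if both headers list the same sequences, in the same order."""
    return reference_sequences(header_a) == reference_sequences(header_b)
```

If `same_reference` returns False for two BAMs, they were aligned against different sets of sequences, which is the situation described above.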

"align vs genome and align vs episome/plasmids separately and somehow merge the BAMs together"

Aligning to a reduced reference is never a good idea with short reads when you know the data comes from the entire genome.

modified 7 weeks ago • written 7 weeks ago by GenoMax (96k)