3.3 years ago
hunterhybrid999
I have a large number of episomes (I don't know what they all are, and some might be present with human sequence) that I would like to align against, along with the human genome. Is there a way to merge BAM files so that I don't have to rebuild the BWA index for genome + episome each time? How do people currently handle large eukaryotic genomes with multiple small additions/changes (genome + multiple different plasmids)?
It's not clear to me if you're asking about the FASTA reference(s) or about the BAM file(s).
I have the FASTA references for multiple episomes/plasmids that would be added for each experiment, plus the human genome. I would like to know if there's a way to align against just the episomes/plasmids, then against the genome, and merge the BAM files together. This would save the time spent indexing episomes/plasmids + genome, since I could index only the new episomes/plasmids and not re-index the genome.
It's not clear to me what you mean by "and then against the genome and merge the BAM files together". What would happen if a read maps to an episome AND the human reference?
Anyway, you could first map to the human reference with bwa, extract the unmapped reads with samtools, map those unmapped reads against episome1, extract the unmapped reads again, map them against episome2, and so on. But this strategy would lead to many false positives.
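To make the sequential strategy above concrete, here is a minimal Python sketch that only builds the shell commands and does not run them. It assumes single-end reads, that every reference has already been indexed with bwa index, and that bwa and samtools are installed. The function name and the round*.bam / round*.unmapped.fq file names are hypothetical.

```python
import shlex


def sequential_mapping_commands(reads_fq, references, threads=4):
    """Build shell commands that map reads against each reference in turn.

    After every round, only reads with the "unmapped" SAM flag (4) are
    carried forward to the next reference. Output names are hypothetical.
    """
    commands = []
    current = reads_fq
    for i, ref in enumerate(references, start=1):
        bam = f"round{i}.bam"
        unmapped = f"round{i}.unmapped.fq"
        # Align the current read set and write a coordinate-sorted BAM.
        commands.append(
            f"bwa mem -t {threads} {shlex.quote(ref)} {shlex.quote(current)}"
            f" | samtools sort -o {bam} -"
        )
        # Keep only reads flagged as unmapped for the next round.
        commands.append(f"samtools fastq -f 4 {bam} > {unmapped}")
        current = unmapped
    return commands


if __name__ == "__main__":
    for cmd in sequential_mapping_commands(
        "reads.fq", ["human.fa", "episome1.fa", "episome2.fa"]
    ):
        print(cmd)
```

Note that a read which would align better to an episome but also aligns to the human reference never reaches the later rounds, which is part of why this approach is risky.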
People on this forum are mainly informaticians so it may help to provide some clarification of how this experiment is done. Perhaps you could build one index with human genome + all episomes (plasmids?)? Not sure if that is feasible or if that would cause problems with multi-mapping.
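As a rough illustration of the "one index with human genome + all episomes" idea, the Python sketch below concatenates several FASTA files into a single reference and refuses duplicate sequence names, since those would make the combined reference ambiguous. The function name and file names are hypothetical, and the combined file would still need to be indexed with bwa index afterwards, so this does not by itself avoid the re-indexing cost.

```python
def combine_references(fasta_paths, out_path):
    """Concatenate FASTA files into one file, rejecting duplicate names.

    Returns the sorted list of sequence names written to out_path.
    """
    seen = set()
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after ">".
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name!r}")
                        seen.add(name)
                    # Make sure a file without a trailing newline does not
                    # run into the next file's header.
                    out.write(line if line.endswith("\n") else line + "\n")
    return sorted(seen)
```

For example, combine_references(["human.fa", "episome1.fa"], "combined.fa") would write both sets of sequences to combined.fa, assuming those files exist.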
The episomes/plasmids are constantly being added into the analysis. I would like to know if there's a way to merge BAM files (align vs genome and align vs episome/plasmids separately and somehow merge the BAMs together) so that I don't have to index each time a new episome/plasmid is added.
If this is about merging BAM files, then samtools has the merge subcommand: http://www.htslib.org/doc/samtools-merge.html
If the episome references are non-overlapping then I don't think you can merge the BAM files, since the human + episome reference is not going to be identical between BAM files, as the episome is different for each alignment. Aligning to a reduced reference is never a good idea with short reads when you know the data comes from the entire genome.
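One way to check the point about non-identical references is to compare the @SQ lines of the two BAM headers, which can be printed with samtools view -H. The Python sketch below is only a diagnostic under that assumption: it takes header text as plain strings and reports whether the sequence names and lengths match. The function names are hypothetical, and it says nothing about how a given samtools version treats mismatched headers when merging.

```python
def sequence_dictionary(sam_header_text):
    """Return the (name, length) pairs from the @SQ lines of a SAM header."""
    entries = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        entries.append((fields["SN"], int(fields["LN"])))
    return entries


def same_reference(header_a, header_b):
    """True if both headers list the same sequences in the same order."""
    return sequence_dictionary(header_a) == sequence_dictionary(header_b)
```

If same_reference returns False for two alignments, they were made against different reference sets, which is the situation described above.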