Making BAM or SAM files equivalent
3.5 years ago

Hello everyone!

So I was wondering if anyone had any advice or tools to help make different BAM or SAM files equivalent. What I mean is this: I have aligned sequence data from several populations of one species to the same reference genome, and for one reason or another the number of alignments per BAM file varies widely (for example, 360,000 vs. 1,400,000). I'm hoping to do further downstream analysis in popoolation2, and to compare these populations accurately I think it will help to include only regions where all of the populations aligned to the same spot in the reference genome.

So basically I need something that compares two BAM files and outputs two new BAM files containing only the alignments that fall in the same regions of the reference genome.

I have the alignments in SAM, BAM, and sorted BAM formats so that is my starting point! Hopefully, I explained that well enough and thanks for any and all suggestions!

-KET

alignment BAM SAM popoolation popoolation2
3.5 years ago

Hello,

Another approach would be to find "callable" regions in each of your BAM files and intersect those regions.

This can be done using mosdepth:

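export MOSDEPTH_Q0=NOT_CALLABLE   # label for bins with depth < 20
export MOSDEPTH_Q1=CALLABLE       # label for bins with depth >= 20
mosdepth -n --quantize 0:20: sample1 sample1.bam   # mosdepth writes <prefix>.quantized.bed.gz, i.e. sample1.quantized.bed.gz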


This gives you a compressed BED file. Regions with coverage less than 20 are labeled NOT_CALLABLE, and those with coverage greater than or equal to 20 are labeled CALLABLE.

You can then extract only regions that are callable in this sample:

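zgrep -w "CALLABLE" sample1.quantized.bed.gz | bgzip -c > sample1.callable.bed.gz

Repeat this for each sample. In the last step, we intersect the callable-region files to get the regions that are callable in all of the samples. Note that bedtools intersect with several files after -b reports overlaps with any of them, not with all of them, so the files are chained one at a time:

bedtools intersect -a sample1.callable.bed.gz -b sample2.callable.bed.gz \
  | bedtools intersect -a stdin -b sample3.callable.bed.gz > callable.bed
# for more samples, add one "| bedtools intersect -a stdin -b sampleN.callable.bed.gz" stage per sample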
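With many populations it may be handier to wrap the per-sample steps in functions. The sketch below assumes each sample lives in SAMPLE.bam (sorted and indexed) and that mosdepth, bgzip and bedtools are on PATH. The helper names callable_bed and intersect_all are hypothetical, not part of any tool. intersect_all intersects the files one at a time, so a region survives only if every sample calls it.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers around the mosdepth + bedtools workflow.
# Assumes sorted, indexed SAMPLE.bam files and mosdepth, bgzip, bedtools on PATH.

export MOSDEPTH_Q0=NOT_CALLABLE   # label for depth in [0, 20)
export MOSDEPTH_Q1=CALLABLE       # label for depth >= 20

# callable_bed SAMPLE -> writes SAMPLE.callable.bed.gz
callable_bed() {
    local s=$1
    # mosdepth names its output <prefix>.quantized.bed.gz
    mosdepth -n --quantize 0:20: "$s" "$s.bam"
    zgrep -w "CALLABLE" "$s.quantized.bed.gz" | bgzip -c > "$s.callable.bed.gz"
}

# intersect_all BED... -> prints the regions present in every input BED.
# Each pass keeps only the part of the running result that overlaps the next file.
intersect_all() {
    local acc tmp f
    acc=$(mktemp)
    zcat -f "$1" > "$acc"    # -f also passes uncompressed input through unchanged
    shift
    for f in "$@"; do
        tmp=$(mktemp)
        bedtools intersect -a "$acc" -b "$f" > "$tmp"
        mv "$tmp" "$acc"
    done
    cat "$acc"
    rm -f "$acc"
}
```

For example, call callable_bed once for each of sample1, sample2 and sample3. Then run intersect_all sample1.callable.bed.gz sample2.callable.bed.gz sample3.callable.bed.gz > callable.bed.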


You can then use callable.bed in your downstream analysis to restrict the analysis to these regions.
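To get what the question asked for (new BAM files restricted to the shared regions), or a PoPoolation2-ready pileup, callable.bed can be passed to samtools. This is a sketch that assumes samtools is on PATH and that sorted, indexed SAMPLE.bam files sit next to callable.bed. The function names restrict_bam and callable_mpileup are hypothetical. samtools view -L keeps whole reads that overlap a region, so a few bases can fall just outside it. samtools mpileup -l restricts the pileup itself to the listed regions.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers that apply callable.bed downstream.
# Assumes samtools on PATH and sorted, indexed SAMPLE.bam files.

# restrict_bam SAMPLE -> SAMPLE.callable.bam with only reads overlapping callable.bed
restrict_bam() {
    local s=$1
    samtools view -b -L callable.bed -o "$s.callable.bam" "$s.bam"
    samtools index "$s.callable.bam"
}

# callable_mpileup BAM... -> joint mpileup (stdout) limited to callable.bed;
# -B disables BAQ recalculation
callable_mpileup() {
    samtools mpileup -B -l callable.bed "$@"
}
```

For example, callable_mpileup pop1.bam pop2.bam > p1_p2.mpileup would give a multi-population mpileup covering only the shared callable regions. PoPoolation2 then converts such a file to its sync format.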

fin swimmer


Thank you for the very thoroughly explained answer! I will definitely give this a shot!

3.5 years ago
Vitis

Maybe bedtools multicov is worth a try. Compare coverage across multiple BAMs in sliding windows, and you'll be able to identify regions with comparable coverage in all samples.
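The sliding-window idea might be sketched as follows. The sketch assumes bedtools and samtools are on PATH, a reference FASTA ref.fa, and sorted, indexed BAMs. bedtools makewindows builds the windows and bedtools multicov adds one read-count column per BAM. The read count is only a proxy for coverage. The function names window_counts and covered_in_all, and any window size or minimum count you pass in, are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for windowed read counts across BAMs.
# Assumes bedtools and samtools on PATH and sorted, indexed BAMs.

# window_counts REF.fa WIDTH STEP BAM... -> BED3 windows + one read count per BAM
window_counts() {
    local ref=$1 width=$2 step=$3
    shift 3
    samtools faidx "$ref"                 # writes REF.fa.fai (name, length, ...)
    cut -f1,2 "$ref.fai" > genome.txt     # genome file: chrom<TAB>length
    bedtools makewindows -g genome.txt -w "$width" -s "$step" > windows.bed
    bedtools multicov -bams "$@" -bed windows.bed
}

# covered_in_all MIN < counts -> windows where every BAM has at least MIN reads
covered_in_all() {
    awk -v min="$1" '
        BEGIN { OFS = "\t"; min += 0 }     # force a numeric threshold
        {
            # count columns start after the three BED columns
            for (i = 4; i <= NF; i++) if ($i < min) next
            print $1, $2, $3
        }'
}
```

Usage might look like window_counts ref.fa 10000 5000 pop1.bam pop2.bam pop3.bam | covered_in_all 10 > comparable_windows.bed. The 10 kb windows, 5 kb step and 10-read minimum here are arbitrary values to tune for your data.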


Thank you for the suggestion!