Should I Remove The Unmapped Reads From My BAM?
10.9 years ago

Is it good practice to remove all the unmapped reads with:

samtools view -b -F 4 aligned.bam > mapped.bam


after they've been mapped with bwa? The BAM files would be smaller and the remaining operations would be faster, wouldn't they?

Or will I regret it later?


GATK realigner will use some unmapped reads when doing local realignment.


What is the final answer to this question? For variant calling analysis, should we remove the unmapped reads from the .bam file?


"For one, future algorithms may do a better job and allow you to recover some of that data. For another, future genome builds may resolve poorly assembled regions and allow additional reads to be mapped."

"For variant calling analysis, should we remove the unmapped reads from the .bam file?"

No.


Why? Are the unmapped reads in the BAM file used at all during variant calling?


GATK realigner will use some unmapped reads when doing local realignment.

10.9 years ago

To future-proof your data, it seems reasonable to hold on to the unmapped reads. For one, future algorithms may do a better job and allow you to recover some of that data. For another, future genome builds may resolve poorly assembled regions and allow additional reads to be mapped.

Neither of these improvements is likely to enable huge discoveries, but the storage cost you're paying is pretty minimal compared to the costs of sample collection and sequencing. The speed hit probably isn't as bad as you think either, since the BAM is indexed. Smart tools use the index to jump straight to the regions they need and never even have to consider those unmapped reads.
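To illustrate why the index keeps unmapped reads out of the way, here is a toy model (not the real BAI format, which uses a binning scheme plus a linear index). Records are hypothetical `(rname, pos, name)` tuples; unplaced unmapped reads carry RNAME `*` and get no index entry, so a region query never touches them. The function names `build_index` and `region_query` are made up for this sketch.

```python
from bisect import bisect_left, bisect_right


def build_index(records):
    """Build a toy per-reference position index from (rname, pos, name) tuples.

    Unplaced unmapped reads (RNAME "*") get no entry at all, so no
    region query can ever reach them.
    """
    by_ref = {}
    for rname, pos, name in records:
        if rname == "*":
            continue  # no coordinates: nothing to index
        by_ref.setdefault(rname, []).append((pos, name))
    index = {}
    for rname, entries in by_ref.items():
        entries.sort()
        # Keep positions and names side by side so we can binary-search positions.
        index[rname] = ([p for p, _ in entries], [n for _, n in entries])
    return index


def region_query(index, rname, start, end):
    """Return names of reads with start <= pos <= end on rname (1-based, inclusive)."""
    positions, names = index.get(rname, ([], []))
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return names[lo:hi]


records = [
    ("chr1", 100, "r1"),
    ("*", 0, "u1"),
    ("chr1", 250, "r2"),
    ("chr2", 50, "r3"),
    ("*", 0, "u2"),
]
idx = build_index(records)
print(region_query(idx, "chr1", 1, 200))  # ['r1']
```

One caveat: by SAM convention, an unmapped read whose mate is mapped is usually given its mate's RNAME and POS, so those reads do sit next to mapped reads and are returned by region queries. Only fully unplaced reads are skipped entirely.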


Heng's point above is good too. Some indel/SV algorithms create new contigs of the altered sequence and do local realignment. If you toss your unmapped reads, you're losing all that incredibly useful info.
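As a rough sketch of the kind of reads such tools pick up: unmapped reads whose mate maps near a locus ("one-end anchored" pairs) often span an insertion or breakpoint, and they are exactly what `samtools view -F 4` would throw away. This assumes the SAM convention that an unmapped read with a mapped mate takes its mate's RNAME and POS; the helper name `anchored_unmapped_reads` is hypothetical.

```python
def anchored_unmapped_reads(sam_lines, rname, start, end):
    """Collect (QNAME, SEQ) of unmapped reads whose mate maps within [start, end] on rname."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom, pos, seq = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[9])
        paired = flag & 0x1            # template has multiple segments
        unmapped = flag & 0x4          # this read is unmapped
        mate_mapped = not (flag & 0x8)  # mate is mapped
        # An unmapped read with a mapped mate is placed at the mate's RNAME/POS.
        if paired and unmapped and mate_mapped and chrom == rname and start <= pos <= end:
            hits.append((qname, seq))
    return hits


sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "p1\t137\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",  # mapped, mate unmapped
    "p1\t69\tchr1\t500\t0\t*\t=\t500\t0\tTTTT\tIIII",     # unmapped, mate mapped
    "p2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",             # both mates unmapped
]
print(anchored_unmapped_reads(sam, "chr1", 400, 600))  # [('p1', 'TTTT')]
```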

10.9 years ago
Christof Winter ★ 1.0k

To save space, you could just as well delete your FASTQ files instead and keep the BAM file with the unaligned reads. Now that would make Peter happy, wouldn't it?
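The reason this works is that a BAM holding every read (mapped and unmapped) still contains each read's sequence and qualities, so FASTQ can be regenerated. Real tools for this include `samtools fastq` and Picard SamToFastq; the sketch below only shows the core logic on SAM text lines, and the function names are hypothetical. It assumes SEQ and QUAL are present (not `*`) and that no bases were hard-clipped, since hard-clipped bases are not stored in SEQ.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line):
    """Turn one SAM alignment line back into a FASTQ record, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:
        # Secondary (0x100) or supplementary (0x800) records repeat a read that
        # already has a primary record; emit each read only once.
        return None
    if flag & 0x10:
        # Reverse-strand alignments store SEQ reverse-complemented (and QUAL
        # reversed); undo that to recover the read as sequenced.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    # 0x40 = first segment in template, 0x80 = last segment.
    suffix = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
    return f"@{qname}{suffix}\n{seq}\n+\n{qual}\n"


def sam_to_fastq(sam_lines):
    """Concatenate FASTQ records for all primary records, skipping header lines."""
    out = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not a read
        rec = sam_record_to_fastq(line)
        if rec is not None:
            out.append(rec)
    return "".join(out)


sam = [
    "@HD\tVN:1.6",
    "r1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGG\tIIJK",  # reverse strand
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tAACC\tFFFF",         # unmapped, first of pair
    "r1\t272\tchr2\t5\t0\t4M\t*\t0\t0\tACGG\tIIJK",    # secondary, skipped
]
print(sam_to_fastq(sam), end="")
```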


I'd ask him about his backup strategy before deleting any raw data ;)

10.9 years ago

There is a tradeoff. If you want to call variants, getting rid of the unaligned reads makes things go a bit faster. However, having all the reads in one place is very convenient if you ever need to go back to a project.

I usually discard the unaligned reads, since dedup will throw away a bunch of reads one step downstream.


To hoard or not to hoard, that is the question :)

10.9 years ago
toni ★ 2.2k

What you can also do is build a wrapper around the bwa sampe step. As this step generates the SAM output, check the flag on the fly and split the reads into several files, like a 'bam' and an 'unmapped.bam'. In Perl, it goes roughly like this:

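use strict;
use warnings;
use Carp;

my $command = 'bwa sampe -P -s ref.fa sai1 sai2 fq1 fq2';
open my $COM, '-|', $command
    or croak "Could not exec $command : $!";

# Example output files (SAM; convert to BAM afterwards with samtools if needed)
open my $MAPPED,   '>', 'mapped.sam'   or croak "Cannot write mapped.sam : $!";
open my $UNMAPPED, '>', 'unmapped.sam' or croak "Cannot write unmapped.sam : $!";

# Splitting output stream between several files
my $read1;
while ( my $read = <$COM> ) {
    chomp $read;
    if ( $read =~ m/^\@/ ) {    # Copy header lines to both files so each is valid SAM
        print {$MAPPED}   "$read\n";
        print {$UNMAPPED} "$read\n";
        next;
    }

    if ( defined $read1 ) {     # Second read in a pair
        my $read2 = $read;
        my $flag1 = ( split /\t/, $read1 )[1];
        my $flag2 = ( split /\t/, $read2 )[1];
        # Pairs with at least one mate mapped go to one file, pairs where
        # both mates are unmapped (FLAG bit 0x4 set on both) to the other.
        my $both_unmapped = ( $flag1 & 0x4 ) && ( $flag2 & 0x4 );
        my $out = $both_unmapped ? $UNMAPPED : $MAPPED;
        print {$out} "$read1\n$read2\n";
        $read1 = undef;         # Move to next pair
    }
    else {                      # First read in a pair
        $read1 = $read;
    }
}
close $COM or croak "Failed to close command : $command";
close $MAPPED;
close $UNMAPPED;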


This way you keep all the reads of your sample (in several files) but you can process only the interesting reads if you want to.

8.8 years ago
johnblue81 ▴ 50

You can also use the segemehl remapper to try to map the unmapped reads.

I found a manual and a presentation about the remapper on the internet:

9.6 years ago
Toni ▴ 10

In order to remove all the unmapped reads, shouldn't we use the following command instead?

samtools view -F12 samfile
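Note that `-F 12` is stricter than `-F 4`. `samtools view -F MASK` keeps a record only if none of the bits in MASK are set, and 12 = 0x4 (read unmapped) + 0x8 (mate unmapped). So `-F 12` also drops reads that mapped fine but whose mate did not. A minimal sketch of that rule (the function name is made up for illustration):

```python
FLAG_UNMAPPED = 0x4       # read itself is unmapped
FLAG_MATE_UNMAPPED = 0x8  # mate is unmapped


def kept_by_exclude_mask(flag, mask):
    """Mimic `samtools view -F mask`: keep a record only if none of the mask bits are set."""
    return (flag & mask) == 0


# 99  = paired, proper pair, mate reverse, first in pair (both mates mapped)
# 73  = paired, mate unmapped, first in pair (this read IS mapped)
# 133 = paired, read unmapped, second in pair
for flag in (99, 73, 133):
    print(flag,
          kept_by_exclude_mask(flag, FLAG_UNMAPPED),
          kept_by_exclude_mask(flag, FLAG_UNMAPPED | FLAG_MATE_UNMAPPED))
# 99 True True
# 73 True False
# 133 False False
```

So if the goal is only to drop unmapped reads, `-F 4` does exactly that; `-F 12` additionally discards mapped singletons.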


Some tools, like Picard MarkDuplicates, can remove the duplicate reads at the same time (with REMOVE_DUPLICATES=true; by default they are only flagged).