Should I Remove The Unmapped Reads From My Bam ?
6
11
Entering edit mode
10.6 years ago

is it a good practice to remove all the unmapped reads with:

samtools view -F 4

after they've been mapped with bwa ? The bam files would be smaller and the remaining operations would be faster isn't it ?

or shall I regret it later ?

samtools next-gen sequencing bam short • 11k views
ADD COMMENT
6
Entering edit mode

GATK realigner will use some unmapped reads when doing local realignment.

ADD REPLY
0
Entering edit mode

What is the final answer for this question? For variant calling analysis should we remove unmapped regions in the .bam file?

ADD REPLY
0
Entering edit mode

. For one, future algorithms may do a better job and allow you to recover some of that data. For another, future genome builds may resolve poorly assembled regions and allow additional reads to be mapped

For variant calling analysis should we remove unmapped regions in the .bam file?

no

ADD REPLY
0
Entering edit mode

why? Is there any use of unmapped bam file during variant calling analysis?

ADD REPLY
1
Entering edit mode

GATK realigner will use some unmapped reads when doing local realignment.

ADD REPLY
13
Entering edit mode
10.6 years ago

To future-proof your data, it seems reasonable to hold on to the unmapped reads. For one, future algorithms may do a better job and allow you to recover some of that data. For another, future genome builds may resolve poorly assembled regions and allow additional reads to be mapped.

Neither of these improvements is likely to enable huge discoveries, but the cost you're paying in storage is pretty minimal, compared to the costs of sample collection and sequencing. The speed hit probably isn't as bad as you think either, since the bam is indexed. Smart algorithms will make use of that information and not even have to consider those unmapped reads.

ADD COMMENT
0
Entering edit mode

Heng's point above is good too. Some indel/SV algorithms create new contigs of the altered sequence and do local realignment. If you toss your mapped reads, you're losing all that incredibly useful info.

ADD REPLY
4
Entering edit mode
10.6 years ago
Christof Winter ★ 1.0k

To save space, you could as well delete your FASTQ files instead, and keep the BAM file with the unaligned reads. Now that would make Peter happy, wouldn't it?

ADD COMMENT
0
Entering edit mode

I'd ask him about his backup strategy before deleting any raw data ;)

ADD REPLY
2
Entering edit mode
10.6 years ago

There is a tradeoff. If you want to call variants getting rid of the unaligned reads mades things go a bit faster. However, having all the reads in one place is very convient if you ever need to go back to a project.

I usually discard the unaligned reads since dedup will throw away a bunch of reads one step downstream.

ADD COMMENT
1
Entering edit mode

To horde or not to horde that is the question :)

ADD REPLY
1
Entering edit mode
10.6 years ago
toni ★ 2.2k

What you can do also is to build a wrapper around the bwa sampe step. When this step generates the SAM file, check the flag on the fly and split into several files like a 'bam' and an 'unmapped.bam'. In Perl, it goes roughly like this :

my $command = 'bwa sampe -P -s ref.fa sai1 sai2 fq1 fq2'
my $pid = open my $COM, '-|', $command
    or croak "Could not exec $command : $!";

# Splitting output stream between several files
while( my $read = <$COM> ) {
    chomp $read;
    next if($read =~ m!^\@!); # Skip header lines

    if($read1) {
        # Second read in a pair
        $read2 = $read;

            # Process your read1 and read2 and split between several files
            # if you want. For instance pairs for which there is at least
            # one read mapped on one side, and unmapped pairs on the other
            # side. (By checking the flag)

        ($read1, $read2) = (undef,undef);  # Move to next pair
    }
    else {
        # First read in a pair
        $read1 = $read;
    }
}

close $COM or croak 'Failed to close command : ' . $command;

This way you keep all the reads of your sample (in several files) but you can process only the interesting reads if you want to.

ADD COMMENT
1
Entering edit mode
8.5 years ago
johnblue81 ▴ 50

You can also use the remapper of segemehl and try to map the unmapped reads.

I found a manual and a presentation about the remapper in the internet:

ADD COMMENT
0
Entering edit mode
9.3 years ago
Toni ▴ 10

In order to remove all the unmapped reads, shouldn't we use the above command? :

samtools view -F12 samfile

ADD COMMENT
0
Entering edit mode

some tools like picard MardDuplicates removes the dup reads at the same time.

ADD REPLY

Login before adding your answer.

Traffic: 1027 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6