Final Solution For "Mapq Should Be 0 For Unmapped Read."
6
8
Entering edit mode
11.9 years ago

I repeatedly come accross "MAPQ should be 0 for unmapped read." issue. Yes, I can set stringency as lenient in Picard and thus avoid evaluating this as an error, but still - I use another program and it is here again.

Is anybody aware of some tool for fixing this problem for bam files generated using BWA?

I have found things like CleanSam.jar, FixMateInformation and others, but none helped. I do not want to supress error, I would like to fix it:)

Thanks a lot.

bwa • 19k views
ADD COMMENT
7
Entering edit mode
11.9 years ago
matted 7.8k

I put a comment in the other answer here. The issue is that there are unmapped reads (that is, with the unmapped read flag set) that have non-zero mapping qualities.

You could fix it by removing all unmapped reads, but that's heavy-handed and no good if you want to use them later for other analyses. It will fix the error, though. The command to do that is:

samtools view -bF 4 raw.bam > filtered.bam

The most correct way to fix it would be a short script that goes through the SAM/BAM and sets the MAPQ to 0 if the unmapped flag is set. I'll leave that as an exercise...

ADD COMMENT
0
Entering edit mode

just to clarify this point, I see on the samtools docs that "-F INT Skip alignments with bits present in INT [0]", so if we use "-F 4" this will be removing the unmapped reads? also, I've always considered MAPQ=0 as mapping errors due to multiple possible genome locations. if so, would it be acceptable to use the following command to cover all possibilities?

samtools view -bF 4 -q 1 raw.bam > filtered.bam
ADD REPLY
0
Entering edit mode

Yes, that would do what you want (exclude multi-mappers and unmapped reads). Excluding unmapped reads is sort of a strange thing, in that most tools should do the right thing when you give them unmapped reads. But filtering them can help if you have strict tools like Picard that will complain otherwise.

To me, filtering multi-mapping reads is for another purpose (but equally valid and necessary in some situations). That is, specific downstream analyses, not just avoiding Picard complaints.

ADD REPLY
1
Entering edit mode

sure, that's the first step we do before starting the whole "best practices" downstream analysis suggested by the GATK team. in our alignment BAMs we weren't getting the unmapped reads, so that's why we never came across this "-F 4" option, but since our final goal is to detect variants we thing that this initial step to remove anything not useful (multi-mapping and unmapped reads fit that description in our opinion) does help to slightly reduce the computer resources needed (at least it reduces the size of the BAM file).

ADD REPLY
2
Entering edit mode
11.9 years ago
Marvin ▴ 900

Just ignore the warning. The actual fix would be to repair Picard, because that warning is simply wrong. Nothing in the SAM spec says that unmapped reads should have a MAPQ of 0. If a read maps unambigously to a nonsensical position, it should have non-vanishing MAPQ and the 'unmapped' flag set.

(On the other hand, I totally understand why Picard complains, it's perfectly reasonable. It still doesn't match the spec... therefore the SAM spec, especially the "best practices" section, is unreasonable. But that's a different question...)

ADD COMMENT
1
Entering edit mode
11.9 years ago

MAPQ=0 reads can be removed by simply filtering them. for instance, you could use samtools to do that:

samtools view -hbq 1 raw.bam > filtered.bam

option 'h' includes header, option 'b' outputs binary, and option 'q 1' filters mapping qualities below 1

ADD COMMENT
3
Entering edit mode

This is wrong. The issue is that some unmapped reads have mapq scores that AREN'T 0. See this thread for more information.

If you want to filter (possibly) offending reads, you could just filter unmapped reads with samtools view -bF 4 raw.bam > filtered.bam.

Also, -h doesn't matter when you're writing bam (-b); it will always put the header in the bam. It just matters for sam output.

ADD REPLY
0
Entering edit mode

+1 for letting me know that -h is not needed when binary format is on. thanks a lot for the correction. I'll write my coment directly on your question then.

ADD REPLY
1
Entering edit mode
11.9 years ago
toni ★ 2.2k

I chose to fix this myself. Just like you, I did not find any tool that fixes all the Picard complaints.

So what I did is to write a wrapper around the bwa sampe call and I intercept all the pairs coming out and fix flags and tags on the fly.

The only way you can know whether your read is unmapped or not is to check the corresponding flag bit. Once you caught the info, updates parts of the SAM alignment that need to be updated. And moreover, if you are dealing with paired-end experiments, as you are holding the alignment of the pair at the very same time, you can even update the mate information as well in the mate alignment & vice-versa.

Depending on the number of checks you want to do and your piece of code, it can increase (double sometimes in my case) the computation time, but at the end of the process, you have a clean file.

T.

ADD COMMENT
1
Entering edit mode
8.8 years ago

I was in the same case with a bamset files mapped with bwa. Lot of "MAPQ should be 0 for unmapped read." warnings come during GATK process. I solve this problem by using picard CleanSam tool. As it is write on the CleanSam manual:

$ picard-tools CleanSam
ERROR: Option 'INPUT' is required.

USAGE: CleanSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CleanSam

Read SAM and perform various fix-ups.  Currently, the only fix-ups are 1: to soft-clip an alignment that hangs off the end of its reference sequence; and 2: to set MAPQ to 0 if a read is unmapped.
Version: 1.105()

I notice you have tried this solution but it not resolve your problem. This is curious, maybe a different version.

ADD COMMENT
0
Entering edit mode

I had this problem 3 years ago so yes, I guess this functionality is now part of CleanSam as you say. Good to know.

ADD REPLY
1
Entering edit mode
8.4 years ago
Tabea ▴ 30

Since many people having this problem with "Picard" might land here from google, I am adding a solution that is working for me.

In Picard, you can now set the option VALIDATION_STRINGENCY=LENIENT, to tell Picard to be more forgiving about an invalid SAM file.

ADD COMMENT
0
Entering edit mode

If you read carefully, this is already said in the first line of the question itself.

ADD REPLY

Login before adding your answer.

Traffic: 787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6