Question: Final Solution For "Mapq Should Be 0 For Unmapped Read."
6
gravatar for Biomonika (Noolean)
7.3 years ago by
State College, PA, USA
Biomonika (Noolean)3.1k wrote:

I repeatedly come accross "MAPQ should be 0 for unmapped read." issue. Yes, I can set stringency as lenient in Picard and thus avoid evaluating this as an error, but still - I use another program and it is here again.

Is anybody aware of some tool for fixing this problem for bam files generated using BWA?

I have found things like CleanSam.jar, FixMateInformation and others, but none helped. I do not want to supress error, I would like to fix it:)

Thanks a lot.

bwa • 13k views
ADD COMMENTlink modified 3.6 years ago by Veera 90 • written 7.3 years ago by Biomonika (Noolean)3.1k
6
gravatar for matted
7.3 years ago by
matted7.2k
Boston, United States
matted7.2k wrote:

I put a comment in the other answer here. The issue is that there are unmapped reads (that is, with the unmapped read flag set) that have non-zero mapping qualities.

You could fix it by removing all unmapped reads, but that's heavy-handed and no good if you want to use them later for other analyses. It will fix the error, though. The command to do that is:

samtools view -bF 4 raw.bam > filtered.bam

The most correct way to fix it would be a short script that goes through the SAM/BAM and sets the MAPQ to 0 if the unmapped flag is set. I'll leave that as an exercise...

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by matted7.2k

just to clarify this point, I see on the samtools docs that "-F INT Skip alignments with bits present in INT [0]", so if we use "-F 4" this will be removing the unmapped reads? also, I've always considered MAPQ=0 as mapping errors due to multiple possible genome locations. if so, would it be acceptable to use the following command to cover all possibilities?

samtools view -bF 4 -q 1 raw.bam > filtered.bam
ADD REPLYlink modified 7.3 years ago • written 7.3 years ago by Jorge Amigo11k

Yes, that would do what you want (exclude multi-mappers and unmapped reads). Excluding unmapped reads is sort of a strange thing, in that most tools should do the right thing when you give them unmapped reads. But filtering them can help if you have strict tools like Picard that will complain otherwise.

To me, filtering multi-mapping reads is for another purpose (but equally valid and necessary in some situations). That is, specific downstream analyses, not just avoiding Picard complaints.

ADD REPLYlink written 7.3 years ago by matted7.2k
1

sure, that's the first step we do before starting the whole "best practices" downstream analysis suggested by the GATK team. in our alignment BAMs we weren't getting the unmapped reads, so that's why we never came across this "-F 4" option, but since our final goal is to detect variants we thing that this initial step to remove anything not useful (multi-mapping and unmapped reads fit that description in our opinion) does help to slightly reduce the computer resources needed (at least it reduces the size of the BAM file).

ADD REPLYlink written 7.3 years ago by Jorge Amigo11k
1
gravatar for Jorge Amigo
7.3 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

MAPQ=0 reads can be removed by simply filtering them. for instance, you could use samtools to do that:

samtools view -hbq 1 raw.bam > filtered.bam

option 'h' includes header, option 'b' outputs binary, and option 'q 1' filters mapping qualities below 1

ADD COMMENTlink written 7.3 years ago by Jorge Amigo11k
3

This is wrong. The issue is that some unmapped reads have mapq scores that AREN'T 0. See this thread for more information.

If you want to filter (possibly) offending reads, you could just filter unmapped reads with samtools view -bF 4 raw.bam > filtered.bam.

Also, -h doesn't matter when you're writing bam (-b); it will always put the header in the bam. It just matters for sam output.

ADD REPLYlink modified 7.3 years ago • written 7.3 years ago by matted7.2k

+1 for letting me know that -h is not needed when binary format is on. thanks a lot for the correction. I'll write my coment directly on your question then.

ADD REPLYlink written 7.3 years ago by Jorge Amigo11k
1
gravatar for Marvin
7.3 years ago by
Marvin850
Marvin850 wrote:

Just ignore the warning. The actual fix would be to repair Picard, because that warning is simply wrong. Nothing in the SAM spec says that unmapped reads should have a MAPQ of 0. If a read maps unambigously to a nonsensical position, it should have non-vanishing MAPQ and the 'unmapped' flag set.

(On the other hand, I totally understand why Picard complains, it's perfectly reasonable. It still doesn't match the spec... therefore the SAM spec, especially the "best practices" section, is unreasonable. But that's a different question...)

ADD COMMENTlink written 7.3 years ago by Marvin850
1
gravatar for toni
7.3 years ago by
toni2.1k
Lyon
toni2.1k wrote:

I chose to fix this myself. Just like you, I did not find any tool that fixes all the Picard complaints.

So what I did is to write a wrapper around the bwa sampe call and I intercept all the pairs coming out and fix flags and tags on the fly.

The only way you can know whether your read is unmapped or not is to check the corresponding flag bit. Once you caught the info, updates parts of the SAM alignment that need to be updated. And moreover, if you are dealing with paired-end experiments, as you are holding the alignment of the pair at the very same time, you can even update the mate information as well in the mate alignment & vice-versa.

Depending on the number of checks you want to do and your piece of code, it can increase (double sometimes in my case) the computation time, but at the end of the process, you have a clean file.

T.

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by toni2.1k
1
gravatar for guillaume.tiberi
4.3 years ago by
France
guillaume.tiberi10 wrote:

I was in the same case with a bamset files mapped with bwa. Lot of "MAPQ should be 0 for unmapped read." warnings come during GATK process. I solve this problem by using picard CleanSam tool. As it is write on the CleanSam manual:

$ picard-tools CleanSam
ERROR: Option 'INPUT' is required.

USAGE: CleanSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CleanSam

Read SAM and perform various fix-ups.  Currently, the only fix-ups are 1: to soft-clip an alignment that hangs off the end of its reference sequence; and 2: to set MAPQ to 0 if a read is unmapped.
Version: 1.105()

I notice you have tried this solution but it not resolve your problem. This is curious, maybe a different version.

ADD COMMENTlink modified 12 weeks ago by RamRS25k • written 4.3 years ago by guillaume.tiberi10

I had this problem 3 years ago so yes, I guess this functionality is now part of CleanSam as you say. Good to know.

ADD REPLYlink written 4.3 years ago by Biomonika (Noolean)3.1k
1
gravatar for Tabea
3.8 years ago by
Tabea30
Tabea30 wrote:

Since many people having this problem with "Picard" might land here from google, I am adding a solution that is working for me.

In Picard, you can now set the option VALIDATION_STRINGENCY=LENIENT, to tell Picard to be more forgiving about an invalid SAM file.

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Tabea30

If you read carefully, this is already said in the first line of the question itself.

ADD REPLYlink written 3.8 years ago by toni2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 781 users visited in the last hour