How to remove reads with no mate? (ERROR:MATE_NOT_FOUND)
15 months ago
lacb ▴ 120

Hi,

I have a problem with some BAM files. I downloaded only a slice of each (one chromosome), and this results in MATE_NOT_FOUND errors when I check the BAMs with:

gatk ValidateSamFile -I input.bam -M SUMMARY

Giving the error:

Error Type      Count
ERROR:MATE_NOT_FOUND    1408
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND     18551
ERROR:MISMATCH_FLAG_MATE_UNMAPPED       733

I have tried two solutions. Both decrease the number of MATE_NOT_FOUND errors, but a few remain, and I don't understand how that is possible.

1. I've tried with samtools:

samtools view input.bam -f 0x1 -f 0x2 -b -o input.fixed.bam

It should only keep "read paired (0x1)" and "read mapped in proper pair (0x2)".

Now I have this output:

Error Type      Count
ERROR:MATE_NOT_FOUND    81
ERROR:MISMATCH_FLAG_MATE_UNMAPPED       524

2. I've tried with PrintReads (GATK):

gatk PrintReads -I input.bam -O input.fixed.bam --read-filter PairedReadFilter

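It should only keep paired reads (as far as I understand, PairedReadFilter checks the paired flag 0x1, while ProperlyPairedReadFilter is the filter that requires a proper pair).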

Now I have this output:

Error Type      Count
ERROR:MATE_NOT_FOUND    81
ERROR:MISMATCH_FLAG_MATE_UNMAPPED       524

Why are they not all removed?
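
For reference, here is a minimal sketch of what a flag filter such as `-f 0x1 -f 0x2` (equivalently `-f 0x3`) tests. `has_all_flags` is a hypothetical helper written for illustration, not part of samtools. It only mirrors the bit test applied to each record's own FLAG field.

```shell
# has_all_flags FLAG MASK
# Succeeds if every bit set in MASK is also set in FLAG.
# Both values may be given in decimal or 0x-prefixed hex.
has_all_flags() {
    [ $(( $1 & $2 )) -eq $(( $2 )) ]
}

# 99 = 0x1 + 0x2 + 0x20 + 0x40: paired, proper pair, mate reverse, first in pair
has_all_flags 99 0x3 && echo "99 passes -f 0x3"

# 73 = 0x1 + 0x8 + 0x40: paired, mate unmapped, first in pair
has_all_flags 73 0x3 || echo "73 fails -f 0x3"
```

The test is applied record by record and looks only at the flag bits. It never checks whether the mate record is actually present in the file. In a one-chromosome slice, a read can carry 0x1 and 0x2 while its mate was left out, which is one plausible reason some MATE_NOT_FOUND errors survive this kind of filtering.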

bam samtools gatk • 1.2k views
15 months ago
lacb ▴ 120

I've tried the solution proposed by GenoMax but it didn't work for me.

After many attempts with the BAMs, I finally solved the problem with a more radical solution: reverting the BAMs to FASTQ.

samtools sort -n -O BAM -o sorted.bam input.bam
samtools fastq -1 output_1.fastq.gz -2 output_2.fastq.gz -0 /dev/null -s /dev/null -n sorted.bam

Then realign with bwa mem.

I don't think it is the optimal solution, as some data could be lost (BQSR data?), but at least it works.


I came across this same problem with a BAM file mapped to some unknown reference version and did something similar to @lacb. I followed this blog post from Heng Li, using samtools 1.18 and bwa 0.7.17-r1188. It was also referenced in the thread "Realigning BAM files to new reference":

samtools collate -Oun128 in.bam | samtools fastq -OT RG,BC - \
  | bwa mem -pt8 -CH <(samtools view -H in.bam|grep ^@RG) ref.fa - \
  | samtools sort -@4 -m4g -o out.bam -
15 months ago
GenoMax 142k

Assuming you have the right tags, you could try the approach from "How to remove un-paired reads from BAM after filtering?", or use the solution provided as an answer in that same thread.
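
As a rough illustration of removing reads whose mate is absent (not necessarily the approach from the linked thread), the sketch below makes two passes over SAM text. It keeps only read names that have both a primary first-in-pair record and a primary second-in-pair record. `keep_complete_pairs` is a hypothetical helper, and the sketch assumes the SAM text has been written to a regular file so it can be read twice.

```shell
# keep_complete_pairs SAM_FILE
# Prints header lines plus every record whose read name has both a primary
# first-in-pair (0x40) and a primary second-in-pair (0x80) record in the file.
keep_complete_pairs() {
    awk -F'\t' '
        # Test a single flag bit without relying on gawk-only bitwise functions.
        function bit(flag, value) { return int(flag / value) % 2 }

        # Pass 1: note which read names have each primary mate present.
        NR == FNR {
            if ($0 ~ /^@/) next                       # skip header lines
            if (bit($2, 256) || bit($2, 2048)) next   # skip secondary/supplementary
            if (bit($2, 64))  first[$1] = 1
            if (bit($2, 128)) second[$1] = 1
            next
        }

        # Pass 2: keep the header and all records belonging to complete pairs.
        /^@/ { print; next }
        ($1 in first) && ($1 in second)
    ' "$1" "$1"
}
```

With samtools, this could be used by first writing `samtools view -h input.bam > input.sam`, then piping `keep_complete_pairs input.sam` into `samtools view -b -o input.paired.bam -`. It addresses only missing mates. The MISMATCH_FLAG_* errors are a separate matter.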
