Picard error Illegal Mate State in converting BAM to Fastq
7.8 years ago
trakhtenberg ▴ 160

About a year and a half ago there was a discussion here, "Picard Exception: Illegal Mate State", which linked to https://github.com/jeff-k/resolvepairs. That page explains that the error occurs when more than one pair of reads shares the same query name. The proposed solution was to append a unique ID to the names of reads belonging to the same mate pair using the resolvepairs script, which failed for me with a syntax error on line 94. So I first filtered with samtools view -bq 4 accepted_hits.bam > filtered.bam, and then also added VALIDATION_STRINGENCY=LENIENT, but the error persisted. My command was:
/java -jar /SamToFastq.jar VALIDATION_STRINGENCY=LENIENT INPUT=filtered.bam FASTQ=read_1.fastq SECOND_END_FASTQ=read_2.fastq
Is resolvepairs the only solution? Has anyone else hit the syntax error on line 94? Thank you.
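To check whether the cause described above (more than one pair of reads sharing a query name) applies to a given file, one could count read1 and read2 records per query name. The following is only a minimal sketch: it assumes the input is the tab-separated text printed by samtools view, where column 1 is the query name and column 2 is the flag, and the function names and example records are hypothetical.

```python
from collections import Counter

FIRST_IN_PAIR = 0x40   # SAM flag bit: first read of a pair
SECOND_IN_PAIR = 0x80  # SAM flag bit: second read of a pair


def count_mates(sam_lines):
    """Count read1/read2 records per query name from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & FIRST_IN_PAIR:
            counts[(qname, 1)] += 1
        elif flag & SECOND_IN_PAIR:
            counts[(qname, 2)] += 1
    return counts


def repeated_names(sam_lines):
    """Return query names that have more than one read1 or read2 record."""
    counts = count_mates(sam_lines)
    return sorted({qname for (qname, _mate), n in counts.items() if n > 1})


# Hypothetical records; only the first two columns matter for this check.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "readA\t99\tchr1\t100",
    "readA\t147\tchr1\t300",
    "readB\t99\tchr1\t500",
    "readB\t147\tchr1\t700",
    "readB\t355\tchr2\t500",   # second alignment of readB, mate 1
    "readB\t403\tchr2\t700",   # second alignment of readB, mate 2
]

print(repeated_names(example))
```

If this reports any names, the file contains several records for the same mate, which is the situation the linked page describes.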

Picard SamToFastq BamToFastq • 4.0k views
7.8 years ago
Dan D 7.3k

I generally like the Picard suite. One exception is its exception-prone BAM-to-FASTQ converter, SamToFastq. I've run into several different problems when trying to convert BAM to paired-end FASTQ with it.

Instead, I recommend using bedtools bamtofastq. It's far more robust in my opinion, and I've had a lot more success with it when dealing with paired-end datasets.


Yes, it worked, thank you. It found 178.1k pairs and 33k singletons. I also tried a bam2fastx-based script I found here, but it found 211.3k pairs and 29k singletons. [However, running bam2fastx (from TopHat) on its own exits quickly with "Error couldn't retrieve both reads for pair HWI...", even though the file was sorted by name.] I would appreciate clarification of the inconsistencies between the results of these different approaches. The samtools flagstat output for this BAM file is:

447242 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
447242 + 0 mapped (100.00%:-nan%)
447242 + 0 paired in sequencing
232755 + 0 read1
214487 + 0 read2
334230 + 0 properly paired (74.73%:-nan%)
413812 + 0 with itself and mate mapped
33430 + 0 singletons (7.47%:-nan%)
1354 + 0 with mate mapped to a different chr
1354 + 0 with mate mapped to a different chr (mapQ>=5)
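As a quick consistency check on the flagstat numbers above, the arithmetic can be reproduced in a few lines. This is only a sketch with hypothetical key names. Note that flagstat counts alignment records, so if the file holds more than one record for some reads (an assumption about this TopHat output, not something the numbers prove), these totals need not equal the number of distinct reads.

```python
# Values copied from the flagstat output above; the key names are made up.
flagstat = {
    "total": 447242,
    "read1": 232755,
    "read2": 214487,
    "properly_paired": 334230,
    "both_mapped": 413812,   # "with itself and mate mapped"
    "singletons": 33430,
}


def summarize(fs):
    """Recompute the sums and percentages that flagstat reports."""
    return {
        "read1_plus_read2_is_total": fs["read1"] + fs["read2"] == fs["total"],
        "both_plus_singletons_is_total":
            fs["both_mapped"] + fs["singletons"] == fs["total"],
        "read1_minus_read2": fs["read1"] - fs["read2"],
        "pct_properly_paired": round(100 * fs["properly_paired"] / fs["total"], 2),
        "pct_singletons": round(100 * fs["singletons"] / fs["total"], 2),
    }


summary = summarize(flagstat)
for key, value in summary.items():
    print(key, value)
```

The read1 and read2 counts differ by 18268 records, so the file cannot consist purely of complete pairs, and any converter has to decide what to do with the unmatched records.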

My BAM file is from TopHat, and after filtering I would like to re-analyze these reads with TopHat again. Is it important to integrate the singletons back with the paired reads for re-analysis? It appears from here that TopHat does not pass singletons on to Cufflinks for FPKM, but since they mapped, I would have thought the TopHat/Cufflinks pipeline would make use of them. Do singletons tend to be splice-junction reads?


Hi @Dan D! I encountered the same problem as @trakhtenberg. I tried your method using bedtools. While the program was running, there were a lot of warnings indicating that a read was missing its mate, e.g. "*WARNING: Query HWI-ST1061:191:C0LK6ACXX:5:1315:18250:10889 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping. "

If each warning refers to one missing read, then in my case the total number of warnings (330000) was larger than the number of missing reads reported by Picard SamToFastq (271000). Did you see the same warning when using bedtools?
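The warning text says the mate "does not occur next to it", which suggests the conversion depends on the order of records in the file. The sketch below is a deliberately simplified model of that idea, not bedtools' actual implementation: it pairs two consecutive records only when they share a name and skips a record otherwise. The function name and the read names are hypothetical.

```python
def pair_adjacent(names):
    """Pair consecutive records with the same name; skip the rest."""
    pairs, skipped = [], []
    i = 0
    while i < len(names):
        if i + 1 < len(names) and names[i] == names[i + 1]:
            pairs.append(names[i])
            i += 2  # consume both mates
        else:
            skipped.append(names[i])  # mate is not the next record
            i += 1
    return pairs, skipped


# Hypothetical record order in a coordinate-sorted file: mates are interleaved.
coordinate_order = ["r1", "r2", "r1", "r3", "r2"]
# The same records grouped by name, as name-sorting would arrange them.
name_order = sorted(coordinate_order)

print(pair_adjacent(coordinate_order))
print(pair_adjacent(name_order))
```

In this model a single out-of-place pair produces two skipped records, so the number of warnings can exceed the number of reads that truly lack a mate. The bedtools documentation asks for input in which mates are adjacent, for example a BAM sorted by read name with samtools sort -n, so sorting first and re-running is a reasonable check before comparing counts across tools.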

