Question: Picard "Illegal Mate State" error when converting BAM to FASTQ
trakhtenberg (United States) wrote, 4.6 years ago:

There was a discussion here over a year and a half ago (Picard Exception: Illegal Mate State), which linked to https://github.com/jeff-k/resolvepairs. That page explains that the error occurs when more than one pair of reads shares the same query name. The proposed solution was to append a unique ID to the names of reads belonging to the same mate pair using the resolvepairs script, but the script failed for me on line 94. So I first tried filtering with samtools view -bq 4 accepted_hits.bam > filtered.bam, and then also added VALIDATION_STRINGENCY=LENIENT, but the error persisted. My command was:
/java -jar /SamToFastq.jar VALIDATION_STRINGENCY=LENIENT INPUT=filtered.bam FASTQ=read_1.fastq SECOND_END_FASTQ=read_2.fastq
Is the resolvepairs script the only solution? Has anyone else encountered a syntax error on line 94? Thank you.
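To check whether repeated query names really are behind the error, one option is to count how many primary alignment records each read name has per mate. The following is a minimal sketch, not part of Picard or resolvepairs: the function names are hypothetical, and it assumes SAM text lines (for example from samtools view -h filtered.bam) in which non-primary alignments carry the standard secondary (0x100) or supplementary (0x800) flag bits.

```python
from collections import Counter

# Standard SAM flag bits.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_records(sam_lines):
    """Count primary alignment records per (query name, mate number).

    sam_lines is any iterable of SAM text lines; header lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary records are counted
        if flag & FIRST_IN_PAIR:
            mate = 1
        elif flag & SECOND_IN_PAIR:
            mate = 2
        else:
            mate = 0  # unpaired, or mate number not set
        counts[(qname, mate)] += 1
    return counts


def repeated_names(counts):
    """Return sorted query names where any mate has more than one primary record."""
    return sorted({name for (name, _mate), n in counts.items() if n > 1})
```

Any name returned by repeated_names has more than one primary record for the same mate, which is the situation the resolvepairs page describes. If the aligner does not set the secondary flag on multi-mapped reads, this sketch will also report those reads.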

samtofastq picard bamtofastq • 2.6k views
Dan D (Tennessee) wrote, 4.6 years ago:

I generally like the Picard suite. One of the exceptions is its exception-prone BAM-to-FASTQ tool (SamToFastq). I've run into several different sets of problems when trying to convert BAM to paired-end FASTQ with it.

Instead, I recommend using bedtools bamtofastq. It's far more robust in my opinion, and I've had a lot more success with it when dealing with paired-end datasets.
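For paired-end output, bedtools bamtofastq expects the two mates of each pair to be adjacent in the BAM file, so the file is usually sorted by query name first. A minimal sketch of the two steps, wrapped in Python so the commands can be inspected before they run: the function names and output file names are hypothetical, and the samtools sort syntax assumes samtools 1.3 or later.

```python
import subprocess


def bam_to_fastq_commands(bam, prefix):
    """Build the argument lists for name-sorting a BAM and converting it to FASTQ.

    bam    -- path to the input BAM file (e.g. "filtered.bam")
    prefix -- prefix for the output files (hypothetical naming scheme)
    """
    sorted_bam = f"{prefix}.namesorted.bam"
    return [
        # Sort by query name so both mates of a pair sit next to each other.
        ["samtools", "sort", "-n", "-o", sorted_bam, bam],
        # -fq receives the first mates, -fq2 the second mates.
        ["bedtools", "bamtofastq", "-i", sorted_bam,
         "-fq", f"{prefix}_1.fastq", "-fq2", f"{prefix}_2.fastq"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_all(bam_to_fastq_commands("filtered.bam", "read")) would write read_1.fastq and read_2.fastq, provided samtools and bedtools are on the PATH.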


Yes, it worked, thank you. It found 178.1k pairs and 33k singletons. I also tried a bam2fastx-based script I found here, https://github.com/blaxterlab/scripts/blob/master/tools/bam2fastx.pl, but it found 211.3k pairs and 29k singletons. (However, running bam2fastx (from TopHat) on its own exits quickly with "Error couldn't retrieve both reads for pair HWI...", even though the file was sorted by name.) I would appreciate clarification of the inconsistencies between the results of these different approaches. The samtools flagstat output for this BAM file is:

447242 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
447242 + 0 mapped (100.00%:-nan%)
447242 + 0 paired in sequencing
232755 + 0 read1
214487 + 0 read2
334230 + 0 properly paired (74.73%:-nan%)
413812 + 0 with itself and mate mapped
33430 + 0 singletons (7.47%:-nan%)
1354 + 0 with mate mapped to a different chr
1354 + 0 with mate mapped to a different chr (mapQ>=5)
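The flagstat figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers from that output; the dictionary keys and the function name are made up for illustration. Note that flagstat counts alignment records, so if multi-mapped reads are reported more than once (an assumption about this file, not something the output confirms), record counts and read counts will differ, which may be one reason the tools disagree on pair counts.

```python
# Values copied from the flagstat output above.
flagstat = {
    "total": 447242,
    "read1": 232755,
    "read2": 214487,
    "properly_paired": 334230,
    "both_mapped": 413812,  # "with itself and mate mapped"
    "singletons": 33430,
}


def summarize(fs):
    """Derive simple consistency checks from flagstat counts."""
    return {
        # Every record is flagged as either read1 or read2.
        "read1_plus_read2_is_total": fs["read1"] + fs["read2"] == fs["total"],
        # Every record either has a mapped mate or is a singleton.
        "both_mapped_plus_singletons_is_total":
            fs["both_mapped"] + fs["singletons"] == fs["total"],
        # Excess of read1 records over read2 records.
        "read1_minus_read2": fs["read1"] - fs["read2"],
        # Upper bound on pairs if each record were a distinct read.
        "max_pairs_from_both_mapped": fs["both_mapped"] // 2,
        "properly_paired_pct": round(100 * fs["properly_paired"] / fs["total"], 2),
        "singleton_pct": round(100 * fs["singletons"] / fs["total"], 2),
    }


summary = summarize(flagstat)
```

Both sums match the total, there are 18268 more read1 records than read2 records, and halving the 413812 records whose mate is also mapped gives at most 206906 pairs.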

My BAM file is from TopHat, and I would like to re-analyze these reads with TopHat after filtering. Is it important to integrate the singletons back with the paired reads for re-analysis? According to http://www.arrayserver.com/wiki/index.php?title=FPKM_Transcript, TopHat does not pass singletons on to Cufflinks for FPKM, but since they mapped, I would have expected the TopHat/Cufflinks pipeline to make use of them. Do singletons tend to be splice-junction reads?

written 4.6 years ago by trakhtenberg (modified 4.6 years ago)