Converting TopHat's BAM Output Back to Separate Paired-End FASTQ Files

bob-lowlow · 11.4 years ago

Hi all,

I was wondering if anyone could offer me some advice on using paired-end reads with TopHat, specifically on handling its output. I'm planning to use TopHat as part of a pipeline for processing my sequence data. The reads that map will obviously be easy to deal with, but the unmapped.bam file is proving a bit problematic. I would like to convert that BAM file back into two FASTQ files containing the read pairs that didn't map to the reference genome (hg19 in this case). My plan was to convert it to SAM and then run Picard's SamToFastq, but that returns the following error:

MAPQ must be zero if RNAME is not specified;

I haven't been able to find anything about this error online. I'm also not sure how time-consuming this will be. For now I'm only testing on a random sample of my data to get everything working, but my actual data files will probably be 20 GB or more, at least in FASTQ format.
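For reference, the conversion SamToFastq performs here can be sketched in plain Python. This is a minimal illustration, not Picard's implementation: it assumes SAM text as input (for example from `samtools view unmapped.bam`), uses the SAM FLAG bits 0x40 and 0x80 to decide which file a read belongs to, restores reads flagged 0x10 (reverse strand) to their original orientation, and never looks at MAPQ, so the validation check behind the error above does not apply. The function name `paired_fastq_records` is hypothetical. Only pairs where both mates appear in the input are emitted; orphan mates (whose partner did align) are dropped, and each mate is held in memory until its partner turns up, so the input should ideally be grouped by read name.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Translation table used to complement the bases of reverse-strand records.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def _fastq(qname: str, mate: int, seq: str, qual: str) -> str:
    """Format one FASTQ record, tagging the read name with /1 or /2."""
    return f"@{qname}/{mate}\n{seq}\n+\n{qual}\n"


def paired_fastq_records(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read1, read2) FASTQ records for every complete pair in SAM text.

    Hypothetical helper: assumes plain SAM lines such as those printed by
    `samtools view unmapped.bam`. Mates whose partner never appears are dropped.
    """
    pending: Dict[str, Dict[int, Tuple[str, str]]] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x40:
            mate = 1
        elif flag & 0x80:
            mate = 2
        else:
            continue  # not flagged as first or second in pair
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; undo that.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        mates = pending.setdefault(qname, {})
        mates[mate] = (seq, qual)
        if len(mates) == 2:
            # Both mates seen: emit them together so R1/R2 stay in the same order.
            del pending[qname]
            yield _fastq(qname, 1, *mates[1]), _fastq(qname, 2, *mates[2])
```

Each yielded tuple can be written to the read-1 and read-2 output files in turn, which keeps the two files synchronised. Being pure Python, this will be slower than Picard on 20 GB inputs, but it streams the data rather than loading it all.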

I was also thinking of converting accepted_hits.bam to SAM and then writing a Unix script that would take the FASTQ files I gave TopHat as input and write every read not present in accepted_hits into two new files.
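The accepted_hits approach could look roughly like the following. This is a hypothetical sketch, not an existing tool: it assumes SAM text from `samtools view accepted_hits.bam`, unwrapped four-line FASTQ records, and R1/R2 files listing reads in the same order. Read names are compared after dropping the leading `@`, any comment after whitespace, and a trailing `/1` or `/2`, on the assumption that the SAM QNAMEs omit those suffixes. A pair is kept only if neither mate's name appears among the mapped records, so pairs with one mapped mate are excluded. The set of mapped names is held in memory, which may be substantial for a 20 GB run.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def mapped_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect QNAMEs of mapped records (FLAG bit 0x4 unset) from SAM text."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t", 2)
        if not int(fields[1]) & 0x4:
            names.add(fields[0])
    return names


def _fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group a FASTQ stream into 4-line records (assumes no line wrapping)."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []


def _base_name(header: str) -> str:
    """Turn '@read7/1 extra' into 'read7' (drop '@', comment, /1 or /2)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def unmapped_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
                   mapped: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read1, read2) FASTQ text for pairs where neither mate mapped."""
    for rec1, rec2 in zip(_fastq_records(r1_lines), _fastq_records(r2_lines)):
        if _base_name(rec1[0]) not in mapped and _base_name(rec2[0]) not in mapped:
            yield "".join(rec1), "".join(rec2)
```

Because the FASTQ files are walked in lockstep, the output pairs stay in the original read order without any sorting.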

What do you think?

Tags: tophat, sam, RNA-seq, bam

score 2 · Answer 1 · 2012-12-04

Sean Davis · 11.4 years ago

Try setting VALIDATION_STRINGENCY=SILENT for SamToFastq.
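To make the suggestion concrete, here is a small helper that builds the SamToFastq command line with the validation stringency set. It is a sketch: it assumes the current single-jar Picard layout (`java -jar picard.jar SamToFastq ...`), and the function name, jar path, and file names are placeholders. INPUT, FASTQ, SECOND_END_FASTQ and VALIDATION_STRINGENCY are SamToFastq options, and STRICT, LENIENT and SILENT are the accepted stringency values.

```python
from typing import List

# Stringency values accepted by Picard's VALIDATION_STRINGENCY option.
VALID_STRINGENCIES = ("STRICT", "LENIENT", "SILENT")


def samtofastq_command(bam: str, fastq1: str, fastq2: str,
                       stringency: str = "SILENT",
                       picard_jar: str = "picard.jar") -> List[str]:
    """Build a SamToFastq argument list that splits mates into two FASTQ files."""
    if stringency not in VALID_STRINGENCIES:
        raise ValueError(f"stringency must be one of {VALID_STRINGENCIES}")
    return [
        "java", "-jar", picard_jar, "SamToFastq",
        f"INPUT={bam}",                # SamToFastq reads BAM directly
        f"FASTQ={fastq1}",             # first-of-pair reads
        f"SECOND_END_FASTQ={fastq2}",  # second-of-pair reads
        f"VALIDATION_STRINGENCY={stringency}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Since SamToFastq accepts BAM input, unmapped.bam can be given to it as-is, with no intermediate SAM file.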

bob-lowlow · 11.4 years ago

Thank you very much, sir.

DG · 11.4 years ago

Or LENIENT: with LENIENT it will still work, but you'll get the warnings.