Using tophat2 output for kallisto
8 weeks ago
Wintermute • 0

I have some RNA-seq data that my PI gave me from an old collaborator. Unfortunately, it looks like they only gave me the tophat2 output, not the raw FASTQ files, so all I have are the sample1_accepted_hits.bam and sample1_unmapped.bam files.

I think it is possible to merge the two BAM files, sort them by name, and then use samtools to extract the reads, but I have a couple of questions:

1) Will this actually give me all of the original reads from the FASTQ?

2) Is something like this the best way to handle this?

samtools merge -o sample1_merged.bam sample1_accepted_hits.bam sample1_unmapped.bam -@ 7

samtools collate -n 7 -u -O -@ 7 sample1_merged.bam | samtools fastq -F 0x900 -@ 7 -1 1.fastq.gz -2 2.fastq.gz -s sample.fastq.gz

3) This seems very slow. Is there a faster way to get the FASTQ files? I am doing this on a workstation with 32 GB of RAM and an 8-core AMD 5700X, so compute is limited.

bam tophat2 kallisto • 247 views
8 weeks ago
dsull ★ 5.8k

You should indeed use samtools to extract the original FASTQ files. It will be slow, but it won't take up much memory (it mostly takes up disk space), so just wait it out. I don't see why "compute limitations" would matter here beyond having to wait a bit longer.

There may be faster tools (can't think of any off the top of my head), but I'd just say wait it out.
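On the question of whether all reads come back: one rough sanity check, once the extraction finishes, is to count the records in each output FASTQ and confirm that the two mate files agree. The sketch below assumes gzip-compressed, standard 4-line FASTQ records (which is what samtools fastq writes); the function names `count_fastq_reads` and `check_pairs` are made up for illustration. It only checks internal consistency and cannot tell you whether the collaborator's tophat2 run kept every original read.

```shell
#!/usr/bin/env bash

# Count records in a gzip-compressed FASTQ file.
# Assumes the standard 4-lines-per-record layout.
count_fastq_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Print the record counts of two mate files and return
# non-zero if they differ (hypothetical helper).
check_pairs() {
    local r1 r2
    r1=$(count_fastq_reads "$1")
    r2=$(count_fastq_reads "$2")
    echo "R1=$r1 R2=$r2"
    [ "$r1" -eq "$r2" ]
}
```

With the file names from the question, this would be called as `check_pairs 1.fastq.gz 2.fastq.gz`, and `count_fastq_reads sample.fastq.gz` would show how many reads ended up as singletons. If the mate counts differ or the singleton file is large, something is probably off in the merged BAM's pairing.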

