I can not get ERCC92 reads with tophat2 on RNAseq data that was created from samples with spike-in.
I have downloaded the ERCC92 fa and gtf files from Thermo Scientific and created indexes with
Bowtie2-build ERCC92.fa ERCC92
I then went ahead and used tophat2 on the paired end files trimmed_fwd.fastq and trimmed_rev.fastq:
./tophat2 –G ERCC92.gtf –o OUTPUTNAME \--transcriptome-index=transcriptome_data/ERCC92 \ERCC92 trimmed_fwd.fastq trimmed_rev.fastq
Tophat2 runs smoothly but creates an error at the very end and fails to provide an accepted_hits.bam file. I am pasting the error message at the bottom of this post. I also tried running Tophat2 without the “-G” option but that failed as well.
I also tried to append the ERCC92.gtf and ERCC92.fa to the mm9_igenome.gtf and mm9_igenome.fa genome (mouse) or hg19_igenome.gtf and hg19_igenome.fa (human). For instructions see: Using igenome to run a seq analysis with Tophat/Cofflinks.....but how do I add the ERCC sequences to the the reference transcriptome and reference genome? Unfortunately, that didn’t work either. In this case, we get good data for all mouse/human genes, but the 92 ERCC genes are listed with no reads (FPKM = 0).
Does anybody know what I am missing? Thank you very much!
Here is the error message from tophat2:
[2017-02-09 09:16:41] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2017-02-09 09:16:41] Checking for Bowtie
Bowtie version: 2.2.5.0
[2017-02-09 09:16:41] Checking for Bowtie index files (transcriptome)..
[2017-02-09 09:16:41] Checking for Bowtie index files (genome)..
[2017-02-09 09:16:41] Checking for reference FASTA file
[2017-02-09 09:16:41] Generating SAM header for ERCC92
[2017-02-09 09:16:42] Reading known junctions from GTF file
Warning: TopHat did not find any junctions in GTF file
[2017-02-09 09:16:42] Preparing reads
WARNING: read pairing issues detected (check prep_reads.log) !
left reads: min. length=12, max. length=101, 72023097 kept reads (6565 discarded)
right reads: min. length=24, max. length=101, 71833581 kept reads (196081 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2017-02-09 09:54:04] Using pre-built transcriptome data..
[2017-02-09 09:54:05] Mapping left_kept_reads to transcriptome ERCC92 with Bowtie2
[2017-02-09 10:31:36] Mapping right_kept_reads to transcriptome ERCC92 with Bowtie2
[2017-02-09 11:09:01] Resuming TopHat pipeline with unmapped reads
[2017-02-09 11:09:01] Mapping left_kept_reads.m2g_um to genome ERCC92 with Bowtie2
[2017-02-09 12:55:45] Mapping left_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4)
[2017-02-09 13:06:43] Mapping left_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4)
[2017-02-09 13:18:15] Mapping left_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4)
[2017-02-09 13:31:36] Mapping left_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4)
[2017-02-09 13:48:04] Mapping right_kept_reads.m2g_um to genome ERCC92 with Bowtie2
[2017-02-09 15:45:02] Mapping right_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4)
[2017-02-09 15:54:43] Mapping right_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4)
[2017-02-09 16:05:01] Mapping right_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4)
[2017-02-09 16:14:22] Mapping right_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4)
[2017-02-09 16:28:39] Searching for junctions via segment mapping
Warning: junction database is empty!
[2017-02-09 16:42:58] Retrieving sequences for splices
[2017-02-09 16:42:58] Indexing splices
Warning: Empty fasta file: 'OUTPUTNAME/tmp/segment_juncs.fa'
Warning: All fasta inputs were empty
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 OUTPUTNAME/tmp/segment_juncs.fa OUTPUTNAME/tmp/segment_juncs
[FAILED]
Error: Splice sequence indexing failed with err =1
It is not clear if you did the special one time
tophat
run with -G option to create the transcriptome indextranscriptome_data/ERCC92
? You can't combine that with regular mapping.yes, I did the one-time run to create the index files and then used tophat2 as shown above or without using -G.
Have you investigated this?
It looks like when you trimmed the data the read order may have gotten messed up. You can use
repair.sh
from BBMap to fix that problem ( C: Calculating number of reads for paired end reads? ) or better yet re-do the trimming withbbduk.sh
.Thanks so much for noticing this warning. Indeed, there was an issue that occurred after cutadapt, which is now fixed.
The problem of tophat2 not being able to handle ERCC92.gtf, however, persists. Again, here are the steps I have taken:
Using the no-gtf-juncs command bypasses the early "Warning: TopHat did not find any junctions in GTF file" message. Still, the program aborts at a late step (see below). I am using the original ERCC92.gtf file as provided by Thermo Scientific.
I appreciate your help! Thanks.
[2017-02-14 14:56:18] Beginning TopHat run (v2.1.0)
[2017-02-14 14:56:18] Checking for Bowtie Bowtie version: 2.2.5.0 [2017-02-14 14:56:18] Checking for Bowtie index files (transcriptome).. [2017-02-14 14:56:18] Checking for Bowtie index files (genome).. [2017-02-14 14:56:18] Checking for reference FASTA file [2017-02-14 14:56:18] Generating SAM header for ERCC92 [2017-02-14 14:56:18] Preparing reads left reads: min. length=101, max. length=101, 72025766 kept reads (3942 discarded) right reads: min. length=101, max. length=101, 71833708 kept reads (196000 discarded) [2017-02-14 15:25:55] Using pre-built transcriptome data.. [2017-02-14 15:25:56] Mapping left_kept_reads to transcriptome ERCC92 with Bowtie2 [2017-02-14 15:46:14] Mapping right_kept_reads to transcriptome ERCC92 with Bowtie2 [2017-02-14 16:06:58] Resuming TopHat pipeline with unmapped reads [2017-02-14 16:06:58] Mapping left_kept_reads.m2g_um to genome ERCC92 with Bowtie2 [2017-02-14 17:09:39] Mapping left_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4) [2017-02-14 17:18:29] Mapping left_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4) [2017-02-14 17:27:40] Mapping left_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4) [2017-02-14 17:37:28] Mapping left_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4) [2017-02-14 17:48:00] Mapping right_kept_reads.m2g_um to genome ERCC92 with Bowtie2 [2017-02-14 18:48:10] Mapping right_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4) [2017-02-14 18:56:41] Mapping right_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4) [2017-02-14 19:05:26] Mapping right_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4) [2017-02-14 19:14:26] Mapping right_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4) [2017-02-14 19:24:23] Searching for junctions via segment mapping Warning: junction database is empty! [2017-02-14 19:27:40] Retrieving sequences for splices [2017-02-14 19:27:40] Indexing splices Warning: Empty fasta file: 'OUTPUT/tmp/segment_juncs.fa' Warning: All fasta inputs were empty Error: Encountered internal Bowtie 2 exception (#1) Command: bowtie2-build --wrapper basic-0 OUTPUT/tmp/segment_juncs.fa 25_untrim_ERCC92_nojuncs/tmp/segment_juncs [FAILED] Error: Splice sequence indexing failed with err =1