I can not get ERCC92 reads with tophat2 on RNAseq data that was created from samples with spike-in.
I have downloaded the ERCC92 fa and gtf files from Thermo Scientific and created indexes with
Bowtie2-build ERCC92.fa ERCC92
I then went ahead and used tophat2 on the paired end files trimmed_fwd.fastq and trimmed_rev.fastq:
./tophat2 –G ERCC92.gtf –o OUTPUTNAME \--transcriptome-index=transcriptome_data/ERCC92 \ERCC92 trimmed_fwd.fastq trimmed_rev.fastq
Tophat2 runs smoothly but creates an error at the very end and fails to provide an accepted_hits.bam file. I am pasting the error message at the bottom of this post. I also tried running Tophat2 without the “-G” option but that failed as well.
I also tried to append the ERCC92.gtf and ERCC92.fa to the mm9_igenome.gtf and mm9_igenome.fa genome (mouse) or hg19_igenome.gtf and hg19_igenome.fa (human). For instructions see: Using igenome to run a seq analysis with Tophat/Cofflinks.....but how do I add the ERCC sequences to the the reference transcriptome and reference genome? Unfortunately, that didn’t work either. In this case, we get good data for all mouse/human genes, but the 92 ERCC genes are listed with no reads (FPKM = 0).
Does anybody know what I am missing? Thank you very much!
Here is the error message from tophat2:
[2017-02-09 09:16:41] Beginning TopHat run (v2.1.0) ----------------------------------------------- [2017-02-09 09:16:41] Checking for Bowtie Bowtie version: 184.108.40.206 [2017-02-09 09:16:41] Checking for Bowtie index files (transcriptome).. [2017-02-09 09:16:41] Checking for Bowtie index files (genome).. [2017-02-09 09:16:41] Checking for reference FASTA file [2017-02-09 09:16:41] Generating SAM header for ERCC92 [2017-02-09 09:16:42] Reading known junctions from GTF file Warning: TopHat did not find any junctions in GTF file [2017-02-09 09:16:42] Preparing reads WARNING: read pairing issues detected (check prep_reads.log) ! left reads: min. length=12, max. length=101, 72023097 kept reads (6565 discarded) right reads: min. length=24, max. length=101, 71833581 kept reads (196081 discarded) Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places [2017-02-09 09:54:04] Using pre-built transcriptome data.. [2017-02-09 09:54:05] Mapping left_kept_reads to transcriptome ERCC92 with Bowtie2 [2017-02-09 10:31:36] Mapping right_kept_reads to transcriptome ERCC92 with Bowtie2 [2017-02-09 11:09:01] Resuming TopHat pipeline with unmapped reads [2017-02-09 11:09:01] Mapping left_kept_reads.m2g_um to genome ERCC92 with Bowtie2 [2017-02-09 12:55:45] Mapping left_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4) [2017-02-09 13:06:43] Mapping left_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4) [2017-02-09 13:18:15] Mapping left_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4) [2017-02-09 13:31:36] Mapping left_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4) [2017-02-09 13:48:04] Mapping right_kept_reads.m2g_um to genome ERCC92 with Bowtie2 [2017-02-09 15:45:02] Mapping right_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4) [2017-02-09 15:54:43] Mapping right_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4) [2017-02-09 16:05:01] Mapping right_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4) [2017-02-09 16:14:22] Mapping right_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4) [2017-02-09 16:28:39] Searching for junctions via segment mapping Warning: junction database is empty! [2017-02-09 16:42:58] Retrieving sequences for splices [2017-02-09 16:42:58] Indexing splices Warning: Empty fasta file: 'OUTPUTNAME/tmp/segment_juncs.fa' Warning: All fasta inputs were empty Error: Encountered internal Bowtie 2 exception (#1) Command: bowtie2-build --wrapper basic-0 OUTPUTNAME/tmp/segment_juncs.fa OUTPUTNAME/tmp/segment_juncs [FAILED] Error: Splice sequence indexing failed with err =1