Question: tophat2 not working with ERCC92
1
gravatar for Andre
2.1 years ago by
Andre10
Baylor College of Medicine
Andre10 wrote:

I can not get ERCC92 reads with tophat2 on RNAseq data that was created from samples with spike-in.

I have downloaded the ERCC92 fa and gtf files from Thermo Scientific and created indexes with

Bowtie2-build ERCC92.fa ERCC92

I then went ahead and used tophat2 on the paired end files trimmed_fwd.fastq and trimmed_rev.fastq:

./tophat2 –G ERCC92.gtf –o OUTPUTNAME \--transcriptome-index=transcriptome_data/ERCC92 \ERCC92 trimmed_fwd.fastq trimmed_rev.fastq

Tophat2 runs smoothly but creates an error at the very end and fails to provide an accepted_hits.bam file. I am pasting the error message at the bottom of this post. I also tried running Tophat2 without the “-G” option but that failed as well.

I also tried to append the ERCC92.gtf and ERCC92.fa to the mm9_igenome.gtf and mm9_igenome.fa genome (mouse) or hg19_igenome.gtf and hg19_igenome.fa (human). For instructions see: Using igenome to run a seq analysis with Tophat/Cofflinks.....but how do I add the ERCC sequences to the the reference transcriptome and reference genome? Unfortunately, that didn’t work either. In this case, we get good data for all mouse/human genes, but the 92 ERCC genes are listed with no reads (FPKM = 0).

Does anybody know what I am missing? Thank you very much!

Here is the error message from tophat2:

[2017-02-09 09:16:41] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2017-02-09 09:16:41] Checking for Bowtie
          Bowtie version:     2.2.5.0
[2017-02-09 09:16:41] Checking for Bowtie index files (transcriptome)..
[2017-02-09 09:16:41] Checking for Bowtie index files (genome)..
[2017-02-09 09:16:41] Checking for reference FASTA file
[2017-02-09 09:16:41] Generating SAM header for ERCC92
[2017-02-09 09:16:42] Reading known junctions from GTF file
    Warning: TopHat did not find any junctions in GTF file
[2017-02-09 09:16:42] Preparing reads

WARNING: read pairing issues detected (check prep_reads.log) !

     left reads: min. length=12, max. length=101, 72023097 kept reads (6565 discarded)
    right reads: min. length=24, max. length=101, 71833581 kept reads (196081 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2017-02-09 09:54:04] Using pre-built transcriptome data..
[2017-02-09 09:54:05] Mapping left_kept_reads to transcriptome ERCC92 with Bowtie2 
[2017-02-09 10:31:36] Mapping right_kept_reads to transcriptome ERCC92 with Bowtie2 
[2017-02-09 11:09:01] Resuming TopHat pipeline with unmapped reads
[2017-02-09 11:09:01] Mapping left_kept_reads.m2g_um to genome ERCC92 with Bowtie2 
[2017-02-09 12:55:45] Mapping left_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4)
[2017-02-09 13:06:43] Mapping left_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4)
[2017-02-09 13:18:15] Mapping left_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4)
[2017-02-09 13:31:36] Mapping left_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4)
[2017-02-09 13:48:04] Mapping right_kept_reads.m2g_um to genome ERCC92 with Bowtie2 
[2017-02-09 15:45:02] Mapping right_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4)
[2017-02-09 15:54:43] Mapping right_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4)
[2017-02-09 16:05:01] Mapping right_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4)
[2017-02-09 16:14:22] Mapping right_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4)
[2017-02-09 16:28:39] Searching for junctions via segment mapping
Warning: junction database is empty!
[2017-02-09 16:42:58] Retrieving sequences for splices
[2017-02-09 16:42:58] Indexing splices
Warning: Empty fasta file: 'OUTPUTNAME/tmp/segment_juncs.fa'
Warning: All fasta inputs were empty
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 OUTPUTNAME/tmp/segment_juncs.fa OUTPUTNAME/tmp/segment_juncs 
    [FAILED]
Error: Splice sequence indexing failed with err =1
rna-seq • 1.3k views
ADD COMMENTlink modified 23 months ago • written 2.1 years ago by Andre10

It is not clear if you did the special one time tophat run with -G option to create the transcriptome index transcriptome_data/ERCC92? You can't combine that with regular mapping.

For example, in order to just prepare the transcriptome index files for a specific annotation, an initial, single TopHat run could be invoked like this: tophat -G known_genes.gtf \ --transcriptome-index=transcriptome_data/known \ hg19 In this example TopHat will create the transcriptome_data directory in the current directory (if it doesn't exist already) containing files known.gff, known.fa, known.fa.tlst, known.fa.ver and the known.* Bowtie index files.

ADD REPLYlink written 2.1 years ago by genomax64k

yes, I did the one-time run to create the index files and then used tophat2 as shown above or without using -G.

ADD REPLYlink written 2.1 years ago by Andre10
1

Have you investigated this?

WARNING: read pairing issues detected (check prep_reads.log) !

It looks like when you trimmed the data the read order may have gotten messed up. You can use repair.sh from BBMap to fix that problem ( C: Calculating number of reads for paired end reads? ) or better yet re-do the trimming with bbduk.sh.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genomax64k

Thanks so much for noticing this warning. Indeed, there was an issue that occurred after cutadapt, which is now fixed.

The problem of tophat2 not being able to handle ERCC92.gtf, however, persists. Again, here are the steps I have taken:

  1. bowtie2-build ERCC92.fa ERCC92
  2. ./tophat2 -G ERCC92.gtf --transcriptome-index=transcriptome_data/ERCC92 \ERCC92
  3. ./tophat2 -G ERCC92.gtf -o OUTPUT --no-gtf-juncs --transcriptome-index=transcriptome_data/ERCC92 \ERCC92 fwd.fastq rev.fastq

Using the no-gtf-juncs command bypasses the early "Warning: TopHat did not find any junctions in GTF file" message. Still, the program aborts at a late step (see below). I am using the original ERCC92.gtf file as provided by Thermo Scientific.

I appreciate your help! Thanks.

[2017-02-14 14:56:18] Beginning TopHat run (v2.1.0)

[2017-02-14 14:56:18] Checking for Bowtie Bowtie version: 2.2.5.0 [2017-02-14 14:56:18] Checking for Bowtie index files (transcriptome).. [2017-02-14 14:56:18] Checking for Bowtie index files (genome).. [2017-02-14 14:56:18] Checking for reference FASTA file [2017-02-14 14:56:18] Generating SAM header for ERCC92 [2017-02-14 14:56:18] Preparing reads left reads: min. length=101, max. length=101, 72025766 kept reads (3942 discarded) right reads: min. length=101, max. length=101, 71833708 kept reads (196000 discarded) [2017-02-14 15:25:55] Using pre-built transcriptome data.. [2017-02-14 15:25:56] Mapping left_kept_reads to transcriptome ERCC92 with Bowtie2 [2017-02-14 15:46:14] Mapping right_kept_reads to transcriptome ERCC92 with Bowtie2 [2017-02-14 16:06:58] Resuming TopHat pipeline with unmapped reads [2017-02-14 16:06:58] Mapping left_kept_reads.m2g_um to genome ERCC92 with Bowtie2 [2017-02-14 17:09:39] Mapping left_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4) [2017-02-14 17:18:29] Mapping left_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4) [2017-02-14 17:27:40] Mapping left_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4) [2017-02-14 17:37:28] Mapping left_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4) [2017-02-14 17:48:00] Mapping right_kept_reads.m2g_um to genome ERCC92 with Bowtie2 [2017-02-14 18:48:10] Mapping right_kept_reads.m2g_um_seg1 to genome ERCC92 with Bowtie2 (1/4) [2017-02-14 18:56:41] Mapping right_kept_reads.m2g_um_seg2 to genome ERCC92 with Bowtie2 (2/4) [2017-02-14 19:05:26] Mapping right_kept_reads.m2g_um_seg3 to genome ERCC92 with Bowtie2 (3/4) [2017-02-14 19:14:26] Mapping right_kept_reads.m2g_um_seg4 to genome ERCC92 with Bowtie2 (4/4) [2017-02-14 19:24:23] Searching for junctions via segment mapping Warning: junction database is empty! [2017-02-14 19:27:40] Retrieving sequences for splices [2017-02-14 19:27:40] Indexing splices Warning: Empty fasta file: 'OUTPUT/tmp/segment_juncs.fa' Warning: All fasta inputs were empty Error: Encountered internal Bowtie 2 exception (#1) Command: bowtie2-build --wrapper basic-0 OUTPUT/tmp/segment_juncs.fa 25_untrim_ERCC92_nojuncs/tmp/segment_juncs [FAILED] Error: Splice sequence indexing failed with err =1

ADD REPLYlink written 2.1 years ago by Andre10
1
gravatar for jimguo22
2.0 years ago by
jimguo2210
jimguo2210 wrote:

Did you try with Bowtie2? Since there is no splicing variants in ERCC92, you won't have junction reads as stated in the error message. This might be what caused the failure of tophat. Jim

ADD COMMENTlink written 2.0 years ago by jimguo2210
0
gravatar for Andre
23 months ago by
Andre10
Baylor College of Medicine
Andre10 wrote:

Thank you, Jim. Yes, I have circumvented the issue by using Bowtie2 for the ERCC92 "genes" and tophat2 for the transcriptome. I am then using cufflinks with the Bowtie2 output and ERCC92.gtf or with the tophat2 output and igenome.gtf.

ADD COMMENTlink written 23 months ago by Andre10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 743 users visited in the last hour