Question

RNA SEQ ERCC

Asked 6.3 years ago by Payal

Hi, I am trying to analyze an RNA-seq dataset with ERCC spike-ins. I am using TopHat because I also need to look at fusion genes, and so far I haven't found a proper pipeline for that. (Yes, I know STAR has STAR-Fusion, but STAR just crashed on our 32 GB machine, and I don't think HISAT has fusion detection functionality.)

I am running TopHat on the human dataset. I merged the ERCC GTF file with the human GTF file, merged the ERCC FASTA file with the human FASTA file, indexed the merged FASTA, and then ran TopHat on the trimmed FASTQ files.
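The merge step described above is essentially file concatenation. A minimal sketch in Python, assuming plain (uncompressed) FASTA/GTF inputs; the function name and the file names in the comments are hypothetical. The one detail worth guarding against is an input that does not end with a newline, which would glue its last line onto the first line of the next file.

```python
import shutil


def concatenate_files(inputs, output):
    """Concatenate files in order (e.g. human + ERCC FASTA, or human + ERCC GTF).

    A newline is added after any input that does not already end with one,
    so the last line of one file is never glued onto the first line of the next.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream the copy so the whole genome is never held in memory.
                shutil.copyfileobj(src, out)
                if src.tell() > 0:
                    src.seek(-1, 2)  # jump to the last byte of this input
                    if src.read(1) != b"\n":
                        out.write(b"\n")


# Hypothetical file names; adjust to the files you actually downloaded:
# concatenate_files(["Homo_sapiens.GRCh38.dna.primary_assembly.fa", "ERCC92.fa"], "hg38_ERCC92.fa")
# concatenate_files(["Homo_sapiens.GRCh38.91.gtf", "ERCC92.gtf"], "Homo_sapiens.GRCh38.91_ERCC92.gtf")
```

The merged FASTA can then be indexed with `bowtie2-build hg38_ERCC92.fa hg38_ERCC92`, so the index basename matches the `hg38_ERCC92` argument in the tophat command.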

Versions: TopHat 2.1.1, Bowtie 2.2.5

tophat --no-novel-juncs --no-coverage-search -r 100 -p 8 -G Homo_sapiens.GRCh38.91_ERCC92.gtf hg38_ERCC92 forward.fastq reverse.fastq

But I am getting an error saying:

Warning: Empty fasta file: './tophat_out/tmp/segment_juncs.fa'
Warning: All fasta inputs were empty
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs
[FAILED]
Error: Splice sequence indexing failed with err =1
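The log says TopHat's intermediate file `segment_juncs.fa` was empty and that `bowtie2-build` then failed on it. A minimal diagnostic sketch in Python (not part of TopHat; the function name is hypothetical) for checking whether a FASTA file, intermediate or reference, actually contains any records:

```python
def fasta_summary(path):
    """Return (number_of_records, number_of_bases) for a FASTA file."""
    records = bases = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                records += 1  # each header line starts a new record
            else:
                bases += len(line.strip())  # sequence lines; blank lines add 0
    return records, bases
```

If `./tophat_out/tmp/segment_juncs.fa` is still on disk, `fasta_summary` should report zero records for it, which matches the warning; running it on the merged reference FASTA should show the human sequences plus the ERCC records.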

Can anyone suggest another tool or pipeline I could use to analyze this dataset?

Thanks, Payal

Tags: RNA-Seq, ERCC, tophat, bowtie


Is there a reason you want to include the ERCC spike-ins? They're usually pretty useless.

Also, getting access to a machine with more than 32 GB of RAM is a better option than sticking with TopHat.

Reply by Devon Ryan, 6.3 years ago

For now that's the only server I have. We are trying to increase our capacity, but for now that's all I have to work with.

Sorry, I can't tell you why they included ERCC spike-ins, because I was not involved in either the study design or the wet-lab part of the experiment. My guess is that they wanted some kind of internal control or standard. I was just handed the data, and now I have to figure out how to get meaningful results out of it.

Reply by Payal, 6.3 years ago

Including spike-ins in the sequencing isn't uncommon; they just tend to get ignored once the analysis is done, since they usually do more harm than good. I suggest using the unmodified GTF file (without the spike-ins) to see whether the TopHat issue goes away.

By the way, you might also be able to use usegalaxy.org to get access to enough memory.

Reply by Devon Ryan, 6.3 years ago
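Simply re-using the original Ensembl GTF is the easiest way to follow the suggestion above. If only the merged file is at hand, a minimal sketch in Python can strip the spike-in features back out. It assumes the ERCC sequence names in GTF column 1 start with `ERCC-` (as in names like `ERCC-00002`); the function name is hypothetical.

```python
def drop_ercc_features(gtf_in, gtf_out, prefix="ERCC-"):
    """Write a copy of gtf_in without features on ERCC spike-in sequences.

    Assumes spike-in sequence names (GTF column 1) start with `prefix`.
    Comment lines (starting with '#') are always kept.
    Returns (lines_kept, lines_dropped).
    """
    kept = dropped = 0
    with open(gtf_in) as src, open(gtf_out, "w") as out:
        for line in src:
            seqname = line.split("\t", 1)[0]
            if not line.startswith("#") and seqname.startswith(prefix):
                dropped += 1  # feature lies on a spike-in sequence; skip it
                continue
            out.write(line)
            kept += 1
    return kept, dropped
```

The returned counts make it easy to confirm that only the spike-in lines were removed.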

Thanks, let me try those two options!

Reply by Payal, 6.3 years ago

I used ERCC spike-ins with TopHat, using indexes from iGenomes. Give it a try.

See also: "Using iGenomes to run a seq analysis with TopHat/Cufflinks... but how do I add the ERCC sequences to the reference transcriptome and reference genome?"

Reply by BioinfGuru, 6.3 years ago

Yup, I did look at that post while searching for answers. Another problem I found is that if the genome and GTF files don't use the same annotation, TopHat can throw errors, so I downloaded both the GTF and the genome FASTA file from Ensembl.

Reply by Payal, 6.3 years ago
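A common form of this genome/GTF mismatch is chromosome naming: Ensembl files call chromosome 1 `1`, while UCSC-style files call it `chr1`. A minimal sketch in Python for checking that every sequence name used in the GTF has a matching FASTA record (function names are hypothetical; the FASTA ID is taken as the first whitespace-delimited token after `>`):

```python
def fasta_ids(path):
    """Sequence IDs from a FASTA file: the first token after each '>'."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def gtf_seqnames(path):
    """Distinct sequence names (column 1) in a GTF file, skipping comments."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def seqnames_missing_from_fasta(gtf_path, fasta_path):
    """GTF sequence names with no matching FASTA record, sorted."""
    return sorted(gtf_seqnames(gtf_path) - fasta_ids(fasta_path))
```

An empty result means every GTF feature sits on a sequence that exists in the reference; names like `chr1` showing up against an Ensembl FASTA point to mixed sources.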

Answer (2018-01-04)

What I did to make it work:

Previously I had installed TopHat 2.1.1 and Bowtie 2.2.5 via Bioconda. I was getting the error because, for some reason, these versions of TopHat and Bowtie don't work together. So I installed TopHat from source on my system.

First I installed 2.1.1: the plain TopHat run was fine, but it threw an error when I ran the fusion search (--fusion-search). Then I installed TopHat 2.1.0, the previous version. A very old Bowtie version (0.12.9) was already installed on my system, and with that combination I was able to run the program.

In short, I found it's better to use TopHat 2.1.0 or earlier, together with older versions of Bowtie.

Payal.
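The conclusion above amounts to pinning TopHat at 2.1.0 or earlier. A minimal sketch of a guard a pipeline script could run before calling TopHat. It assumes `tophat --version` prints a line such as `TopHat v2.1.1` (check this against your install); the function names are hypothetical.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from text such as 'TopHat v2.1.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def tophat_version_ok(version_text, ceiling=(2, 1, 0)):
    """True if the reported TopHat version is at or below `ceiling`."""
    return parse_version(version_text) <= ceiling


# In a pipeline, version_text could come from, e.g.:
# subprocess.run(["tophat", "--version"], capture_output=True, text=True).stdout
```

Comparing integer tuples rather than strings keeps versions like 2.0.14 ordered correctly relative to 2.1.0.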