Weird Tophat2 Error Message
7.8 years ago
Nick ▴ 290

I am trying to run tophat2 on a set of SE reads:

tophat2 ../../../references/genomes/xen-trop/ens71/xen-trop-4-2-71 CO1_CGATGT_L008.fastq --min-intron-length 6 --max-insertion-length 3 --max-deletion-length 3 --b2-seed 73 --solexa1.3-quals --microexon-search --num-threads 32 --library-type fr-unstranded --no-coverage-search --GTF ../../../references/gtf/xen-trop/ens71/Xenopus_tropicalis.JGI_4.2.71.gtf


But I am getting a weird error message:

[2013-08-10 14:34:22] Beginning TopHat run (v2.0.9)
[2013-08-10 14:34:22] Checking for Bowtie
Bowtie version:        2.1.0.0
[2013-08-10 14:34:22] Checking for Samtools
Samtools version:        0.1.19.0
[2013-08-10 14:34:22] Checking for Bowtie index files (genome)..
[2013-08-10 14:34:22] Checking for reference FASTA file
[2013-08-10 14:34:22] Generating SAM header for ../../../references/genomes/xen-trop/ens71/xen-trop-4-2-71
Traceback (most recent call last):
File "/galaxy/software/tophat2/2.0.9/tophat", line 4072, in ?
sys.exit(main())
File "/galaxy/software/tophat2/2.0.9/tophat", line 3926, in main
File "/galaxy/software/tophat2/2.0.9/tophat", line 1829, in check_reads_format
File "/galaxy/software/tophat2/2.0.9/tophat", line 1782, in __init__
self.file=open(filename)
IOError: [Errno 2] No such file or directory: '--min-intron-length'


What is going on here? It seems tophat2 is looking for a file/directory when it encounters --min-intron-length yet I don't understand why. Can you help?

tophat2 • 3.9k views

Try specifying the command as suggested in the manual, i.e., "tophat2 [options] index reads". This sort of error is likely to occur depending on exactly how the tophat python script parses its input. It's likely that the parser takes the first option lacking a "--something" as the index and the next one as the left_reads fastq list. If there are then more, it probably just takes that as the right_reads fastq list and assumes you have paired input. This isn't technically a bug, then, since you're specifying the command incorrectly...though I think a change in the python code would be in order since this sort of error isn't going to be uncommon.
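
To illustrate the parsing behaviour described above, here is a minimal Python sketch. It assumes the script relies on a getopt-style parser, which stops reading options at the first non-option argument. The thread does not confirm this, and the option list and file names below are a hypothetical subset chosen for illustration, not the real tophat code.

```python
import getopt

# Hypothetical subset of long options, for illustration only.
LONG_OPTS = ["min-intron-length=", "num-threads=", "no-coverage-search"]


def split_args(argv, parser=getopt.getopt):
    """Return (options, positionals) as a getopt-style parser sees them."""
    opts, args = parser(argv, "", LONG_OPTS)
    return dict(opts), args


# Options placed after the positional arguments (the failing form).
trailing = ["index", "reads.fastq", "--min-intron-length", "6"]
# Options placed first, as the manual suggests.
leading = ["--min-intron-length", "6", "index", "reads.fastq"]

opts_trailing, pos_trailing = split_args(trailing)
opts_leading, pos_leading = split_args(leading)
# gnu_getopt allows options and positionals to be intermixed.
opts_gnu, pos_gnu = split_args(trailing, parser=getopt.gnu_getopt)

print("trailing:", opts_trailing, pos_trailing)
print("leading: ", opts_leading, pos_leading)
print("gnu:     ", opts_gnu, pos_gnu)
```

In the trailing form, no options are recognised and every token ends up in the positional list, so a script that reads the third positional as a second reads file would try to open '--min-intron-length'. `getopt.gnu_getopt` accepts the intermixed form, provided the POSIXLY_CORRECT environment variable is not set.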

7.8 years ago
Nick ▴ 290

Interestingly, if I discard the optional arguments and submit just the two mandatory parameters (index and fastq file), then there is no error:

tophat2 ../../../references/genomes/xen-trop/ens71/xen-trop-4-2-71 CO1_CGATGT_L008.fastq


Also, if I add a second file (as in the case of a PE sample), then there is no error either:

tophat2 ../../../references/genomes/xen-trop/ens71/xen-trop-4-2-71 CO1_CGATGT_L008_1.fastq CO1_CGATGT_L008_2.fastq --min-intron-length 6 --max-insertion-length 3 --max-deletion-length 3 --b2-seed 73 --solexa1.3-quals --microexon-search --num-threads 32 --library-type fr-unstranded --no-coverage-search --GTF ../../../references/gtf/xen-trop/ens71/Xenopus_tropicalis.JGI_4.2.71.gtf


So it is only the case of an SE sample with optional parameters that causes the error. This is really bizarre. Any help is appreciated.

7.8 years ago
Nick ▴ 290

Thank you, dpryan79, this was indeed the problem. It is actually a very sneaky idiosyncrasy. I now realise that I have to re-analyse some samples I analysed in the past, because tophat essentially stops trying to make sense of the parameters that come after the second fastq file. In the past I used to run tophat for PE samples in the following way:

tophat2 <index> <file1> <file2> <options>


I wasn't getting the error I reported here (this is the first time I am using tophat with an SE sample, though I have been using it for nearly a year on PE samples), but it seems none of the options were taken into account. A very lame implementation if you ask me, which is a shame, as tophat otherwise seems a decent tool. I wonder how many other people have been tricked by this idiosyncrasy but haven't realised it yet.


Yeah, I expect a number of people have been bitten by this. I might look more into the tophat python script and see if I can just submit a patch to either throw a warning in this case or simply deal with it properly (the tophat script itself processes the command line input in a few different functions, none of which simply use argparse, which means more coding and less robustness).
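
As a rough idea of what such a warning could look like, here is a minimal Python sketch. It is not taken from the tophat script: the function names `stray_options` and `check_positionals` and the example file names are hypothetical, and it simply assumes the positional arguments are available as a list of strings after option parsing.

```python
import warnings


def stray_options(positionals):
    """Return positional arguments that look like options (start with '-')."""
    return [arg for arg in positionals if arg.startswith("-") and arg != "-"]


def check_positionals(positionals):
    """Warn if option-like tokens ended up among the positional arguments."""
    stray = stray_options(positionals)
    if stray:
        warnings.warn(
            "options given after the index/reads arguments may be ignored: "
            + " ".join(stray)
        )
    return stray


# Hypothetical PE invocation with options placed after the reads files.
example = ["index", "reads_1.fastq", "reads_2.fastq", "--num-threads", "32"]
found = stray_options(example)
print(found)
```

A check like this only flags tokens that begin with a dash, so an option's value (such as the 32 above) is not reported. That is enough to tell the user that something was placed after the positional arguments.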


I am also getting a similar error.

[2018-07-09 10:58:19] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-07-09 10:58:19] Checking for Bowtie
Bowtie version:        2.2.6.0
[2018-07-09 10:58:19] Checking for Bowtie index files (genome)..
[2018-07-09 10:58:19] Checking for reference FASTA file
[2018-07-09 10:58:19] Generating SAM header for /home/archana87/bowtie2_index/hg19
[2018-07-09 11:00:04] Mapping left_kept_reads to genome hg19 with Bowtie2
[2018-07-09 11:08:08] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2018-07-09 11:10:33] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2018-07-09 11:12:57] Searching for junctions via segment mapping
[2018-07-09 11:23:15] Retrieving sequences for splices
[2018-07-09 14:45:43] Indexing splices
Traceback (most recent call last):
File "/usr/lib/python2.7/logging/__init__.py", line 885, in emit
self.flush()
File "/usr/lib/python2.7/logging/__init__.py", line 845, in flush
self.stream.flush()
IOError: [Errno 22] Invalid argument
Logged from file tophat, line 1224
Traceback (most recent call last):
File "/usr/bin/tophat", line 4095, in <module>
sys.exit(main())
File "/usr/bin/tophat", line 4061, in main
user_supplied_deletions)
File "/usr/bin/tophat", line 3683, in spliced_alignment
File "/usr/bin/tophat", line 2585, in build_juncs_index
external_splices_out_prefix = build_juncs_bwt_index(is_bowtie2, external_splices_out_prefix, color)
File "/usr/bin/tophat", line 2519, in build_juncs_bwt_index
print >> run_log, " ".join(bowtie_build_cmd)
IOError: [Errno 22] Invalid argument


Any help is much appreciated. Thanks


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the "advisable" toolset for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon followed by DESeq2 or edgeR.