Question: Unable to find Bowtie2 index
Explorer (60), Australia, wrote 3.0 years ago:

Hi,

I am unable to run TopHat2 because I get an error. Here is the command I ran:

tophat2  -p 5  -r 62 –library-type fr-firststrand  -G  /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf –o /home/jmotwani/RNASeq/Alignment_Tophat2 --BOWTIE2_INDEXES  /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/   C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq C95VLANXX-2046D-01-01-01_L003_R2_Trimmed.fastq

I get the following error after I run the above command:

[2016-05-22 22:20:05] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-05-22 22:20:05] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2016-05-22 22:20:05] Checking for Bowtie index files (genome)..

    Error: Could not find Bowtie 2 index files (–library-type.*.bt2l)

The indexed genome was downloaded from Illumina. Do I have to build it after downloading it? I downloaded the genome, gtf file, and indexed files and gave the path of those files in the command above.

Could anyone please comment or advise on this?

Thanks for your time.

Regards, J

rna-seq tool • 4.5k views
modified 3.0 years ago • written 3.0 years ago by Explorer (60)

Thanks Goutham and WouterDeCoster. I tried but still get the same error.

I am not sure whether the Bowtie2Index files provided by iGenomes are incompatible with the latest version of TopHat.

The files I have in the Bowtie2Index directory are:

genome.1.bt2 genome.2.bt2 genome.3.bt2 genome.4.bt2
genome.fa genome.rev.1.bt2
genome.rev.2.bt2 tophat_out
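
For reference, a complete Bowtie 2 index consists of six files per basename (.1, .2, .3, .4, .rev.1 and .rev.2, each ending in .bt2, or .bt2l for large indexes), so the listing above looks complete. A minimal shell sketch for checking this is given below; the function name check_bt2_index is hypothetical and not part of Bowtie 2 or TopHat.

```shell
# Report which of the six Bowtie 2 index files are missing for a basename.
# Prints one line per missing file; returns non-zero if any are missing.
# check_bt2_index is a hypothetical helper, not a Bowtie 2 or TopHat command.
check_bt2_index() {
    base=$1
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        # Small indexes end in .bt2; large indexes end in .bt2l.
        if [ ! -e "$base.$suffix.bt2" ] && [ ! -e "$base.$suffix.bt2l" ]; then
            echo "missing: $base.$suffix.bt2"
            missing=1
        fi
    done
    return "$missing"
}
```

For the files listed above, the call would be check_bt2_index /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome; no output means all six files were found.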

modified 3.0 years ago • written 3.0 years ago by Explorer (60)

I guess the problem is a typo: -library-type instead of --library-type. The error says –library-type.*.bt2l, which means TopHat is treating that token as the index basename rather than as an option.

written 3.0 years ago by geek_y (9.6k)

Thanks, but it is still not working. I realized that I had missed a '-' for library-type: I had written -library-type in place of --library-type. I changed it and ran the command again, but now I get a new error:

tophat: option -? not recognized for detailed help see http://ccb.jhu.edu/software/tophat/manual.shtml

modified 3.0 years ago • written 3.0 years ago by Explorer (60)

Could you share the command that you ran? That looks like a different problem than before. Where are your indexes? Did you point the environment variable to their location?

Note, from the manual:

The basename of the genome index to be searched. The basename is the name of any of the index files up to but not including the first period. Bowtie first looks in the current directory for the index files, then looks in the indexes subdirectory under the directory where the currently-running bowtie executable is located, then looks in the directory specified in the BOWTIE_INDEXES (or BOWTIE2_INDEXES) environment variable. Please note that it is highly recommended that a FASTA file with the sequence(s) the genome being indexed be present in the same directory with the Bowtie index files and having the name <genome_index_base>.fa. If not present, TopHat will automatically rebuild this FASTA file from the Bowtie index files.

written 3.0 years ago by WouterDeCoster (39k)

Here is the command:

tophat2 -p 5 -r 62 --library-type fr-firststrand -G /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf –o /home/jmotwani/RNASeq/Alignment_Tophat2 /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome read1.fastq read2.fastq

The indexes are in the location specified in the command. I have not created an environment variable; instead, I have given the whole path in the command.

written 3.0 years ago by Explorer (60)

Have you read what I posted from the manual?

The basename of the genome index to be searched. The basename is the name of any of the index files up to but not including the first period. Bowtie first looks in the current directory for the index files, then looks in the indexes subdirectory under the directory where the currently-running bowtie executable is located, then looks in the directory specified in the BOWTIE_INDEXES (or BOWTIE2_INDEXES) environment variable. Please note that it is highly recommended that a FASTA file with the sequence(s) the genome being indexed be present in the same directory with the Bowtie index files and having the name <genome_index_base>.fa. If not present, TopHat will automatically rebuild this FASTA file from the Bowtie index files.

So this means that you only have to specify the basename of the indexes, which TopHat will search for in:

  1. the current directory
  2. the indexes subdirectory under the directory containing the bowtie executable
  3. the directory specified by the environment variable
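
The search order described in the manual can be sketched in shell as follows. This is only an illustration of the quoted text, not TopHat's actual implementation; the function name find_bt2_basename is hypothetical, and it assumes that finding <basename>.1.bt2 (or .1.bt2l) is enough to identify an index.

```shell
# Sketch of the lookup order quoted from the manual; NOT TopHat's own code.
# Prints the first path for which <path>.1.bt2 or <path>.1.bt2l exists.
# find_bt2_basename is a hypothetical helper name.
find_bt2_basename() {
    name=$1
    # Directory of the bowtie2 executable, if it is on PATH (may be empty).
    bowtie_path=$(command -v bowtie2 2>/dev/null) || bowtie_path=
    bowtie_dir=${bowtie_path:+$(dirname "$bowtie_path")}
    for candidate in \
        "$name" \
        "${bowtie_dir:+$bowtie_dir/indexes/$name}" \
        "${BOWTIE2_INDEXES:+${BOWTIE2_INDEXES%/}/$name}"
    do
        # Skip locations that are not available (no bowtie2, variable unset).
        [ -n "$candidate" ] || continue
        if [ -e "$candidate.1.bt2" ] || [ -e "$candidate.1.bt2l" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

With the files from this thread, find_bt2_basename /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome should print that path, whereas passing only the directory (without the trailing genome) should find nothing.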

I'm not sure if it's a copy-paste issue, but I can see in your post that you have different kinds of '-': "-" and "–". This can happen when copy-pasting commands from, e.g., Microsoft Word.
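
One way to catch such characters before running the command is to look for arguments that contain non-ASCII bytes. A rough sketch, with a hypothetical function name report_non_ascii_args:

```shell
# Print every argument that contains a non-ASCII byte, such as an en dash
# (UTF-8 bytes E2 80 93) pasted in place of an ASCII hyphen.
# Returns non-zero if any such argument was found.
# report_non_ascii_args is a hypothetical helper name.
report_non_ascii_args() {
    found=0
    for arg in "$@"; do
        # Delete all ASCII bytes; anything left over is non-ASCII.
        leftover=$(printf '%s' "$arg" | LC_ALL=C tr -d '\001-\177')
        if [ -n "$leftover" ]; then
            printf 'suspicious argument: %s\n' "$arg"
            found=1
        fi
    done
    return "$found"
}
```

Prefixing the pasted command with the function name, for example report_non_ascii_args tophat2 -p 5 –o out, would flag –o without running tophat2.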

written 3.0 years ago by WouterDeCoster (39k)

I think I have managed to get rid of the previous error, but now it throws a new error:

tophat2 --num-threads 5 --mate-inner-dist 62 --library-type fr-firststrand --GTF /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf --output-dir /home/jmotwani/RNASeq/Alignment_Tophat2 /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome read1.fastq read2.fastq

Output:

[2016-05-23 22:07:53] Beginning TopHat run (v2.1.1)
[2016-05-23 22:07:53] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2016-05-23 22:07:53] Checking for Bowtie index files (genome)..
[2016-05-23 22:07:53] Checking for reference FASTA file
[2016-05-23 22:07:53] Generating SAM header for /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome
[2016-05-23 22:07:55] Reading known junctions from GTF file
[2016-05-23 22:08:28] Preparing reads
    [FAILED]
Error running 'prep_reads'
Error: qual length (111) differs from seq length (106) for fastq record !

Has anyone come across this problem?

written 3.0 years ago by Explorer (60)

This tells you that your fastq file is corrupt since the quality string is longer than the sequence. What did you use for trimming the reads? How have you modified/processed your fastq data?
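
To locate the offending records, each quality line can be compared with its sequence line. A minimal sketch using awk, assuming plain four-line FASTQ records (no wrapped sequences); check_fastq_lengths is a hypothetical helper name:

```shell
# List FASTQ records whose quality string and sequence differ in length.
# Assumes plain 4-line records: header, sequence, "+", quality.
# Returns non-zero if any mismatch is found.
# check_fastq_lengths is a hypothetical helper name.
check_fastq_lengths() {
    awk '
        NR % 4 == 2 { seq_len = length($0) }          # sequence line
        NR % 4 == 0 && length($0) != seq_len {        # quality line
            printf "record %d: seq length %d, qual length %d\n", NR / 4, seq_len, length($0)
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running it on both trimmed files (for example check_fastq_lengths C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq) should show whether the mismatch comes from the trimming step; running it on the untrimmed input as well would confirm that the raw data are intact.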

written 3.0 years ago by WouterDeCoster (39k)

I used an in-house script (cleanadaptors) to trim the raw fastq files. I ran the trimming command as follows:

cleanadaptors -I /home/jmotwani/RNASeq/contam.fa -q 20 -x 25 -F C95VLANXX-2046D-01-01-01_L003_R1.fastq -o C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq -G C95VLANXX-2046D-01-01-01_L003_R2.fastq -O C95VLANXX-2046D-01-01-01_L003_R2_trimmed.fastq

-q is for quality and -x is for the minimum length of the read.

written 3.0 years ago by Explorer (60)

I can't help you with an in-house script, obviously.

Please use the comment options correctly to enable threading of questions and replies.

modified 3.0 years ago • written 3.0 years ago by WouterDeCoster (39k)

geek_y (9.6k), Barcelona/CRG/London/Imperial, wrote 3.0 years ago:

--BOWTIE2_INDEXES does not exist as an option, so you do not need to specify it. Just give the basename of the Bowtie2 index.

tophat2 -p 5 -r 62 --library-type fr-firststrand \
-G /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf \
-o /home/jmotwani/RNASeq/Alignment_Tophat2 \
/home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome \
C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq C95VLANXX-2046D-01-01-01_L003_R2_Trimmed.fastq

modified 3.0 years ago • written 3.0 years ago by geek_y (9.6k)

If I'm not mistaken, BOWTIE2_INDEXES is an environment variable naming a directory in which TopHat will search for the indexes.

written 3.0 years ago by WouterDeCoster (39k)

It's an environment variable, but not an argument for tophat2, I guess.

written 3.0 years ago by geek_y (9.6k)

I agree :-) You indeed just specify the basename for tophat2.

written 3.0 years ago by WouterDeCoster (39k)

Explorer (60), Australia, wrote 3.0 years ago:

Thanks all, tophat2 works fine now. The issue was the missing double hyphen for the library-type parameter. Also, tophat2 doesn't seem to like a mixture of option styles: it accepted either all options with a single hyphen or all with a double hyphen, but not a mixture of the two.

written 3.0 years ago by Explorer (60)