How to add tophat and bowtie to the path?
3
2
Entering edit mode
5.0 years ago
mirza ▴ 140

Hi, I am using Ubuntu 16.04 and tried a command suggested in one of the posts to add tophat and bowtie to my path, but I don't think I was successful.

The command I used,

$ export PATH=$PATH:$home/gjjha/Downloads/softwares/bowtie2-2.2.9/index:/home/gjjha/Downloads/softwares/tophat-2.1.1

$ which bowtie2

returns

/usr/bin/bowtie2

but echo $PATH shows neither of them in the path. I also tried running tophat from a directory, and it also reports that tophat can't find the b2 index files. What am I doing wrong?

export PATH bowtie2.2.9 tophat2 • 4.0k views

1
Entering edit mode

You should also be able to install bowtie2 (2.2.6-2) and tophat (2.1.0) from the Ubuntu Software Center.

0
Entering edit mode

Very stupid check question: are you doing echo $PATH from the same terminal, or from another one?

0
Entering edit mode

Well, thanks for the language. You might be an expert in Linux, but I am really new to it and am trying to learn so that I can run these tools for my research work. Anyway, I did realize my mistake: I was checking in the same terminal.
Considering you are an expert and I am stupid, maybe you can also shed some light on this: when I try to run tophat from a dir containing the files to be mapped, it returns the error "Error: Could not find Bowtie 2 index files" (I already built the b2 index, contained in the b2 index dir).

3
Entering edit mode

Macspider wasn't calling you stupid, he/she was calling his/her own question "stupid".

Anyway, what are you using for the <bowtie_index> argument and where are the indices?

3
Entering edit mode

Chill :D the stupid thing was my question, but every question is worth asking!

If you export a variable from terminal 1 and then try to read it from terminal 2, you will not find it: each terminal loads its environment variables when it is opened, and an export only affects the shell it was run in.

If you want to have the command in your path, the best thing is to edit your .bash_profile (or .profile, or .bashrc, depending on where your $PATH variable is declared), adding:

:$home/gjjha/Downloads/softwares/bowtie2-2.2.9/index:/home/gjjha/Downloads/softwares/tophat-2.1.1

at the end of it.

For the bowtie2 indexes, you have to specify the index basename. This means that if your index files are all named like whatever.1.bt2, whatever.2.bt2, whatever.rev.1.bt2 and so on, you have to specify -x whatever in the bowtie2 command. ;)
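To make the basename idea concrete, here is a minimal sketch that recovers it by stripping the ".1.bt2" suffix from the first index file. The function name index_basename and the example path are made up for illustration; it assumes the usual small-index file naming (basename.1.bt2).

```bash
#!/usr/bin/env bash
# Derive the Bowtie 2 index basename from one of its index files.
# The file path used below is a hypothetical example.
index_basename() {
    local first_file="$1"
    # Strip the ".1.bt2" suffix; the directory part is kept.
    printf '%s\n' "${first_file%.1.bt2}"
}

index_basename "/path/to/index/whatever.1.bt2"   # prints /path/to/index/whatever
```

The printed value (directory plus basename, no extension) is what goes after -x for bowtie2, or in the <bowtie_index> slot for tophat.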

0
Entering edit mode

I think $home should really be /home.

export PATH=$PATH:/home/gjjha/Downloads/softwares/bowtie2-2.2.9/index:/home/gjjha/Downloads/softwares/tophat-2.1.1

Having said that, it would be better not to put random folders in the PATH, as Michael mentioned.

0
Entering edit mode

Hi everyone. @Macspider sorry, I misunderstood, my bad. I couldn't clarify yesterday since I am a new user and can post only 5 times. I am writing in detail so that I don't miss any point this time.

I am using Ubuntu 16.04 on my workstation (256 GB RAM). I have downloaded and installed bowtie2.2.9 and tophat2 according to the instructions given on the tophat page of ccb.jhu.edu. I am successful in building indices and running tophat, but only from the bowtie2 index directory, where the b2 index files are. This limits my usage because I need to copy my raw read files into the index directory every time and can run only one set of reads at a time. So I tried adding the location of these tools and the index dir to the path, thinking it would enable me to run multiple files in parallel from any other directory. Now I understand that these tools are installed in /usr/bin, as "which" returns /usr/bin:

$ which bowtie2 returns /usr/bin/bowtie2

$ which tophat2 returns /usr/bin/tophat2

But yesterday, using answers in some post, I used this command for adding the location of index directory to my PATH

$ export PATH=$PATH:$home/mirza/Downloads/softwares/bowtie2-2.2.9/index:/home/mirza/Downloads/softwares/tophat-2.1.1

echo $PATH returns

I am still unable to run tophat2 from any directory other than the index (containing the indices) dir within bowtie2 dir. I need help with,

1. How to add these tools to my path?

2. How to undo the above command I used to add the index dir & tools to my path?

2
Entering edit mode

This limits my usage coz I need to copy my raw read files in the index directory every time and can run one set of reads at a time only

So this means that if you use the full path of the read files you don't get them? If you have these files:

R1 = /path/to/reads_1.fq
R2 = /path/to/reads_2.fq
INDEX = /path/to/index/basename (leave off the *.bt2 extension)

You should be able to run tophat2 from everywhere specifying full paths. Like:

/path/to/tophat2 [OPTIONS] /path/to/index/basename /path/to/reads_1.fq /path/to/reads_2.fq


No need to copy files!

Moreover, if you need to run many sets, you can either launch many tophat runs, changing the input files each time, or use their comma-separated list syntax in the input, separating the files you want to input with a comma (no space after the comma), as they write in the program's help:

tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]]

0
Entering edit mode

@Macspider One question: when we run tophat for a file, the results are saved in the tophat_out directory. So, when we are using the command

tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]]

to run multiple PE sets, do we need to specify an output dir for each pair too or it will create separate directories itself?

0
Entering edit mode

Yes you do need to specify a unique new output directory for each sample. Every sample needs to be run independently. This is important since output from tophat has files with exactly the same name(s) for every sample.

0
Entering edit mode

There is a difference between running 2 tophat runs on two sets, and 1 run on two sets. In detail:

• If you run the two runs separately, you will specify an output folder for each, and it's obviously the right thing to name them differently, because the files inside will have the same names and would otherwise overwrite each other (a mess).
• If you run them together, you will get a single BAM file that has all the results merged. This might be what you want, or it might not. It's up to you! For example, if you have 3 replicates it is very unlikely that you want this, while if you are doing a gene prediction you do want this because you pile up evidence and don't aim for expression levels.
0
Entering edit mode

@Macspider I hv paired reads for replicates and different experimental conditions. I definitely want separate results not merged. Now how can I run them in parallel, to save time & efforts and how can I define separate output dir for each pair in one command?

1
Entering edit mode

Tophat has a -p option that sets how many threads you want to use. This has an upper limit in how many cores your machine / cluster has available, or in how many cores you are allowed to use if you are using some sort of queue manager.

That said, if you want to do parallel runs there is no built-in pipeline for that; you have to do it yourself, but it's as easy as launching the same command N times with different files.

Since you don't want to overload your system or force more threads than your cores can handle, you'd probably be safe if you use for each run a number of threads (-p option) given by:

Number of cores you can use (e.g. 30) / Number of runs (let's say 6) = 5 threads (in this example)
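The arithmetic above can be sketched in bash as integer division with a floor of one thread. The function name threads_per_run is hypothetical, and the numbers are just the ones from the example.

```bash
#!/usr/bin/env bash
# Threads per run = usable cores / number of simultaneous runs (integer division).
threads_per_run() {
    local cores="$1" runs="$2"
    local t=$(( cores / runs ))
    # Never go below one thread per run.
    if [ "$t" -lt 1 ]; then
        t=1
    fi
    echo "$t"
}

threads_per_run 30 6    # prints 5
```

On a machine where you may use every core, something like threads_per_run "$(nproc)" 6 would give the value to pass to -p.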

0
Entering edit mode

The power tool for running things in parallel is gnu-parallel, which can also restrict memory usage, specify the number of jobs, avoid swapping, show progress...

Or alternatively more complicated pipeline tools such as snakemake

0
Entering edit mode

@Macspider thanks I'll try these options and let you know tomorrow. @Michael Dondrup I have answered your reply below.

0
Entering edit mode

@Macspider

@Devon Ryan

Hi, can you please tell me how to use the --output-dir option in tophat? I have tried all of these

-o my_out;

--output-dir my_out;

-o ./my_out

to create my own tophat output directory, but still no success. It creates its default dir tophat_out every time.

I want to thank you all, learning a lot through biostar.

0
Entering edit mode

You should just specify a name, optionally with a path in front of it, and tophat will create the folder. Can you paste the command here?

0
Entering edit mode

Hi, I am simply using

tophat b2.index input_file1 input_file2

where 1 & 2 are the left and right reads, and it's working fine. The output is generated in the default tophat_out directory.

0
Entering edit mode

But how did you use the -o flag?

0
Entering edit mode

as I have written in my earlier reply above, I tried using it in 3 different ways but still got the default dir not mine

tophat b2index input1 input2 -o my_out also,

tophat b2index input1 input2 --output-dir my_out and

tophat b2index input1 input2 -o ./my_out

2
Entering edit mode

Sometimes order matters, try tophat -o my_out b2index input1 input2.

0
Entering edit mode

ok, may be that's the problem. I can try this tomorrow only. Will post the outcome here. Thanks.

1
Entering edit mode

Indeed that's likely the problem. From the manual:

tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]


-o is an option and should come before the index. That works for me.
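As a sketch of that ordering for several samples, the function below prints one tophat command per sample, with -p and -o placed before the index and the reads, and a separate output directory for each sample. The function name build_tophat_cmd, the sample names, and the <sample>_1.fq / <sample>_2.fq file naming are assumptions made for illustration; nothing is executed, the commands are only printed.

```bash
#!/usr/bin/env bash
# Print one tophat command line per sample: options first, then the index
# basename, then the left and right read files. Each sample gets its own
# output directory so results are not overwritten.
build_tophat_cmd() {
    local sample="$1" threads="$2" index="$3"
    printf 'tophat -p %s -o %s_out %s %s_1.fq %s_2.fq\n' \
        "$threads" "$sample" "$index" "$sample" "$sample"
}

# Hypothetical sample names and index path.
for s in ctrl_rep1 ctrl_rep2; do
    build_tophat_cmd "$s" 5 /path/to/index/basename
done
```

To actually run the samples side by side, each printed command could be started in the background with a trailing & and followed by a single wait once all of them have been launched.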

1
Entering edit mode

Just a side note to this:

You were lucky that this didn't mess up your data! The positional arguments at the end of the tophat command are treated as the input data, and the output directory is an optional argument (--output-dir). If the output were a positional argument as well, you could have overwritten your input data with an empty output file just by placing it in the wrong slot of the command line.

Bottom line: always be sure of what the manuals and helps say about arguments, if they're to be placed here or there, because this can really mess up things sometimes :)

5
Entering edit mode
5.0 years ago
PATH=$PATH:$home....


should be

PATH=$HOME/bin:$PATH


Edit: note to put the local directory first, so the newer binary overrides the default path

shell variables are case sensitive.
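A small bash sketch of why the case matters: a lowercase $home is normally not set, so it expands to nothing and the path silently loses its first component. The directory names are just examples.

```bash
#!/usr/bin/env bash
unset home                          # make sure no lowercase variable is set
wrong="$home/mirza/Downloads"       # $home is empty, so this is "/mirza/Downloads"
right="$HOME/Downloads"             # $HOME is your actual home directory
echo "wrong: $wrong"
echo "right: $right"
```

This would explain PATH entries that start with /mirza/... instead of /home/mirza/....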

Btw: don't put random download directories in your path; 'install' binaries in a 'proper' location (using unix tools such as install, stow, or even cp if you must), e.g. /usr/local/bin or $HOME/bin, then add export PATH=$HOME/bin:$PATH to your .bashrc or .bash_profile.

Btw2, not a bioinformatics question ;)

0
Entering edit mode

@Michael Hi, I did read your comment and used this exact command line, but the echo returns the exact same result as above.

:/mirza/Downloads/softwares/bowtie2-2.2.9/index:/home/mirza/Downloads/softwares/tophat-2.1.1:/mirza/Downloads/softwares/bowtie2-2.2.9/index:/home/mirza/Downloads/softwares/bowtie2-2.2.9/index:/home/mirza/Downloads/softwares/tophat-2.1.1

Still unable to run tophat from any other dir (returns the error: couldn't find b2 index files).

1. Should I be using this exact command line, or is some modification needed? My tools are installed in /usr/bin (as I have mentioned above).

2. Do I need to undo adding the path I added earlier (/home/mirza/Downloads.......) and if yes, how do I do it?

1
Entering edit mode

I think you didn't do exactly what I suggested: shell variables are case sensitive. Don't put random download directories in your path.

• make a $HOME/bin directory in your home: mkdir -p $HOME/bin
• copy the executable to $HOME/bin: cp -v bowtie $HOME/bin
• add PATH=$HOME/bin:$PATH to your .profile or .bashrc, depending on what is working best: echo 'export PATH=$HOME/bin:$PATH' >> $HOME/.profile (single quotes, so that the variables are written literally and expanded at login rather than when you run echo)
• log out and in again
0
Entering edit mode

Slightly related question: is sourcing .bashrc (source $HOME/.bashrc) a good enough equivalent to logging out and in?

2
Entering edit mode

No, because the paths in the PATH variable will be duplicated: $HOME/bin:/home/name/bin:/home/name/bin:$PATH

0
Entering edit mode

Thanks, learned something new!

0
Entering edit mode

You could open another shell by running bash and get the same effect.

0
Entering edit mode

That also duplicates my $PATH. Not sure if that's a problem.

1
Entering edit mode

That does not happen on my CentOS (rocks) distro (could be because of specific config).

That is probably not a problem (though it may look messy). Check this thread for some solutions to clean up PATH.
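One common way to avoid the duplication when a startup file is sourced more than once is to guard the assignment. This is a minimal sketch, not taken from the linked thread; the function name path_prepend is hypothetical.

```bash
#!/usr/bin/env bash
# Prepend a directory to PATH only if it is not already listed,
# so re-sourcing the file does not create duplicates.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: do nothing
        *) PATH="$1:$PATH" ;;        # not present: put it in front
    esac
}

path_prepend "$HOME/bin"
export PATH
```

Wrapping PATH in colons before matching means a directory is only treated as present when it appears as a whole entry, not as part of a longer one.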

0
Entering edit mode

Right. Also CentOS here, and my $PATH is a mess. Will have a look at that link.

0
Entering edit mode

Thanks, will try and let you know. If you have an answer for my question above regarding the output dir for tophat too (my comment on Macspider's answer), I'll be grateful.

0
Entering edit mode

@Michael Dondrup Hi, I followed your suggestion and created the $HOME/bin dir as you suggested above. Other than tophat and bowtie, I also copied blast+2.5 to the dir and added the path to my .bashrc using

export PATH=$PATH:$HOME/bin

logged out, logged in

echo $PATH now returns,

:/home/mirza/mirza/Downloads/softwares/bowtie2-2.2.9/index:/home/mirza/Downloads/softwares/tophat-2.1.1:/home/mirza/bin

I tried running blastn (blast+) using the following commands

blastn -task blastn -query test.fasta -db my_db -outfmt 6

also

blastn -task blastn -query test.fasta -db home/mirza/bin/my_db -outfmt 6

but both return the following error,

BLAST Database error: No alias or index file found for nucleotide database [home/mirza/Downloads/softwares/blastdb/my_db] in search path [/home/mirza:$/home/mirza/Downloads/softwares/blastdb:]

2
Entering edit mode

I am not sure we can or should solve your path problems here. I can just recommend the following:

• check your .profile, .bashrc and .bash_profile carefully for additional path entries and remove :/home/mirza/mirza/Downloads/softwares/bowtie2-2.2.9/index:/home/mirza/Downloads/softwares/tophat-2.1.1: wherever you find it.
• I recommended PATH=$HOME/bin:$PATH because you want your own software to come first in the search order
• Blast is a totally different topic, see https://www.ncbi.nlm.nih.gov/books/NBK52640/#_chapter1_Configuration_ for how to configure this
• Please do yourself a favor and take a basic linux course: /home/mirza vs home/mirza
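For the current shell session, an unwanted entry can also be dropped from PATH without logging out. The following is a sketch under the assumption that you know the exact directory string to remove; the function name path_remove is hypothetical, and entries written into a startup file still have to be deleted there by hand.

```bash
#!/usr/bin/env bash
# Print a colon-separated list with every occurrence of one directory removed.
# Empty entries (e.g. from a leading ":") are dropped as well.
path_remove() {
    local list="$1" drop="$2" out="" entry
    local -a parts
    IFS=: read -ra parts <<< "$list"      # split the list on colons
    for entry in "${parts[@]}"; do
        [ -z "$entry" ] && continue       # skip empty entries
        [ "$entry" = "$drop" ] && continue
        out="${out:+$out:}$entry"         # re-join the entries that are kept
    done
    printf '%s\n' "$out"
}

path_remove ":/a:/b:/a:/c" "/a"    # prints /b:/c
```

It would be applied to the live variable with something like PATH=$(path_remove "$PATH" "/home/mirza/Downloads/softwares/tophat-2.1.1").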
0
Entering edit mode

Ok, I understand. Thanks. Yes, I know, I need to give sometime to learn the basics.

1
Entering edit mode
5.0 years ago
mirza ▴ 140

@Macspider I didn't let the run complete; I just checked whether it was creating the output dir and terminated the process when I used the -o flag after the PE reads file names, so my output didn't get messed up. I really want to thank everyone here for their help. With your help I worked out how to run parallel mapping sessions,

$ tophat -o output_directory b2.index_base PEreads_1.fq PEreads_2.fq

0
Entering edit mode
5.0 years ago
Shahzad ▴ 30

I have solved this problem with a very easy solution. The problem occurs when you set $PATH to a single directory: this clears the previous paths from the system, some of which are essential to run the software. So,

run echo $PATH, copy all the paths that appear in the result, then use the export PATH command and add the path of the software you are using, separated by a colon (:), at the end of the previous paths.

If you cant understand this please see on google how to add multiple paths in linux/ubuntu. Also be-careful about terminal. You should use the same terminal in which you have added the path.

Hope that helps.

0
Entering edit mode

@Shahzad Thank you for your reply. Can you please send me the link to the article you are referring to coz sometimes there are so many similar hits on Google and it all gets confusing. I am new but really interested in learning.