3.6 years ago
keeley.mazurkiewicz
I am trying to run octoFLU on python, but it requires the installation of FastTree. Since I am using a Windows 10 OS, I need some help with installing FastTree, so Python can recognize it. I never used any of these programs before and I cannot find any resources to help me troubleshoot.
I had to enable the Windows Subsystem for Linux when I installed Anaconda, Docker, Ubuntu, and Sublime Text Editor. Conda commands don't work in Anaconda. I have tried them for blast and MAFFT.
Gonna need a bit more info than that. Don't work how?
We tried this for MAFFT and Blast as well. It didn't work for those either. I got MAFFT to work somehow. I think I had to Google search for mafft.bat and dig deep for that file after uninstalling MAFFT. Or maybe I found some code that got it to transfer from Ubuntu to Anaconda. I honestly don't remember. Blast was an easy download from NCBI.
I am doing a bioinformatics project, specifically in phylogenetic analysis. The pipeline that I want to run is called octoFLU.
Website: https://github.com/flu-crew/octoFLU
I spoke to Tavis Anderson from the USDA about this pipeline. He requested that I install Ubuntu, Sublime Text Editor, Docker, and Anaconda. After our meeting, I found out that I needed to install dendropy, smof, blastn, mafft, and fasttree. I managed to install everything except FastTree. I have been having issues since I use Windows 10, but after talking to a professor at the Department of Computer Information Technology at Purdue University, it seems that this pipeline is just very difficult. He tried installing MAFFT on Windows and on Unix machines with no success. I managed somehow to install MAFFT, but FastTree is the elephant in the room.
GitHub linked this website for FastTree installation: http://www.microbesonline.org/fasttree/
For Windows, FastTree is a Windows command-line executable (no SSE). When I downloaded it, it was an application. C:\Users\mazur\Desktop\FastTree.exe is its location on my computer. When I run as administrator, it says Windows Users: Please remember to run this inside a command shell. I don't know what that means. Ubuntu "recognizes" it I think, but Anaconda does not.
Okay, so you are running Ubuntu (installed from Windows Store) on the Subsystem for Linux and you've installed anaconda on Ubuntu, not on Windows, correct? It looks like you might have installed it on Windows based on your channel setup. If so, that's your issue. I was able to install fine on Ubuntu running on the Windows Subsystem for Linux with the following commands (excludes anaconda installation):
Ensure the necessary channels are added:
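A typical Bioconda channel setup looks like the sketch below. These are the standard `conda config` calls rather than the exact commands from the original post, and the channel order is an assumption.

```shell
# Run inside the Ubuntu (WSL) terminal, not the Windows Anaconda Prompt.
# Assumption: the usual Bioconda channel setup; the original commands are not shown in the thread.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# Print the resulting channel list to confirm the change took effect.
conda config --show channels
```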
Additionally, you probably want to actually create a new conda environment for this rather than install in the base conda environment, which you can then activate and install the necessary software into:
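A minimal sketch of that step, assuming the environment is named ft (the name used later in this thread) and that the octoFLU dependencies are pulled from Bioconda and PyPI. The package list is an assumption based on the dependencies named in the question, not the original commands.

```shell
# Create a dedicated environment with its own Python (and therefore its own pip).
conda create -n ft python

# Switch into it; the prompt should now start with (ft).
conda activate ft

# Command-line tools octoFLU calls (assumed to come from the bioconda channel).
conda install fasttree mafft blast

# Python-side dependencies, installed from PyPI into the active environment.
pip install dendropy smof
```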
Those exact commands worked just fine for me - I'm on a Windows 10 PC, but all of that should be run in the Ubuntu terminal. Any of your additional analyses should be run from the Ubuntu terminal as well. If you're going to be doing bioinformatics on a Windows machine, you will very much need to become familiar with that setup.
I followed the instructions on here (https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart) to install Anaconda in Ubuntu, but now I cannot find my octoFLU-master folder in Ubuntu. The folder can be found in Anaconda on Windows, though, when I use cd C:\Users\mazur\Desktop\octoFLU-master. Is there any way you can walk me through the steps? I think Zoom or Microsoft Teams would be better since I can share my screen with you.
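On finding the folder from Ubuntu: with the default WSL setup, Windows drives are mounted under /mnt, so a path on C: appears under /mnt/c. The sketch below converts a Windows path by hand; `win_to_wsl` is a hypothetical helper written for illustration (WSL also ships a `wslpath` utility that does the same job).

```shell
# Convert a Windows path such as C:\Users\me\Desktop to its default WSL mount point.
# Assumes the default automount root (/mnt) has not been changed.
win_to_wsl() {
    local win_path="$1"
    local drive="${win_path%%:*}"     # drive letter before the colon, e.g. "C"
    local rest="${win_path#*:}"       # everything after the colon, e.g. "\Users\me\Desktop"
    drive="$(printf '%s' "$drive" | tr '[:upper:]' '[:lower:]')"
    rest="$(printf '%s' "$rest" | tr '\\' '/')"   # backslashes -> forward slashes
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\mazur\Desktop\octoFLU-master'
```

The printed path is what you would pass to cd in the Ubuntu terminal.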
I have the commands to both create and activate the new environment above:
Anyway, once you install conda on Ubuntu, my commands above will be all you need. Installing conda on Ubuntu is as easy as:
And follow the prompts to finish the install. Then my commands above will install all the dependencies you need.
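The install step mentioned above can be sketched as follows. This assumes the Miniconda installer for 64-bit Linux from repo.anaconda.com; the full Anaconda installer from the tutorial linked earlier follows the same pattern with a different file name.

```shell
# Download the installer script (assumed: the generic "latest" Miniconda build for x86_64 Linux).
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run it and answer the prompts (license, install location, shell initialization).
bash Miniconda3-latest-Linux-x86_64.sh

# After opening a new Ubuntu terminal, confirm that conda is on the PATH.
conda --version
```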
Alternatively, you can just use the docker image they provide, which shouldn't require you to install anything other than docker. The octoFLU Github page has pretty explicit directions for how to use it.
I don't understand how you're able to get this to run and I can't. I have installed Anaconda and the blast, mafft, etc. programs in Ubuntu, but I cannot get the code of octoFLU to work. I have never done this before and all the Python books and experts I have asked couldn't figure this out. I am wasting time trying to run this code and it is only on the same dataset. I still need to run this program on my own data... and somehow figure out how to convert my dataset file from .txt to .fasta
Have you considered opening an issue on their Github page? Did you try using their docker image?
What format is your data in? Converting to FASTA format is typically trivial if you already have sequence information.
I met with the co-author of octoFLU and we couldn't get the code to work. I don't know Ubuntu; we only used Anaconda, but stopped because it wouldn't recognize FastTree. We tried Docker, but when it ran the code, we couldn't find the results. My dataset is in Notepad, so it is a text file.
The person who wrote the software couldn't get it to run? You have tried to run it after installing fasttree and all other dependencies in Ubuntu as described? What error are you running into with octoFLU specifically? We need specific commands and error messages to have any chance of helping you.
When using docker, did you follow their instructions for getting the results copied outside of docker?
Yes, it's a text file, but what is the actual format of your data file? What are the first few lines of the file?
Well, I am assuming the issues were due to running Anaconda on Windows. I am following your instructions to a t. So I have some questions for you.
1) I created the new environment ft. If I were to close Ubuntu, how would I activate that environment... or is it not a permanent environment variable?
2) I only know how to open octoFLU.py script not the octoFLU.sh file on my Windows 10 PC. Can I open octoFLU.sh in Sublime Text Editor, so I can edit the paths in octoFLU.sh to connect blastn, makeblastdb, smof, mafft, and fasttree? *I determined their locations/paths by using the which command in Ubuntu.
3) I am lost on what to do after that. Probably due to 2 reasons - (1) I have never done any computer programming before and (2) we were doing the Windows script, not the Linux script. Note: the author uses only Mac, so we were troubleshooting as we walked through the installation of the prerequisite programs.
4) Text document to FASTA file - I used EMBOSS SEQRET Converter to convert my text formatted genetic sequences to FASTA format, so it is a text file of a list of FASTA formatted sequences. I did 10 files, one for each gene segment plus I divided the HA and NA genes by subtype. So PB2, PB1, PA, H1, H3, NP, N1, N2, M, NS are all separate files.
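For the per-segment files in point 4, the extension change can be done in one pass from the Ubuntu terminal. A minimal sketch, assuming the files sit in a folder called my_data (a hypothetical name); `rename_txt_to_fa` is a helper written for illustration. It only renames files whose first character is ">", which is how a FASTA record begins, and leaves the contents untouched.

```shell
# Rename every FASTA-formatted .txt file in a folder to .fa.
rename_txt_to_fa() {
    local data_dir="$1"
    local f
    for f in "$data_dir"/*.txt; do
        [ -e "$f" ] || continue                    # glob matched nothing
        if [ "$(head -c 1 "$f")" = ">" ]; then
            mv -- "$f" "${f%.txt}.fa"              # PB2.txt -> PB2.fa
        else
            echo "skipping $f: does not start with '>'" >&2
        fi
    done
}

# my_data is a placeholder; point this at the folder holding your files.
rename_txt_to_fa my_data
```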
1.) My command above will activate the environment: conda activate ft. You should see (ft) next to your command line prompt after doing so. I highly recommend reading the conda manual, as it will help you determine how to manage environments. It's pretty straightforward.
2.) The .sh file is just text; you can open it with Sublime Text, Notepad, etc. FASTA files are just text as well. File extensions just indicate that a file has a specific format, not that it actually is that format. You can open a file in Notepad, type whatever you want, name it whatever you want, and you'll still be able to open it just fine in Notepad. The default file extension is .txt just so that programs/users are aware it's a text file before opening it, but there's nothing enforcing that it actually is. You should be fine to edit that file in Sublime.
4.) Okay, so your files are already in FASTA format! All you have to do is rename them so that the program will recognize them as such. Just replace the .txt extension with .fa.
3.) I feel your frustration. But you're close! Once you rename your input files, you should be able to run the octoFLU.sh script: bash octoFLU.sh your_data/your_sample.fasta. If that doesn't work for whatever reason, give us the exact command you use and the error it spits out. You will have to run this on each of your input files if you don't concatenate them all together.
Ok. I ran the pipeline in Ubuntu, but the output cannot be found. I did not get an error message though.
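When a run finishes without an error but also without results, one thing worth ruling out is whether every tool that octoFLU.sh calls actually resolves inside the active environment. The sketch below is a generic PATH check, not part of octoFLU; `check_tools` is a hypothetical helper, and the tool names are the ones listed earlier in this thread.

```shell
# Print where each tool resolves on PATH, or flag it as missing.
# Returns non-zero if at least one tool was not found.
check_tools() {
    local missing=0 tool path
    for tool in "$@"; do
        if path="$(command -v "$tool")"; then
            printf '%-12s %s\n' "$tool" "$path"
        else
            printf '%-12s NOT FOUND\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Run after "conda activate ft". The FastTree binary may be named FastTree or fasttree,
# depending on how it was installed.
check_tools blastn makeblastdb smof mafft FastTree || echo "at least one tool is missing"
```

The printed paths can be compared against the ones written into octoFLU.sh.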
Can you post the exact command you used?
octoFLU-master is on my Desktop. I edited the 7th comment in octoFLU.sh to connect the paths.
I saved it as the same file in the octoFLU-master folder on my Desktop. Then I used conda activate ft, then cd octoFLU, and then bash octoFLU.sh sample_data/query_sample.fasta. It ran, but the results aren't in my octoFLU-master folder. We had this issue when we ran the script in Docker.
The output folder is called query_sample.fasta_Final_Output, but it is empty.
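To see whether another copy of the output folder exists somewhere else, a search by name can help. This is a generic sketch built on find, assuming only the _Final_Output suffix reported above; `list_outputs` is a hypothetical helper. It lists each matching folder with its file count, so an empty folder is easy to tell apart from one that holds results.

```shell
# List every folder whose name ends in _Final_Output under a starting directory,
# along with the number of files it contains.
list_outputs() {
    local start_dir="${1:-.}"
    find "$start_dir" -type d -name '*_Final_Output' 2>/dev/null |
    while IFS= read -r dir; do
        count="$(find "$dir" -type f | wc -l | tr -d ' ')"
        printf '%s\tfiles: %s\n' "$dir" "$count"
    done
}

# Search from the current directory; pass "$HOME" or a /mnt/c/... path to search more widely.
list_outputs .
```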
Can you paste the output it spat out on the console as it ran?
Thank you so much for all your help! I met with the other author of the code and we were able to locate the file and run my data set this morning. Seriously without your help I would have definitely lost my marbles.
Hurrayyy. I was getting a bit worried. If you found my answer helpful, consider accepting it so that others will recognize the question has been addressed appropriately without reading through this quite long thread.
Additionally, can you point out where the data ended up in relation to your working directory so that others that run into the same issue will have an idea as to how it can be resolved?