Question: Converting runpsipred Script to Work on Windows OS
4 months ago by
Bara'a220
Amman - Jordan
Bara'a220 wrote:

Hi all...

Following the psipred installation instructions, I installed the blast+ executables on the G:\ drive, along with the impala utilities from legacy blast (the makemat and copymat binaries) in the bin folder, and configured the environment variable to point to the path they were installed in. I then downloaded and untarred psipred on the same drive.

When I tried to convert the runpsipred script to Python so that it runs on a Windows 7 machine, I got stuck trying to understand what each Linux command in the script does ... I spent a lot of time trying to grasp certain commands but had no luck.

I wrote the following code. It is not complete yet, as I'm getting errors, but my main problem is that my code does not see the psipred executables, although they are there along with the required database:

import os
import sys
print(os.getcwd())
os.chdir('G:\\psipred\\')
print(os.getcwd())
os.system('set dbname=`uniref90.fasta`')
os.system('set ncbidir=`G:\blast-2.7.1+\bin`')
os.system('set execdir=`G:\psipred\bin`')
os.system('set datadir=`G:\psipred\data`')
os.system('set basename=`test_seq`')
os.system('set rootname=`test_seq`')
os.system('set hostid=`hostid`')
print(os.system('set hostid =`hostid`'))
print(os.system('set tmproot=psitmp$$$hostid'))
os.system('copy -f test_seq.fasta $tmproot.fasta')
os.system('ncbidir/psiblast -b 0 -j 3 -h 0.001 -v 5000 -d dbname -i tmproot.fasta -C tmproot.chk  tmproot.blast')

This gives the following errors:

C:\Users\Al-Hammad\Desktop\SQP-IRS
G:\psipred
 0
 0
The system cannot find the file specified.
'ncbidir' is not recognized as an internal or external command,
operable program or batch file.

but when I do this:

import os
import sys
os.system('blastdbcmd -db uniref90 -entry nm_000122 -outfmt "%f" -out test_query.txt')
os.system('blastn -query test_query.txt -db uniref90 -out output.txt')
print ("Done !!")

things work perfectly, which means the blast+ executables are there and working!

Can you please give me hints on how to convert these Linux commands into cmd commands for Windows? I'm not familiar with Linux at all and really need to get this working on my machine. Also, how can I make my Python script see the psipred bin and data folders globally without having to modify the installation environment variables?

I would be so grateful for any help.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Bara'a220

I'm no bioinformatics-on-windows pro, but basically your script makes a bunch of environment variables (which it then appears not to use), and makes some system calls, but the variable expansions are almost certainly wrong.

For example,

os.system('set ncbidir=`G:\blast-2.7.1+\bin`')

uses DOS backslashes for the file path, but the later system call made via os uses a forward slash:

os.system('ncbidir/psiblast -b 0 -j 3 -h 0.001 -v 5000 -d dbname -i tmproot.fasta -C tmproot.chk  tmproot.blast')
                  ^

I think there are a number of errors here, but that's one of the major ones.

If blastn is available on the commandline without needing to specify a file path, as this os.system('blastn -query test_query.txt -db uniref90 -out output.txt') suggests, then you don't need the ncbidir variable at all AFAICT.
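
Two details in the snippets above can be sketched in Python. Each os.system() call starts its own shell, so a variable created with `set` in one call is gone by the next one, and in a normal (non-raw) string literal `\b` is read as a backspace character rather than a backslash followed by "b". The following is only a sketch under assumptions: the install path is copied from the question, the query, database and output names are placeholders, `psiblast_command` is a hypothetical helper, and the iteration count simply mirrors the `-j 3` in the question.

```python
from pathlib import PureWindowsPath

# In a normal string literal "\b" is a backspace, so this path is silently corrupted.
broken = 'G:\blast-2.7.1+\bin'
# A raw string keeps the backslashes exactly as typed.
fixed = r'G:\blast-2.7.1+\bin'

# Keep settings in Python variables instead of `set ...` inside os.system():
# each os.system() call runs in its own shell, so nothing set there survives.
NCBI_DIR = PureWindowsPath(fixed)  # path taken from the question


def psiblast_command(query, db, rootname, ncbi_dir=NCBI_DIR):
    """Build the argument list for one psiblast run (file names are placeholders)."""
    return [
        str(ncbi_dir / "psiblast"),
        "-query", query,
        "-db", db,
        "-num_iterations", "3",
        "-out", rootname + ".blast",
        "-out_pssm", rootname + ".chk",
    ]


if __name__ == "__main__":
    print(repr(broken))  # shows \x08 where the backslashes were expected
    print(psiblast_command("test_seq.fasta", "uniref90", "test_seq"))
```

On a machine where BLAST+ really is installed at that location, the returned list could be handed to `subprocess.run(cmd, check=True)`. Passing a list avoids a second round of quoting by cmd.exe.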

ADD REPLYlink modified 4 months ago • written 4 months ago by jrj.healey9.2k

Ok... I've tried performing psiblast on a relatively small database just to check if it's working as expected:

import os
import sys
os.system('blastdbcmd -db uniref90 -entry nm_000122 -outfmt "%f" -out test_query.txt')
os.system('blastn -query test_query.txt -db uniref90 -out output.txt')
os.system('makeblastdb -in refseq_rna.00 -dbtype prot -out refseq_rna.00')
os.system('psiblast -query test_seq.fasta -db refseq_rna.00 -num_iterations=6 -evalue=0.005 -out test_result -out_pssm=PSSMtest_results')
print('Done !!')

but it gives the following error:

Building a new DB, current time: 07/27/2018 19:43:16
New DB name:   C:\Users\Al-Hammad\Desktop\SQP-IRS\refseq_rna.00
New DB title:  refseq_rna.00
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File refseq_rna.00 does not exist
BLAST Database error: No alias or index file found for protein database [refseq_rna.00] in search path [C:\Users\Al-Hammad\Desktop\SQP-IRS;G:\blast-2.7.1+\db;]
 Done !!

Now it's NOT recognizing my files. What am I doing wrong here?!

ADD REPLYlink modified 4 months ago • written 4 months ago by Bara'a220

Does the file exist in the file system? Maybe the previous command generated an output with a different name/errored out?
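
One way to answer this from inside the script is to list what is actually on disk before calling `makeblastdb`. The sketch below assumes nothing about BLAST itself; `describe_input` is a hypothetical helper that reports whether a file with exactly the given name exists and which files merely start with that name followed by a dot (for example `refseq_rna.00.nhr`, where `.nhr`, `.nin` and `.nsq` are extensions of an already formatted nucleotide database).

```python
from pathlib import Path


def describe_input(name, directory="."):
    """Return (exact_file_exists, sorted names of files called name + '.<something>')."""
    directory = Path(directory)
    exact = (directory / name).is_file()
    related = sorted(p.name for p in directory.glob(name + ".*"))
    return exact, related


if __name__ == "__main__":
    exact, related = describe_input("refseq_rna.00")
    print("exact file present:", exact)
    print("files sharing the prefix:", related)
```

If the exact file is missing but files with database extensions are listed, the name probably refers to a formatted database rather than a FASTA file, which would be consistent with the "File refseq_rna.00 does not exist" message. It may also be worth checking that `-dbtype prot` matches the data, since refseq_rna is an RNA collection.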

ADD REPLYlink written 4 months ago by RamRS19k

Yes, the file is there in the path specified by the error message, but the command does not see it for some reason! I've searched for the output that the command is supposed to generate but found nothing ... why on earth is psipred not available for Windows? That's really disappointing.

ADD REPLYlink written 4 months ago by Bara'a220

Maybe the file was not ready when the command executed? Or maybe the error message is wonky and the problem is that you need to index the file before you can use it?

ADD REPLYlink written 4 months ago by RamRS19k

I tried to provide the full path for my executables and database but still not working !!

ADD REPLYlink written 4 months ago by Bara'a220

I mean, step-2 creates files and step-3 uses those files. Are you sure step-2 is done processing and the output files are ready before step-3 is executed?
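
For what it is worth, `os.system()` does not return until the command has finished, so step-3 cannot start while step-2 is still running; the more likely failure mode is that step-2 exited with an error and the script carried on regardless. A minimal sketch of stopping at the first failed step, using a hypothetical `run_step` helper and the Python interpreter itself as a stand-in for the BLAST programs:

```python
import subprocess
import sys


def run_step(args):
    """Run one command, wait for it to finish, and raise if it exits non-zero."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in commands; replace with the makeblastdb and psiblast argument lists.
    print(run_step([sys.executable, "-c", "print('step 2 done')"]))
    try:
        run_step([sys.executable, "-c", "import sys; sys.exit(1)"])
    except RuntimeError as err:
        print("stopped:", err)
```

With this pattern, a failing `makeblastdb` would halt the script before `psiblast` is attempted, and its error text would be shown next to the exit code.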

ADD REPLYlink written 4 months ago by RamRS19k

Are you running the commands whilst being in the same directory as them?
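
The `makeblastdb` output earlier in the thread reports the new database under `C:\Users\Al-Hammad\Desktop\SQP-IRS`, so that script was working in the Desktop folder rather than in `G:\psipred`. A sketch of pinning the working directory per command with the `cwd` argument of `subprocess.run`, again using the Python interpreter as a stand-in command and a hypothetical helper name `run_in`:

```python
import subprocess
import sys
import tempfile


def run_in(directory, args):
    """Run args with `directory` as the child's working directory; return its stdout."""
    done = subprocess.run(
        args, cwd=directory, capture_output=True, text=True, check=True
    )
    return done.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        # The child prints its own working directory, which is work_dir.
        print(run_in(work_dir, [sys.executable, "-c", "import os; print(os.getcwd())"]))
```

Relative file names such as `test_seq.fasta` are then resolved against that directory, regardless of where the Python script itself was started.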

ADD REPLYlink written 4 months ago by jrj.healey9.2k

Yes, all my files are in the same directory!

ADD REPLYlink written 4 months ago by Bara'a220

You could declare the variable in python and then build the command as a string in python and use just one os.system() call, no? Or create a config file where you read the variable values from, so you don't need to change the script each time to execute it.
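
A minimal sketch of the config-file idea, using the standard-library `configparser`. The section name `[paths]`, the helper names `load_paths` and `build_command`, and the file name mentioned in the comment are all hypothetical; the directory values are copied from the question.

```python
import configparser
from pathlib import PureWindowsPath

# Example settings; in practice this text would live in a file such as psipred.ini
# (a hypothetical name) and be loaded with ConfigParser.read().
EXAMPLE_CONFIG = r"""
[paths]
ncbidir = G:\blast-2.7.1+\bin
execdir = G:\psipred\bin
datadir = G:\psipred\data
dbname = uniref90
"""


def load_paths(text):
    """Parse the [paths] section into a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["paths"])


def build_command(paths, query):
    """Assemble one psiblast call from the configured values."""
    exe = PureWindowsPath(paths["ncbidir"]) / "psiblast"
    return [str(exe), "-query", query, "-db", paths["dbname"]]


if __name__ == "__main__":
    paths = load_paths(EXAMPLE_CONFIG)
    print(build_command(paths, "test_seq.fasta"))
```

Moving to another machine then means editing the config file only, and the script never needs to touch the system-wide environment variables.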

In its current form, this script abstracts nothing and only adds another language (python) to the mix without leveraging any of what python has to offer.

ADD REPLYlink written 4 months ago by RamRS19k

Yes you're right, it's not making use of any python facilities yet ... I'm working on obtaining psipred results in order to be further plotted for specific purposes using some python modules. Sorry if that causes any inconvenience.

ADD REPLYlink written 4 months ago by Bara'a220

Why would it cause me any inconvenience? It's making your life difficult, that's all.

ADD REPLYlink written 4 months ago by RamRS19k

Can't you install the Windows Subsystem for Linux? Then you could follow the regular linux install instructions, without having to port the runpsipred script, and without having to leave windows.

ADD REPLYlink written 4 months ago by h.mon22k

That's a good solution, though I must write a script to run psipred from a Windows OS... why is it not supported, why?! 😭

ADD REPLYlink written 4 months ago by Bara'a220

must do a script

Why? If someone insists, find out why they insist on this, because running bioinformatics on vanilla Windows is not something anyone who knows bioinformatics would insist on.

ADD REPLYlink written 4 months ago by RamRS19k

You are absolutely right regarding this ... I thought it would be an easy task to do so but apparently it seems impossible with the world heading towards open source OS. I guess I'm going to break the contract and surrender 👐

ADD REPLYlink written 4 months ago by Bara'a220

Simplistic explanation:

Because, historically, big servers ran on UNIX and its variants, which are POSIX-compliant. Thus, to a certain extent, it is easy to port software between them. Then Linux, which is also POSIX-compliant, came along and took over. Being free, it can be installed at no or very low cost on any computer, from laptops to servers to clusters. This, in turn, makes it easy to develop and test new software on a small computer and be quite sure it will run the same way on a big server. That leads to lots of people developing on Linux and MacOsX (as MacOsX is POSIX-compliant).

Windows never wanted any of this; it wanted to be easy and different from other platforms, so as to lock in users. This means that, if you want a Windows PsiPred, you will have to bite the bullet and port it yourself.

ADD REPLYlink written 4 months ago by h.mon22k

MacOsX

it is now macOS :-)

ADD REPLYlink written 4 months ago by RamRS19k

Thank you for the detailed explanation, that was really convincing ... I guess they succeeded in locking users like me in for nearly 10 years, but it's time to move to Linux and the open source OS community, as I LOVE Bioinformatics and am pursuing my PhD in this interesting field. So, if I have to bite the bullet, I should concentrate my efforts on conducting useful and meaningful research instead of reinventing the wheel. Goodbye windows 👋

ADD REPLYlink modified 4 months ago • written 4 months ago by Bara'a220

Welcome to the world of Linux. It is impressively powerful, but the cliche stands - with great power comes great responsibility.

ADD REPLYlink written 4 months ago by RamRS19k