Question: SIFT NCBI executables set up incorrectly?
Asked 4.4 years ago by arronslacey (United Kingdom):

 

EDIT: Solved. Whilst I had properly defined the path to the NCBI executables in

SIFT_for_submitting_fasta_seq.csh

I had not properly set the path in 

seqs_chosen_via_median_info.csh

 

It worked after I modified this second file. As I mention in the comments below, the test files running fine does not mean that SIFT has been configured properly. For the test proteins, the required database and protein alignment files are already provided within the SIFT directory, so an incorrect path or database setting does not stop a prediction from being produced. I find this a bit misleading, so it is something to keep in mind.
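
One way to catch this kind of mismatch is to list every line in the SIFT scripts that mentions the BLAST install directory. This is a minimal sketch, assuming the scripts live in `bin/` and BLAST is under `/usr/share/ncbi-blast+` as in this post. The function name `list_blast_paths` is made up for illustration and is not part of SIFT.

```sh
# List every line in the SIFT csh scripts that mentions the BLAST directory,
# so that a script with a stale or doubled path stands out.
# Assumptions: scripts are in ./bin, BLAST lives under /usr/share/ncbi-blast+.
list_blast_paths() {
    script_dir=$1
    pattern=$2
    # -F matches the pattern as a fixed string, -n prints line numbers;
    # /dev/null forces grep to print file names even for a single script.
    grep -n -F -- "$pattern" "$script_dir"/*.csh /dev/null
}

# Only run the example when a bin/ directory is present.
if [ -d bin ]; then
    list_blast_paths bin 'ncbi-blast+' || echo "no script mentions ncbi-blast+"
fi
```

Run from the SIFT top-level directory, this should show both `SIFT_for_submitting_fasta_seq.csh` and `seqs_chosen_via_median_info.csh` if each defines the path, which makes it easy to compare them side by side.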

 

**ORIGINAL QUESTION**

I suspect the NCBI executables are set up incorrectly, but I cannot determine what is causing it.

 

I am successfully using standalone SIFT. I can run SIFT using the test files provided:

    $ csh bin/SIFT_for_submitting_fasta_seq.csh test/lacI.fasta db/uniref.fa test/lacI.subst
    
    tail is lacI.fasta
    query is /home/arron/Phd/programs/sift5.2.1/tmp/lacI.fasta.query
    /usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
    exiting because stauts not equal to 0
    tell me i've entered
    info_on_seqs
    *** The following sequences have been removed because they  were found to be over 100% identical with your protein query: *** The following sequences have been removed because they  were found to be over 100% identical with your protein query: QUERY has 100 identity
    UniRef90_P03023Lacto has 100 identity
    UniRef90_A8AKB7Putat has 81 identity
    UniRef90_C1M7F8Lacre has 81 identity
    UniRef90_D2TK52Lacto has 78 identity
    UniRef90_E8XR69Trans has 59 identity
    UniRef90_D2C396Trans has 55 identity
    UniRef90_A9MQ83Putat has 76 identity
    UniRef90_C6DD30Trans has 53 identity
    UniRef90_E0SHJ3Trans has 56 identity
    UniRef90_D2ZG46Ribos has 54 identity
    UniRef90_C6CD23Trans has 51 identity
    UniRef90_A4W7D1Trans has 54 identity
    UniRef90_E8XW29Trans has 46 identity
    UniRef90_A1RAY2Trans has 38 identity
    UniRef90_D6DVN0Trans has 53 identity
    UniRef90_C4S4I8Lacto has 52 identity
    UniRef90_C9XUV2Lacto has 45 identity
    UniRef90_E3G6U0Trans has 54 identity
    UniRef90_C4UWB0Lacto has 52 identity
    UniRef90_C4UNI7Lacto has 48 identity
    UniRef90_D4GHV0LacIn has 50 identity
    UniRef90_C4TZY6Lacto has 50 identity
    UniRef90_D7CXK6Trans has 36 identity
    UniRef90_F0KTF2Lacre has 50 identity
    UniRef90_E1SIH9HTH-t has 50 identity
    UniRef90_D4E812LacIf has 44 identity
    UniRef90_D6YQA5HTH-t has 42 identity
    UniRef90_D5CE50Sugar has 43 identity
    UniRef90_C6CQF2Trans has 43 identity
    UniRef90_C4X4N1Trans has 43 identity
    UniRef90_A4TIF9Trans has 44 identity
    UniRef90_C4T764Lacto has 52 identity
    UniRef90_D1RRC7Trans has 41 identity
    UniRef90_E6WEY3Trans has 49 identity
    UniRef90_D8MUA6Lacto has 51 identity
    UniRef90_C9XWJ1Lacto has 45 identity
    UniRef90_D5CJ02Lacre has 54 identity
    UniRef90_D1RT73Trans has 43 identity
     UniRef90_P03023Lacto, UniRef90_P03023Lacto,.
    
    .<BR><BR>
    before seg fault?
    9
    10
    13
    14
    15
    16
    18
    19
    20
    21
    22
    23
    25
    30
    34
    45
    50
    53
    56
    65
    76
    127
    166
    179
    187
    188
    197
    201
    205
    218
    220
    241
    247
    249
    250
    252
    256
    272
    274
    284
    286
    288
    326
    356
    357
    358
    359
    360
    filename is /home/arron/Phd/programs/sift5.2.1/blimps/docs/default.diri
    about to make predictions
    not including UniRef90_C4T764Lacto with X at 1
    not including UniRef90_C4T764Lacto with X at 2
    not including UniRef90_C4T764Lacto with X at 14
    done checking all subst
    trying to free things here
    unalias: rm not found
    Output in /home/arron/Phd/programs/sift5.2.1/tmp/lacI.SIFTprediction

and produces a SIFT prediction file as expected.

However, when I try this with one of my own proteins of interest, the SIFT prediction file is not created.

    $ csh bin/SIFT_for_submitting_fasta_seq.csh test/NP_000162.2.fasta db/uniref.fa test/glra1.subst 
    tail is NP_000162.2.fasta
    query is /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.fasta.query
    /usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
    exiting because stauts not equal to 0
    tell me i've entered
    info_on_seqs
    cannot open file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta 
    Output in /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.SIFTprediction


The clue here is in:

    cannot open file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta

which suggests that the psiblast alignment could not be made. This file should have been produced, but I cannot find it anywhere.

How could this be?
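
The two symptoms in the logs above (`Command not found` for psiblast and `cannot open file` for the `.alignedfasta` file) can each be checked directly before running SIFT. The sketch below uses the paths quoted in this post. The helper names `check_exec` and `check_file` are hypothetical, and the psiblast location is an assumption based on the directory the poster says contains it.

```sh
# Report whether a program path is executable.
check_exec() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1"
        return 1
    fi
}

# Report whether an expected intermediate file exists and is non-empty.
check_file() {
    if [ -s "$1" ]; then
        echo "present: $1"
    else
        echo "absent or empty: $1"
        return 1
    fi
}

# Paths as quoted in this post; adjust to your own installation.
check_exec /usr/share/ncbi-blast+/bin/psiblast || true
check_file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta || true
```

If the first check fails, the alignment step cannot run, so the second file will never be created for a new protein.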


For reference I include my 

--1) test files (fasta and substitution file)

lacI.fasta

    >gi|2506562|sp|P03023|LACI_ECOLI   LACTOSE OPERON REPRESSOR
    MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQ
    SLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAAVHNLLAQRVS
    GLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLGVEHLVALGHQ
    QIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTMQMLNEGIVPT
    AMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPLTTIKQDFRLLGQTS
    VDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASPRALADSLMQLARQVSRLESGQ

lacI.subst

    K2S  
    P3M


--2) my protein files (fasta and substitution file)

NP_000162.2.fasta

    >gi|119372310|ref|NP_000162.2| glycine receptor subunit alpha-1 isoform 2 precursor [Homo sapiens]
    MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCN
    IFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEIT
    TDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGL
    TLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPAR
    VGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRR
    HHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISR
    IGFPMAFLIFNMFYWIIYKIVRREDVHNQ

glra1.subst

    P35R


Any advice would be greatly appreciated.

 

 

 

Tags: snp blast sift

It would seem that the PSI-BLAST step is what produces the .alignedfasta file. I think lacI.alignedfasta might already have existed for the test run, so the script did not complain. For your actual run, the program presumably could not find or create the .alignedfasta file, so it quit.

Comment written 4.4 years ago by RamRS

Yes, it must be using a pre-aligned file for the test, so psiblast does not need to run. My psiblast path may be incorrect for when it is called in the absence of a pre-aligned file. I will check.

Comment written 4.4 years ago by arronslacey

Hello arronslacey!

It appears that your post has been cross-posted to another site: http://stackoverflow.com/questions/26663738/fasta-file-not-comptible-with-sift

This is typically not recommended as it runs the risk of annoying people in both communities.

Comment written 4.4 years ago by Pierre Lindenbaum

thanks Pierre - duly noted.

Comment written 4.4 years ago by arronslacey
Answer written 4.4 years ago by smilefreak (New Zealand):

    /usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
    exiting because status not equal to 0

I think your problem may be that psiblast cannot be run, which then causes a cascade of failures in your csh script. I also note that you have /bin//bin in your path to psiblast. Do you actually mean it to be that, or is it meant to be just /bin? That is, change

    /usr/share/ncbi-blast+/bin//bin//psiblast

to

    /usr/share/ncbi-blast+/bin//psiblast
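
A doubled slash on its own is harmless, since the system treats `bin//psiblast` the same as `bin/psiblast`. The repeated `bin` directory is what breaks the path. As a rough heuristic, a path can be scanned for the same directory name appearing twice in a row. The function name `has_repeated_dir` is made up for this sketch, and some legitimate paths do repeat a directory name, so treat a hit as a hint rather than proof.

```sh
# Print the first directory name that appears twice in a row in a path,
# e.g. "bin" for .../bin//bin//psiblast. Prints nothing if there is none.
has_repeated_dir() {
    # Split on runs of one or more slashes so "//" counts as one separator.
    printf '%s\n' "$1" | awk -F'/+' '{
        for (i = 2; i <= NF; i++)
            if ($i != "" && $i == $(i - 1)) { print $i; exit }
    }'
}

# The two paths discussed in this answer.
for p in /usr/share/ncbi-blast+/bin//bin//psiblast \
         /usr/share/ncbi-blast+/bin//psiblast; do
    dup=$(has_repeated_dir "$p")
    if [ -n "$dup" ]; then
        echo "repeated '$dup' in $p"
    else
        echo "looks fine: $p"
    fi
done
```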

Yes, I think this is it, and as Ram pointed out, an aligned file for my test file might have already existed, hence it did not complain. Thank you!

Comment written 4.4 years ago by arronslacey

I can't seem to find out why the extra "bin" is being generated.

My .csh files and config_env.txt file explicitly define the path to the NCBI executables as

    /usr/share/ncbi-blast+/bin/

which is where psiblast is found. However, something is appending an extra "bin", and I just can't find the file responsible.

Comment written 4.4 years ago (modified 4.4 years ago) by arronslacey

Are you able to share a copy of your csh file?

Comment written 4.4 years ago by smilefreak

I have solved this now: there is another .csh file called "seqs_chosen_via_median_info.csh" where the path was incorrect. It works now, but a word of caution to anyone: when I re-did the analysis, it complained that the BLAST database wasn't formatted correctly (which I fixed immediately). Using the provided test files hides any database problems, just as it hides path problems, because the files needed to make a SIFT prediction for the test proteins are already provided. This is misleading and makes you think your SIFT setup is good to go when it might not be. I will add the answer at the top of my question. Thanks for your help.

Comment written 4.4 years ago by arronslacey
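
The database formatting problem mentioned above can also be checked up front. With BLAST+, `makeblastdb` writes index files such as `.phr`, `.pin` and `.psq` next to a protein FASTA file (large databases get numbered volumes like `.00.pin` plus a `.pal` alias file). The sketch below only tests for those files. The function name `db_is_formatted` is hypothetical, and the suggested `makeblastdb` command is an assumption about how the database would be formatted, since the post does not say which command was used.

```sh
# Heuristic check that a protein FASTA file has BLAST+ index files next to it.
db_is_formatted() {
    db=$1
    # Single-volume index, first numbered volume, or alias file.
    [ -e "$db.pin" ] || [ -e "$db.00.pin" ] || [ -e "$db.pal" ]
}

# db/uniref.fa is the database path used in the commands in this post.
if db_is_formatted db/uniref.fa; then
    echo "db/uniref.fa has BLAST index files"
else
    echo "db/uniref.fa looks unformatted; with BLAST+ this is usually fixed by:"
    echo "  makeblastdb -in db/uniref.fa -dbtype prot"
fi
```

Because the bundled test proteins do not exercise the database, running this check (or a real query on your own protein) is a better indication that the setup is complete.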