Local BLAST using command prompt Windows 10
4.1 years ago
viktormpg • 0

Hi everyone

I am new to this topic. I am trying to run a local BLAST analysis from the Windows 10 command prompt (cmd). This may not be the best option, but I am not familiar with Linux, and learning to use it from my Windows PC could take me a long time, so doing it in Windows would save me time.

  • I downloaded the BLAST+ programs from NCBI and created the environment variables.
  • I downloaded my protein queries in FASTA format (I have 50 of them).
  • I downloaded 15 transcriptomes from another source (because they are not available at NCBI). They are in FASTA format and each one is 5 GB.
  • I then created a database from each transcriptome FASTA file using makeblastdb:

    makeblastdb -in sp1.fasta -dbtype nucl -title espone

I then ran a blastn analysis with one of my protein queries (FASTA format) against one of my databases (named spone):

    blastn.exe -query protein1.fasta -outfmt 6 -db spone -out test.txt

The result was:

 FASTA-Reader: Ignoring invalid residues at position(s): On line 2: 7, 9, 15, 21, 25-26, 30-33, 36-38, 43, 50, 52-53, 60-61, 64-68
FASTA-Reader: Ignoring invalid residues at position(s): On line 3: 9, 14-15, 19, 22, 25, 33-34, 40, 47, 53-55, 60, 62-63, 65-66, 69
FASTA-Reader: Ignoring invalid residues at position(s): On line 4: 4, 6, 12-15, 26, 31-32, 36, 39-40, 43, 50-51, 56, 65, 67

This is where my problems begin:

  • The output file (test.txt) is empty (0 KB). Why is this happening?
  • I have 50 protein queries and I need to run all of them against the transcriptomes of the 15 species, so I would like to know if there is some method to run my 50 queries against my 15 transcriptome databases at the same time.
  • I need the results to show the taxonomic name of each species (15) and the name of each protein (50). I downloaded taxdb for that, but I am not sure how it works.
  • If possible, I would like to know the best way to visualise the data (for example, a bridge between the cmd results and R).

By the way, my laptop has 16 GB of RAM, a 1 TB SSD just for programs, and a 1 TB HDD. (I hope my laptop's capacity is not a problem.)

I know that is a lot of questions, but I looked for tutorials and could not find anything precise. Thanks in advance for your help.

4.1 years ago
GenoMax 141k

You can't use a protein query with blastn. You will need to use blastp (protein database) or tblastn (nucleotide database). See more information here.
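
As a rough guide, the program follows from the query type and the database type. The helper below is only a sketch: `pick_blast_program` is a made-up name, not part of BLAST+, and it covers just the four basic combinations (tblastx is left out).

```shell
# Hypothetical helper (not part of BLAST+): choose the BLAST+ program
# from the query type and the database type ("prot" or "nucl").
pick_blast_program() {
    query_type="$1"
    db_type="$2"
    case "${query_type}:${db_type}" in
        nucl:nucl) echo "blastn" ;;   # nucleotide query, nucleotide database
        prot:prot) echo "blastp" ;;   # protein query, protein database
        nucl:prot) echo "blastx" ;;   # translated nucleotide query, protein database
        prot:nucl) echo "tblastn" ;;  # protein query, translated nucleotide database
        *) echo "unknown combination: ${query_type} vs ${db_type}" >&2
           return 1 ;;
    esac
}
```

For the case in the question (protein query, database built with -dbtype nucl), `pick_blast_program prot nucl` prints tblastn, so the search would look something like the line below. This assumes the database really is called spone: makeblastdb names the database after the -in file unless -out is given, and -title only sets a descriptive title.

    tblastn -query protein1.fasta -db spone -outfmt 6 -out test.txt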

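You will need to use -outfmt 6 with extra format specifiers (for example staxids and sscinames) if you want to include information about taxonomy; the default -outfmt 6 columns do not contain any.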
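
A sketch of what a taxonomy-aware tabular output could look like, assuming a bash-like shell (on Windows, for example Git Bash or WSL). staxids and sscinames are standard BLAST+ output specifiers, but they only return real values when taxids were assigned to the sequences while building the database (for instance with the -taxid option of makeblastdb) and BLAST can find the taxdb files, for example in the working directory; otherwise they come back as N/A. `TAX_OUTFMT` and `tblastn_with_tax` are names made up for this example.

```shell
# The 12 standard tabular columns ("std") plus subject taxonomy ID and
# scientific name. The string is quoted so it reaches BLAST as one argument.
TAX_OUTFMT="6 std staxids sscinames"

# Hypothetical wrapper: protein query ($1) against a nucleotide database ($2),
# writing tabular output with the taxonomy columns to the file named in $3.
tblastn_with_tax() {
    tblastn -query "$1" -db "$2" -outfmt "$TAX_OUTFMT" -out "$3"
}
```

It would be called as `tblastn_with_tax protein1.fasta spone test.txt`. Since each of your databases holds a single species, naming the output file after the database is a simpler way to keep track of the species if the taxonomy columns turn out empty.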

I would like to know if there is some method to run my 50 queries against my 15 transcriptome databases at the same time.

There are ways to do this with GNU parallel etc., but since you are doing this on a laptop you may be best off just running the queries in a loop, one after the other.


Could you explain or give me an example of how to write such a loop? I know how to write loops in R, but the command prompt is new to me. Thank you.


If your query files all end in .fasta, you could do something like this on Unix. Since you are on Windows, you will need to improvise using the info here.

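    for i in *.fasta
    do
        # tblastn because the queries are protein and the database is nucleotide;
        # add any other options you need to the command below
        tblastn -query "${i}" -outfmt 6 -db spone -out "${i}.out"
    done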
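
To cover all 50 queries and all 15 databases, one option is to nest two loops. This is only a sketch under some assumptions: a bash-like shell (on Windows, for example Git Bash or WSL), the protein queries sitting in their own folder, and the database names passed in by hand. `run_all` and the folder and database names in the usage line are placeholders, not something taken from your setup.

```shell
# Hypothetical driver: run every protein query against every database.
# Usage: run_all QUERY_DIR OUT_DIR DB1 [DB2 ...]
run_all() {
    query_dir="$1"
    out_dir="$2"
    shift 2
    mkdir -p "$out_dir"
    for db in "$@"; do
        for query in "$query_dir"/*.fasta; do
            # Skip the unexpanded pattern when the folder has no .fasta files
            [ -e "$query" ] || continue
            name="$(basename "$query" .fasta)"
            # tblastn: protein query against a nucleotide database.
            # One output file per query/database pair, named after both.
            tblastn -query "$query" -db "$db" -outfmt 6 \
                -out "$out_dir/${name}_vs_$(basename "$db").txt"
        done
    done
}
```

A call such as `run_all queries results sp1 sp2 sp3` would then write files like results/protein1_vs_sp1.txt. BLAST also accepts a query file containing several FASTA records, so concatenating the 50 proteins into one file and looping only over the databases would work too.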