Define sequences length when doing blastx
6.5 years ago
horsedog

Hi, I'd like to run blastx with my nucleotide sequences as queries to find matching protein sequences. Here is my command:

blastx -db refseq_protein -query outputA.fasta -out blastx.fasta -evalue 0.001

I got more than 40,000 results, which is too many, so I'd like to apply a length cut-off. For example, I only want protein sequences from 300 bp to 500 bp. Is that possible, and how should I modify this command?

Many thanks

6.5 years ago
Michael

First, you should set a more stringent e-value cutoff: values from 1e-6 to 1e-10 are commonly used, whereas 1e-3 admits very poor alignments. This alone will most likely solve your problem.
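A minimal sketch of what that could look like, assuming BLAST+ `blastx` and a local `refseq_protein` database as in the original command. Switching to tab-separated output (`-outfmt 6` with an explicit column list) makes the hits easy to post-filter. The column list, the 1e-6 threshold and the function names `run_blastx` and `filter_evalue` are illustrative assumptions, not something from the thread.

```shell
# Sketch only: columns, thresholds and function names are illustrative.

# Re-run blastx with a stricter e-value and tab-separated output (-outfmt 6),
# adding subject length (slen) and query coverage per subject (qcovs).
# Usage: run_blastx query.fasta hits.tsv
run_blastx() {
  blastx -db refseq_protein -query "$1" -evalue 1e-6 \
    -outfmt "6 qseqid sseqid pident length evalue bitscore slen qcovs" \
    -out "$2"
}

# Tighten the cutoff on an existing tabular result without re-running BLAST.
# Column 5 is the e-value in the -outfmt column list above.
# Usage: filter_evalue 1e-10 < hits.tsv
filter_evalue() {
  awk -F '\t' -v max="$1" '($5 + 0) <= (max + 0)'
}
```

For example, `run_blastx outputA.fasta blastx_hits.tsv` followed by `filter_evalue 1e-10 < blastx_hits.tsv` lets you compare stricter thresholds from a single BLAST run; post-filtering can only tighten the cutoff used in the search, never loosen it.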

I only want those protein sequences from 300bp to 500bp, is it possible to do that?

Do you mean in the database or in the query? You would need a very good reason to set a hard length cut-off like that, and I doubt there is a valid one here, given insertions and deletions, fusion proteins and the like. We need a better description of your task to help you further.

Thanks for the reply, Michael. Yes, you're right, I had thought about that. But since I'd like to find as many species as possible, I want to start with a lenient value, see how it goes, and then search more stringently. I chose the 300 to 500 cut-off because the literature reports this protein as being around 400 long. Of course I could narrow it to something like 350 to 450, but I'm worried I would filter out interesting hits. Here, 300 to 500 refers to the "Query cover" in BLAST; I don't know if that makes it clearer.
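On the length question, a sketch assuming tabular output produced with `-outfmt "6 qseqid sseqid pident length evalue bitscore slen qcovs"`. In that format `qcovs` is the query coverage per subject expressed as a percentage, so a 300 to 500 window fits a length column instead: `slen` is the subject protein length in residues. The function name `filter_hits` and the example thresholds are hypothetical; adjust the column numbers if you request different fields.

```shell
# Keep hits whose subject length (slen, column 7) lies in [min, max] residues
# and whose query coverage per subject (qcovs, column 8, a percentage) is >= mincov.
# Column numbers assume:
#   -outfmt "6 qseqid sseqid pident length evalue bitscore slen qcovs"
# Usage: filter_hits 300 500 50 < hits.tsv
filter_hits() {
  awk -F '\t' -v min="$1" -v max="$2" -v mincov="$3" \
    '($7 + 0) >= (min + 0) && ($7 + 0) <= (max + 0) && ($8 + 0) >= (mincov + 0)'
}
```

A wide window such as `filter_hits 300 500 0` keeps everything of plausible length, and you can then raise the coverage threshold to see how many hits survive, which addresses the worry about losing interesting hits with a narrow window.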

It is still not clear to me what you are trying to accomplish here. Is this metagenomics?

Not yet. I'm currently studying the evolution of a new gene in bacteria and trying to work out its distribution across bacterial species. The first step was to get the RefSeq nucleotide sequences of this gene, which I've done, but only 58 genomes were found. So now I'm running BLAST against a protein database to see whether I can find this gene in other species.
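For surveying the distribution across species, one possible sketch is to count the distinct taxonomy IDs among the filtered hits. It assumes the `staxids` column was requested, e.g. `-outfmt "6 qseqid sseqid evalue bitscore slen qcovs staxids"` (making it column 7), and that the database carries taxonomy IDs. `staxids` can list several IDs separated by `;`, and each one is counted. The function name `count_taxa` is hypothetical, and taxids may be at strain rather than species level, so treat the count as approximate.

```shell
# Count distinct taxonomy IDs in a tabular BLAST result.
# $1 is the column holding staxids (7 for the -outfmt list in the lead-in).
# A single hit may list several taxids separated by ';'; each is counted once.
# Usage: count_taxa 7 < hits.tsv
count_taxa() {
  awk -F '\t' -v col="$1" '
    {
      n = split($col, ids, ";")
      for (i = 1; i <= n; i++) if (ids[i] != "") seen[ids[i]] = 1
    }
    END {
      total = 0
      for (t in seen) total++
      print total
    }'
}
```

Running this on results filtered at different e-value or length thresholds gives a quick sense of how the apparent distribution changes as the search becomes more stringent.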
