Question: restricting number of hits for blastn execution format
4.3 years ago, vigneshprbh37 (India) wrote:

I am running standalone BLAST to align Mus musculus genes.

I use this command line:

./blastn -query /home/Downloads/esembl/EDL36652.1d.fasta -db /home/Desktop/ncbi-blast-2.2.30+/output2/mart -outfmt 7 -out /home/Desktop/ncbi-blast-2.2.30+/outformat/result

What additional options do I need to supply to restrict the number of hits to 10?

Does anyone have suggestions?

Edit:

OK, so I tried this command line:

./blastn -query /home/Downloads/esembl/EDL36652.1d.fasta -db /home/Desktop/ncbi-blast-2.2.30+/output2/mart -outfmt 7 -max_target_seqs 1 -out /home/Desktop/ncbi-blast-2.2.30+/outformat/result

But I still get 31 hits, and I can't set the value below 1.

blast blastn outfmt • 1.6k views
modified 4.3 years ago by edrezen • written 4.3 years ago by vigneshprbh37

Why not consult the manual?

Reply written 4.3 years ago by 5heikki

Since you are using -outfmt 7, "-max_target_seqs" is the correct option to restrict the number of hits you want to see in the output. Now, is it possible that the multiple "hits" you are seeing are HSPs belonging to a single subject sequence? Can you post a few lines from your BLAST output?

Reply written 4.3 years ago by Siva

BLAST output for the first gene out of 4, run as concatenated files:

# BLASTN 2.2.30+
# Query: gi|74140765|gb|CH466520.2| Mus musculus 232000009837964 genomic scaffold, whole genome shotgun sequence
# Database: /home/CCMB/Desktop/ncbi-blast-2.2.30+/output2/mart
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 34 hits found
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    2256    0    0    76572479    76574734    4671    2416    0.0     4167
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    843    0    0    76565546    76566388    5875    5033    0.0     1557
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    550    0    0    76578394    76578943    1792    1243    0.0     1016
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    263    0    0    76576963    76577225    2210    1948    2e-132      486
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    258    0    0    76541600    76541857    8535    8278    1e-129      477
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    255    0    0    76560486    76560740    6403    6149    5e-128      472
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    99.58    240    1    0    76547987    76548226    7491    7252    5e-118      438
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    99.57    230    0    1    76543803    76544032    8281    8053    6e-112      418
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    99.56    225    1    0    76586814    76587038    741    517    1e-109      411
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    221    0    0    76546177    76546397    7812    7592    4e-109      409
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    212    0    0    76570290    76570501    4881    4670    4e-104      392
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    210    0    0    76575876    76576085    2418    2209    5e-103      388
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    207    0    0    76550481    76550687    7003    6797    2e-101      383
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    205    0    0    76552217    76552421    6800    6596    3e-100      379
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    191    0    0    76560850    76561040    6150    5960    2e-92      353
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    175    0    0    76591234    76591408    284    110    1e-83      324
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    161    0    0    76577639    76577799    1951    1791    8e-76      298
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    98.21    168    2    1    76566984    76567150    5043    4876    4e-74      292
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    149    0    0    76582353    76582501    1192    1044    4e-69      276
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    141    0    0    76548927    76549067    7253    7113    1e-64      261
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    137    0    0    76583694    76583830    961    825    2e-62      254
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    129    0    0    76544858    76544986    8043    7915    5e-58      239
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    127    0    0    76588605    76588731    520    394    7e-57      235
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    115    0    0    76549388    76549502    7116    7002    3e-50      213
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    112    0    0    76592828    76592939    112    1    1e-48      207
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    99.13    115    1    0    76589553    76589667    395    281    1e-48      207
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    99.14    116    0    1    76547665    76547780    7595    7481    1e-48      207
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    105    0    0    76545521    76545625    7917    7813    1e-44      195
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    102    0    0    76556230    76556331    6597    6496    5e-43      189
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    97    0    0    76559253    76559349    6497    6401    3e-40      180
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    91    0    0    76583503    76583593    1046    956    7e-37      169
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    87    0    0    76564390    76564476    5962    5876    1e-34      161
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    83    0    0    76584400    76584482    824    742    2e-32      154
gi|74140765|gb|CH466520.2|    ENSMUSG00000040225|ENSMUST00000028016|proline-rich    100.00    56    0    0    76581415    76581470    1245    1190    2e-17      104
# BLASTN 2.2.30+
# Query: gb|CH466534.1|:20597795-20598187 Mus musculus 232000009844919 genomic scaffold, whole genome shotgun sequence
# Database: /home/CCMB/Desktop/ncbi-blast-2.2.30+/output2/mart
# 0 hits found

Reply written 4.3 years ago by vigneshprbh37
Yes, these are HSPs from one subject sequence (the subject ID in the second column is the same on every row). I checked the query and subject IDs. You are using a huge query sequence (88,119,379 bp) and searching against a database that contains one huge subject sequence (69,831 bp). Since BLAST is a local alignment tool, it is going to find several HSPs between these huge query and subject sequences. I don't know what your goal here is. If you really want only one HSP per subject, set the option "-max_hsps" to 1.

Reply written 4.3 years ago by Siva
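
One way to check whether the rows belong to a single subject, as the reply above suggests, is to count HSP rows and distinct subject IDs per query. The following is a minimal sketch, assuming tab-delimited -outfmt 6 or 7 output; the function name count_hsps_and_subjects is made up for illustration and is not part of BLAST.

```shell
# Summarise BLAST tabular output (-outfmt 6 or 7): for each query, report
# the number of HSP rows and the number of distinct subject IDs.
# count_hsps_and_subjects is a hypothetical helper name.
count_hsps_and_subjects() {
  awk -F '\t' '
    /^#/ || NF == 0 { next }                 # skip comment lines and blanks
    {
      if (!($1 in rows)) order[++n] = $1     # remember first-seen query order
      rows[$1]++
      if (!(($1, $2) in seen)) {             # first time this query/subject pair appears
        seen[$1, $2] = 1
        subjects[$1]++
      }
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d HSP rows\t%d distinct subjects\n", order[i], rows[order[i]], subjects[order[i]]
    }
  ' "$@"
}
```

Called as `count_hsps_and_subjects result`, a query that shows many rows but only 1 distinct subject is reporting several HSPs of the same hit.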

Well, when we performed the analysis we got over 3000 hits, and we wanted to select the best hits: low E-value, high bit score, maximum alignment length and so on. So restricting the number of displayed hits to 10 would have been more convenient.

We were using -outfmt 7 because it suited our needs.

I tried your suggestion, but even with -max_hsps 1 I get around 239 hits, while -max_target_seqs 1 gave around 31 hits or so.

Reply written 4.3 years ago by vigneshprbh37
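
If the aim is simply to see no more than 10 rows per query, the -outfmt 7 file can also be trimmed after the search. The sketch below is one possible approach, not a BLAST feature: it assumes the rows are already ordered best-first within each query (as they are in the output posted above), and the helper name top_rows_per_query is hypothetical. Note that the "# N hits found" comment lines are passed through unchanged, so they will no longer match the number of rows kept.

```shell
# Keep at most N data rows per query from -outfmt 7 output, passing every
# comment line through so queries with "# 0 hits found" stay visible.
# top_rows_per_query is a hypothetical helper; N defaults to 10.
top_rows_per_query() {
  n="${1:-10}"
  if [ "$#" -gt 0 ]; then shift; fi   # remaining arguments are input files
  awk -F '\t' -v n="$n" '
    /^#/    { print; next }           # comment lines are kept verbatim
    NF == 0 { print; next }           # blank lines are kept and not counted
    $1 != q { q = $1; kept = 0 }      # a new query ID resets the counter
    kept < n { print; kept++ }        # print only the first n rows per query
  ' "$@"
}
```

For example, `top_rows_per_query 10 result > result.top10` would write a trimmed copy, where result is the file written by -out.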

Did you use both -max_target_seqs and -max_hsps and set them to 1?

Setting -max_target_seqs to 1 will give only 1 subject (hit), but several HSPs for it if they are present.

Setting -max_hsps to 1 will give only 1 HSP per subject, but for every subject (hit) in the database.

Use them together to get only 1 HSP from 1 hit.

Reply written 4.3 years ago by Siva
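
The combination described above can be written as a small wrapper. This is a sketch only: the function name blast_best_hsp and the BLASTN variable are assumptions made for illustration, while the blastn options themselves are the ones already used in this thread.

```shell
# Run blastn keeping one subject per query and one HSP per subject.
# BLASTN may point at a specific binary (e.g. ./blastn); it defaults to
# whatever "blastn" is found on PATH. blast_best_hsp is a hypothetical name.
blast_best_hsp() {
  query="$1"   # query FASTA file
  db="$2"      # BLAST database name
  out="$3"     # output file
  "${BLASTN:-blastn}" -query "$query" -db "$db" -outfmt 7 \
    -max_target_seqs 1 -max_hsps 1 -out "$out"
}
```

A call would take the same query, database and output paths as the original command, for example `BLASTN=./blastn blast_best_hsp query.fasta mart result`, where the three arguments are placeholders.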
4.3 years ago, edrezen (France) wrote:

Hello,

It is possible that -max_target_seqs works only with some output formats. Could you try with -outfmt 6?

written 4.3 years ago by edrezen

No can do. For my purposes, -outfmt 6, if I recall correctly, doesn't display genes with no hits when I use multiple files.

-outfmt 7 is more suitable.

Reply written 4.3 years ago by vigneshprbh37
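
Because -outfmt 7 writes a "# Query:" comment for every query followed by a "# N hits found" line, the queries with no hits can be listed straight from the output file. The sketch below assumes the comment layout shown in the output posted earlier in this thread; queries_without_hits is a hypothetical helper name.

```shell
# List the queries that produced no hits in -outfmt 7 output.
# Relies on the "# Query: ..." and "# 0 hits found" comment lines.
# queries_without_hits is a hypothetical helper name.
queries_without_hits() {
  awk '
    /^# Query: /      { sub(/^# Query: /, ""); query = $0; next }  # remember current query
    /^# 0 hits found/ { print query }                              # report it if it had no hits
  ' "$@"
}
```

Running `queries_without_hits result` on the output above would be expected to print the description line of the second query, the one reported with 0 hits.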