Question: local blastn against one subgroup of species
jf0 wrote, 2.6 years ago:

Hello, thanks to the great resource of this forum I was able to run blastn with a .fa file against a local BLAST database. The issue I am having is that the search runs against every available sequence in the database. Even with the option "-qcov_hsp_perc 0.95" I am getting good coverage, but species like bacteria and birds are still being searched against.

Going through the NCBI Genome site, I see that species are broken down into groups such as "Kingdom: Eukaryota; Subgroup: Fishes". With this in mind, I would like to limit my queries to a specific subgroup or subgroups rather than the entire database.

I did not see anything like this in the blastn options, yet I am sure there is a way to limit the groups a file is searched against.

Please let me know your ideas so I can give them a try.

blastn • 1.0k views
modified 2.5 years ago by Biostar ♦♦ 20 • written 2.6 years ago by jf0

Rebuild the database with only the subset you are interested in. If that is not feasible, restrict the BLAST search with -gilist by providing a file of the GI numbers of the subset sequences. Refer to the manual for a detailed explanation.

written 2.6 years ago by Prasad1.6k
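
Prasad's first suggestion, rebuilding the database for the subset only, might look roughly like the sketch below. It assumes BLAST+ is installed and that the source database was built with -parse_seqids so entries can be fetched by ID. The names `mydb`, `subset_ids.txt` and `subset_db` are placeholders and do not come from the thread.

```shell
#!/bin/sh
# Sketch: build a smaller database holding only the sequences of interest.
# Placeholder names (assumptions): mydb, subset_ids.txt, subset_db.

build_subset_db() {
    src_db=$1    # existing BLAST database
    id_file=$2   # one sequence ID per line
    out_db=$3    # name of the new, smaller database

    # Pull the listed sequences out of the existing database as FASTA.
    blastdbcmd -db "$src_db" -entry_batch "$id_file" -out "$out_db.fa" || return 1

    # Index them as a new nucleotide database; -parse_seqids keeps the IDs searchable.
    makeblastdb -in "$out_db.fa" -dbtype nucl -parse_seqids -out "$out_db"
}

# Only run when BLAST+ and the ID file are actually present.
if command -v blastdbcmd >/dev/null 2>&1 && [ -f subset_ids.txt ]; then
    build_subset_db mydb subset_ids.txt subset_db
else
    echo "BLAST+ or subset_ids.txt not found; nothing built" >&2
fi
```

A later search would then point at the new database, for example `blastn -db subset_db -query query.fa`, with no ID list needed.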

Restricting the BLAST search with a list of IDs is a good way to go. I suggest using -seqidlist instead of -gilist, since NCBI is not going to maintain GI numbers as primary identifiers for sequence records. PS: a seqidlist is a file of accession.version identifiers, one per line.

How to generate the seqidlist is not yet clear; you first need to define your search criterion.

written 2.6 years ago by erwan.scaon790
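
If the search criterion is taxonomic, one possible way to generate the seqidlist is to dump accession and taxid for every entry with blastdbcmd and keep the rows whose taxid is wanted. This is only a sketch: it assumes the database carries taxonomy IDs (NCBI-distributed databases do; a custom one needs to have been built with a taxid map), and the file names, the `DB` variable, and the example taxids are all hypothetical. When BLAST+ or `DB` is missing, a toy table with made-up accessions stands in so the filtering step can still be tried.

```shell
#!/bin/sh
# Sketch: build a -seqidlist file from a list of wanted taxids.
# Placeholder names (assumptions): DB, wanted_taxids.txt,
# acc2taxid.txt, subset.seqidlist, query.fa.

DB=${DB:-}   # set to your database name to run against real data

# Example taxids, one per line: 7955 = Danio rerio, 8030 = Salmo salar.
printf '%s\n' 7955 8030 > wanted_taxids.txt

if command -v blastdbcmd >/dev/null 2>&1 && [ -n "$DB" ]; then
    # One line per sequence: accession, then taxid.
    blastdbcmd -db "$DB" -entry all -outfmt '%a %T' > acc2taxid.txt
else
    # Toy stand-in with made-up accessions, usable without BLAST+.
    cat > acc2taxid.txt <<'EOF'
TOYFISH1.1 7955
TOYFISH2.1 8030
TOYBACT1.1 562
TOYBIRD1.1 9031
EOF
fi

# The first file fills the lookup table; the second is filtered against it.
awk 'NR == FNR { want[$1]; next } ($2 in want) { print $1 }' \
    wanted_taxids.txt acc2taxid.txt > subset.seqidlist

# The search itself would then be along the lines of:
#   blastn -db "$DB" -query query.fa -seqidlist subset.seqidlist
```

Database entries are usually tagged at species or strain level, so a group such as "Fishes" has to be expanded into its member taxids before this filter will match anything. It is also worth checking that the dumped accessions carry the version suffix, since the seqidlist expects accession.version.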

Interesting idea. If I run "blastn -seqidlist" on its own, I get this error: "Error: Argument "-seqidlist". Value is missing", so I need to add a file path after the option. My question is: do the entries in the file need to include "NC_" or "NG_"? I think the available literature supports running -gilist instead, because that method seems to be a bit more established.

I would follow this command line:

Query a BLAST database with a GI, but exclude that GI from the results

Extract a GI from the ecoli database:

    $ blastdbcmd -entry all -db ecoli -dbtype nucl -outfmt %g | head -1 | \
      tee exclude_me
    1786181

Run the restricted database search, which shows there are no self-hits:

    $ blastn -db ecoli -negative_gilist exclude_me -show_gis -num_alignments 0 \
      -query exclude_me | grep `cat exclude_me`
    Query= gi|1786181|gb|AE000111.1|AE000111

I am just having a hard time putting together a list of GIs that I can screen against.

Is anyone familiar with a source that lists species and their corresponding GIs, so I can create an appropriate inclusion or exclusion file? I did not find anything in the manual or appendices.

written 2.6 years ago by jf0
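
One candidate source for the species-to-GI mapping asked about above is the local database itself: blastdbcmd can print GI, taxid and scientific name for each entry, and that table can be split into an inclusion and an exclusion file. The sketch below assumes the database stores GIs and taxonomy information, which is not guaranteed (newer databases may not carry GIs at all, in which case accessions with -seqidlist are the alternative). The `DB` variable, the file names, the genus list, and the GIs in the toy table are hypothetical.

```shell
#!/bin/sh
# Sketch: derive inclusion and exclusion GI lists from a species table.
# Placeholder names (assumptions): DB, gi_taxid_name.txt,
# include.gilist, exclude.gilist, query.fa.

DB=${DB:-}   # set to your database name to run against real data

if command -v blastdbcmd >/dev/null 2>&1 && [ -n "$DB" ]; then
    # One line per sequence: GI, taxid, scientific name.
    blastdbcmd -db "$DB" -entry all -outfmt '%g %T %S' > gi_taxid_name.txt
else
    # Toy stand-in with made-up GIs, usable without BLAST+.
    cat > gi_taxid_name.txt <<'EOF'
1001 7955 Danio rerio
1002 8030 Salmo salar
1003 562 Escherichia coli
1004 9031 Gallus gallus
EOF
fi

# Start from empty files so stale results never survive a rerun.
: > include.gilist
: > exclude.gilist

# Field 3 is the genus; wanted genera go to the inclusion file,
# everything else to the exclusion file.
awk -v genera='Danio Salmo' '
    BEGIN { n = split(genera, g, " "); for (i = 1; i <= n; i++) want[g[i]] }
    { if ($3 in want) print $1 > "include.gilist"; else print $1 > "exclude.gilist" }
' gi_taxid_name.txt

# Either file can then restrict the search, for example:
#   blastn -db "$DB" -query query.fa -gilist include.gilist
#   blastn -db "$DB" -query query.fa -negative_gilist exclude.gilist
```

Matching on genus is a deliberately crude criterion chosen to keep the example short; filtering on the taxid column is more robust when the taxids of the target group are known.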