Question: How to limit/filter BLAST results using NcbiblastnCommandline
0
gravatar for yarmda
23 months ago by
yarmda0
yarmda0 wrote:

I am using a command like

NcbiblastnCommandline(query="fasta.fasta", db=nt, outfmt=5, out="out.xml", show_gis=True)

And that is returning results.

However, I want to limit the results to only returning the gi number/taxID (not the alignment or the bulk of the data) and to limit the hitlist size (as you would in NCBIWWW) to some arbitrary number. How can these be done?

Ultimately, I am trying to find related species to a given target that aren't the target and download their sequences. Since BLAST can't provide the complete genomes, I want to take their identifiers, so I don't need most of the BLAST output.

ADD COMMENTlink modified 23 months ago by Pierre Lindenbaum122k • written 23 months ago by yarmda0
1

I'm assuming you're running blast in Python, because you have some upstream/downstream code that do other processes. However, it would be much easier to run the blast on the CLI, or use subprocess() in Python to call your blast cmd. Output format 6, tab-delimited, is the easiest IMO to parse. You can set max_target_seqs to a value for each query sequence to return that number of db hits.

ADD REPLYlink written 23 months ago by st.ph.n2.5k
0
gravatar for Pierre Lindenbaum
23 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

However, I want to limit the results to only returning the gi number/taxID

using bioalcidae to extract the hit-it, then call efetch to get the XML for the sequence, and use xpath to extract the taxon id/name.

java -jar dist/bioalcidae.jar -F BLAST -e 'while(iter.hasNext()) {var hit=iter.next(); out.println(hit.getHitId());} ' input.blastn.xml |\
grep '^gi|' | cut -d '|' -f 2 | sort | uniq | while read ID
do
wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ID}&retmode=xml&rettype=fasta" |\
xmllint --xpath $'concat(TSeqSet/TSeq/TSeq_taxid/text(),"|",TSeqSet/TSeq/TSeq_orgname/text(),"\n")' - 
done | uniq | sort |uniq
ADD COMMENTlink written 23 months ago by Pierre Lindenbaum122k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 777 users visited in the last hour