Question: Standalone BLAST with kingdom classification
0
gravatar for GeJu
4.9 years ago by
GeJu0
Germany
GeJu0 wrote:

Hi all,

I have an assembled transcriptome from an organism hosting various symbionts and I'm now interested in how many of my contigs hitting to the different kingdoms. What I've found so far is the -window_masker_taxid that filters my dataset based upon what kind of taxis I put there.

​Is there a possibility to do it all in one? To tell blast to add to every hit to which kingdom it belongs and to have then the possibility to extract e.g. all metazoan hits? 

And what about sequences that return ambiguous hits e.g. hitting against metazoa and fungi? How do I deal with that ones?

 

 

blast • 2.8k views
ADD COMMENTlink modified 4.9 years ago by 5heikki8.4k • written 4.9 years ago by GeJu0
2
gravatar for Neilfws
4.9 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

You can add the taxon ID to tab-delimited BLAST+ output using the -outfmt option, like this:

-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids"

You could then use EUtils/efetch to fetch taxonomic data for each taxon ID in XML format and parse it for the kingdom. I recently did this in Ruby, it looks something like this:

require 'bio'
require 'nokogiri' # XML parsing

Bio::NCBI.default_email = "me@me.com"
xml = Bio::NCBI::REST::EFetch.taxonomy(9606, "xml") # taxon ID 9606 = H. sapiens
doc     = Nokogiri::XML(xml)
names   = doc.xpath("//Taxon/ScientificName").children.map { |child| child.inner_text }
kingdom = names[2]
puts kingdom
# => "Eukaryota"

but you could use whatever implementation of EUtils you like e.g. EDirect.

ADD COMMENTlink written 4.9 years ago by Neilfws48k

cool, I never noticed those custom formats.

ADD REPLYlink written 4.9 years ago by Pierre Lindenbaum120k

Thanks for your suggestions.

But I already stumbled over one problem. I added the taxon ID to the output, tried it on a tiny fraction of my file and only got N/A in the taxon ID column although the sequences have clear hits. 

ADD REPLYlink written 4.9 years ago by GeJu0
1
gravatar for 5heikki
4.9 years ago by
5heikki8.4k
Finland
5heikki8.4k wrote:
  1. Get the taxdb and uncompress it where .ncbirc BLASTDB points
  2. Use -outfmt '6 std sskingdoms'
  3. Sort for best hits
  4. Time to grep
  5. You now assigned kingdom level taxonomy based on best-hit criteria
  6. One alternative way would be to feed you blast output to blast2lca
ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by 5heikki8.4k
0
gravatar for Pierre Lindenbaum
4.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

I wrote a recursive XSLT stylesheet reading a BLAST XML format and appending the NCBI taxonomy:

https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blasttaxonomy.xsl

 xsltproc --novalid blasttaxonomy.xsl blastp.xml

 

Something like:

  <Hit_num>1</Hit_num>
  <Hit_id>gi|302699245|ref|NP_001181875.1|</Hit_id>
  <Hit_def>eukaryotic translation initiation factor 4 gamma 1 isoform 6 [Homo sapiens] &gt;gi|302699247|ref|NP_001181876.1| eukaryotic translation initiation factor 4 gamma 1 isoform 6 [Homo sapiens]</Hit_def>
  <Hit_accession>NP_001181875</Hit_accession>
  <Hit_len>1606</Hit_len>

 

 

is replaced by:

          <Hit_num>1</Hit_num>
          <Hit_id>gi|302699245|ref|NP_001181875.1|</Hit_id>
          <Taxon>
            <TaxId>9606</TaxId>
            <ScientificName>Homo sapiens</ScientificName>
            <OtherNames>
              <GenbankCommonName>human</GenbankCommonName>
              <CommonName>man</CommonName>
              <Name>
                <ClassCDE>authority</ClassCDE>
                <DispName>Homo sapiens Linnaeus, 1758</DispName>
              </Name>
            </OtherNames>
            <ParentTaxId>9605</ParentTaxId>
            <Rank>species</Rank>
            <Division>Primates</Division>
            <GeneticCode>
              <GCId>1</GCId>
              <GCName>Standard</GCName>
            </GeneticCode>
            <MitoGeneticCode>
              <MGCId>2</MGCId>
              <MGCName>Vertebrate Mitochondrial</MGCName>

(...)

 

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by Pierre Lindenbaum120k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1544 users visited in the last hour