Question

Running blastp with BLAST+ 2.15.0 against custom database; need to identify hits

6 months ago

rebecca.calvo • 0

Note: I am very new to bioinformatics!

I am on a Windows 11 machine using BLAST+ 2.15.0 to run blastp queries against a custom database of shotgun metagenomic data from this website: http://gigadb.org/dataset/100842

I am querying the 02_AnaerobicDigestion_GeneCatalog_gene.pep.fa file using blastp, and the results returned (to a .txt or .xml file) look like this:

[Image: blastp hits from metagenomic database]

I want to know what bacterial strain/species is associated with each hit, but all the subjects have an AD_gene_#### identifier (from the metagenome sequencing) instead of any kind of species/strain identifier.

I know that I should be able to collect protein sequences from the blastp results into a file, but I do not know how to do this.

I would then need to blastp these sequences against the non-redundant protein database and write a file that contains information about the taxonomy of the top blastp hit.

I don't need the amino acid sequence at that point, but just some kind of strain identifier that I can use to create a list of bacterial "species."

In summary, I want a list of bacterial species that contain a homolog of a protein of interest from a shotgun metagenome dataset.

I'm not sure how to get the output that I'm looking for and would appreciate any help!

shotgun metagenomics blastp taxonomy • 487 views

updated 6 months ago by GenoMax 147k • written 6 months ago by rebecca.calvo • 0

score 2 · Accepted Answer · 2024-04-19

"I know that I should be able to collect protein sequences from the blastp results into a file, but I do not know how to do this."

You can do that by extracting the sequences you need from the custom database with the blastdbcmd utility included in BLAST+. See the help: https://www.ncbi.nlm.nih.gov/books/NBK569853/
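
As a rough sketch of that extraction step (not a tested pipeline): the Python below pulls the unique subject IDs out of a tabular (-outfmt 6) result and builds a blastdbcmd call that uses -entry_batch to fetch them. It assumes the first blastp run was saved in the default 12-column tabular format and that the custom database was built with makeblastdb -parse_seqids, which blastdbcmd needs in order to look sequences up by ID. The database name, file names, and the example rows are made up for illustration.

```python
# Sketch: collect subject IDs from tabular BLAST output and build a
# blastdbcmd command to fetch those sequences from the custom database.
# All file and database names here are hypothetical placeholders.

def subject_ids_from_tabular(lines):
    """Return unique subject IDs (column 2 of -outfmt 6) in first-seen order."""
    seen = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, malformed rows, and comment lines (from -outfmt 7).
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        seen.setdefault(fields[1], None)
    return list(seen)


def blastdbcmd_args(db, id_file, out_fasta):
    """Build the argument list for a batch sequence lookup with blastdbcmd."""
    return [
        "blastdbcmd",
        "-db", db,                # database name given to makeblastdb -out
        "-dbtype", "prot",        # protein database
        "-entry_batch", id_file,  # text file with one sequence ID per line
        "-out", out_fasta,        # FASTA file that will hold the hits
    ]


# Made-up rows in the default 12-column layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE = (
    "my_query\tAD_gene_0001\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "my_query\tAD_gene_0042\t75.0\t190\t47\t1\t5\t194\t3\t192\t1e-60\t250\n"
    "my_query\tAD_gene_0001\t60.0\t50\t20\t0\t150\t199\t300\t349\t1e-05\t45\n"
)

if __name__ == "__main__":
    ids = subject_ids_from_tabular(EXAMPLE.splitlines())
    print("\n".join(ids))  # write these lines to the ID file
    print(" ".join(blastdbcmd_args("AD_catalog", "hit_ids.txt", "hits.fa")))
```

In practice you would read the lines from your own result file, write the IDs to the ID file, and pass the argument list to subprocess.run (or just type the printed command at the prompt).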

As for the rest of the analysis, it would be better to use an easily parsable format for the BLAST output when you run the search against nr. Look into -outfmt 6 for this purpose: https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
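
To illustrate why the tabular format helps, here is a minimal Python sketch that keeps the highest-bitscore hit per query and reports its taxonomy columns. It assumes the nr search was run with a custom column list, -outfmt "6 qseqid sseqid pident evalue bitscore staxids sscinames"; as far as I know, the sscinames column is only filled in when the taxdb files are available alongside the database. The column choice, the -max_target_seqs value, the file names, and the example rows (including the subject names and taxids) are all made up for illustration.

```python
# Sketch: pick the best hit per query from tabular BLAST output that
# includes taxonomy columns. Column list and file names are assumptions.

OUTFMT = "6 qseqid sseqid pident evalue bitscore staxids sscinames"


def blastp_nr_args(query_fasta, out_tsv, db="nr"):
    """Build the argument list for a blastp run that emits the columns above."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-outfmt", OUTFMT,
        "-max_target_seqs", "5",  # arbitrary small number for this sketch
        "-out", out_tsv,
    ]


def best_hit_per_query(lines):
    """Map each query ID to its highest-bitscore hit (first one wins on ties)."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Expect exactly the seven columns named in OUTFMT; skip anything else.
        if len(fields) != 7 or fields[0].startswith("#"):
            continue
        qseqid, sseqid, pident, evalue, bitscore, staxids, sscinames = fields
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid]["bitscore"]:
            best[qseqid] = {
                "sseqid": sseqid,
                "pident": float(pident),
                "evalue": float(evalue),
                "bitscore": score,
                "staxids": staxids,
                "sscinames": sscinames,
            }
    return best


# Made-up rows; subject IDs, taxids, and names are placeholders, not real hits.
EXAMPLE = (
    "AD_gene_0001\tsubject_A\t80.0\t1e-50\t210\t1111\tExample species A\n"
    "AD_gene_0001\tsubject_B\t95.0\t1e-90\t380\t2222\tExample species B\n"
    "AD_gene_0042\tsubject_C\t70.0\t1e-30\t150\t3333\tExample species C\n"
)

if __name__ == "__main__":
    for query, hit in best_hit_per_query(EXAMPLE.splitlines()).items():
        print(query, hit["sseqid"], hit["staxids"], hit["sscinames"], sep="\t")
```

The set of sscinames values across all queries would then be the species list asked for in the question, with the caveat that a top BLAST hit is only an approximate guide to the organism of origin.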