Get species from Blast or anywhere with Biopython
7.5 years ago
dshulgin ▴ 260

Hi, everyone. I'm trying to filter my transcriptome reads as follows:

- reads that might belong to viruses I want to keep;

- reads that might come from plants I want to remove from my list.

First I align every read with blastn using the Biopython package. Part of what it returns is a set of Bio.Blast.Record.Alignment objects (http://biopython.org/DIST/docs/api/Bio.Blast.Record.Alignment-class.html). Each one has a title like "Vitis vinifera clone SS0AEB13YG07", but how can I find out what kind of organism that is? This one is a plant (I googled it), but how can I determine programmatically that an alignment belongs to a plant?

Thanks.

genome blast • 1.7k views

There are better ways of doing this. One would be to use BBSplit from the BBMap suite with the viral genomes file from NCBI (or just the virus(es) you are interested in). That would let you bin the reads into viral (known) and others. Presumably only one plant is involved? If you know which one, you can add that genome to the BBSplit run and collect the reads for it at the same time.

If you wish to pursue the solution you have thought of, you can get the names of all plants from the NCBI taxonomy FTP site. Here is one thread to give you a starting point: "Download whole dataset from NCBI Taxonomy"
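As a rough sketch of the taxonomy-dump route: the taxdump archive from the NCBI taxonomy FTP site contains `nodes.dmp` (child and parent taxids) and `names.dmp` (names per taxid). The code below walks up the parent chain and checks for Viridiplantae (taxid 33090) or Viruses (taxid 10239). The function names are made up for illustration, and matching the leading words of a hit title against scientific names is only a heuristic that assumes the title starts with the organism name.

```python
VIRIDIPLANTAE = 33090  # NCBI taxid of the green-plant clade
VIRUSES = 10239        # NCBI taxid of the Viruses root


def split_dmp(line):
    """Split one taxdump line; fields are delimited by tab, pipe, tab."""
    return [field.strip() for field in line.strip().rstrip("|").split("|")]


def load_parents(node_lines):
    """Map each taxid to its parent taxid (columns 1 and 2 of nodes.dmp)."""
    parents = {}
    for line in node_lines:
        fields = split_dmp(line)
        parents[int(fields[0])] = int(fields[1])
    return parents


def load_scientific_names(name_lines):
    """Map lower-cased scientific names to taxids (from names.dmp)."""
    names = {}
    for line in name_lines:
        fields = split_dmp(line)
        if fields[3] == "scientific name":
            names[fields[1].lower()] = int(fields[0])
    return names


def taxid_from_title(title, names):
    """Heuristic: try the longest leading run of words that is a known name."""
    words = title.split()
    for n in range(len(words), 0, -1):
        key = " ".join(words[:n]).lower()
        if key in names:
            return names[key]
    return None


def classify(taxid, parents):
    """Walk towards the root; report 'plant', 'virus' or 'other'."""
    seen = set()
    while taxid in parents and taxid not in seen:
        if taxid == VIRIDIPLANTAE:
            return "plant"
        if taxid == VIRUSES:
            return "virus"
        seen.add(taxid)
        taxid = parents[taxid]
    return "other"
```

With the real files you would call `load_parents(open("nodes.dmp"))` and `load_scientific_names(open("names.dmp"))` once, then run `classify(taxid_from_title(alignment.title, names), parents)` per hit. Titles that carry an accession prefix would need that stripped first.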

BLAST databases from NCBI used to include taxonomy information (I am not sure whether they still do, but you can check). If they do, you can include the relevant fields in your BLAST results by using one of the tabular output formats and parse out what you need that way.
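To illustrate the tabular route, here is a minimal sketch that assumes the search was run with BLAST+ and a custom format such as `-outfmt "6 qseqid sseqid evalue bitscore staxids sscinames sskingdoms"` (these fields are only filled in when the database carries taxonomy information). The column choice and function names are assumptions for the example. It keeps the highest-scoring hit per read and lists the reads whose best hit is viral; plants fall under "Eukaryota" in `sskingdoms`, so telling them apart would still need a lineage lookup on the taxid.

```python
import csv

# Column order must match the fields requested in -outfmt (assumed here).
COLUMNS = ["qseqid", "sseqid", "evalue", "bitscore",
           "staxids", "sscinames", "sskingdoms"]


def best_hits(lines):
    """Return {read id: row dict} keeping the highest bitscore per read."""
    best = {}
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        row["bitscore"] = float(row["bitscore"])
        # staxids may hold several ids separated by ';', or 'N/A'
        row["staxids"] = [int(t) for t in row["staxids"].split(";")
                          if t.isdigit()]
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def viral_reads(best):
    """Read ids whose best hit is reported under the Viruses superkingdom."""
    return sorted(read for read, row in best.items()
                  if row["sskingdoms"] == "Viruses")
```

Usage would be along the lines of `viral_reads(best_hits(open("hits.tsv")))`, where `hits.tsv` is a hypothetical output file name.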


I see. I just thought there might be a more suitable way, e.g. an API where I give a scientific name and get the taxon back. Thanks for your help.


There is more than one way of doing this and someone may be along with a suggestion in that direction (e.g. eutils).

That said, BLAST is a resource-intensive way of dealing with NGS data. You would get through your dataset much faster with BBSplit.
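For the eutils direction, one possible sketch uses Biopython's `Bio.Entrez` to look up a scientific name in the NCBI taxonomy database and read back its lineage string. The function names and the e-mail placeholder are assumptions for the example, the lookup needs network access, and NCBI asks you to keep request rates low, so this suits a modest number of distinct names rather than one call per read.

```python
def classify_lineage(lineage):
    """Classify a '; '-separated NCBI lineage string."""
    ranks = {part.strip() for part in lineage.split(";")}
    if "Viridiplantae" in ranks:
        return "plant"
    if "Viruses" in ranks:
        return "virus"
    return "other"


def fetch_lineage(scientific_name, email):
    """Look up a name via E-utilities and return its lineage (or None).

    Bio.Entrez is imported here so that classify_lineage can be used
    without Biopython or network access.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="taxonomy", term=scientific_name)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return None
    handle = Entrez.efetch(db="taxonomy", id=ids[0])
    records = Entrez.read(handle)
    handle.close()
    return records[0]["Lineage"]
```

For example, `classify_lineage(fetch_lineage("Vitis vinifera", "you@example.org"))` would be expected to report a plant, assuming the lookup succeeds. Caching results per name avoids repeated requests.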


Unfortunately, we have no reference genome.

