Downloading from the nr database
6.9 years ago

Dear all

I would like to download the gene information for the genus Streptomyces from the nr database. I tried, but I ended up downloading the whole database. Could someone explain to me how to filter the data so that I only get the Streptomyces entries?

Thanks

Carlos

6.9 years ago
Jake Warner

You could try using blastdbcmd:

blastdbcmd -db /scratch/db/nr/blastDB/nr -dbtype prot -entry all -outfmt "%g %T" | \
awk ' { if ($2 == 1883) { print $1 } } ' | \
blastdbcmd -db /scratch/db/nr/blastDB/nr -dbtype prot -entry_batch - -out streptomyces.txt

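1883 is the taxid of the genus Streptomyces itself. Most nr sequences are annotated with species- or strain-level taxids, though, so an exact match on 1883 only catches sequences assigned to the genus as a whole. To get everything in the genus, filter on 1883 plus all of its descendant taxids (for example, collected from nodes.dmp in the NCBI taxonomy dump).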
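A minimal sketch of that descendant-taxid step, assuming you have downloaded the NCBI taxonomy dump (nodes.dmp from taxdump.tar.gz, where fields are separated by "\t|\t" and the first two fields are tax_id and parent tax_id). It walks the tree below a root taxid and then keeps the first column of "<id> <taxid>" lines whose taxid falls in that set, so it could stand in for the awk step. The script name filter_taxa.py is hypothetical.

```python
"""Sketch: expand a taxid to all of its descendants using NCBI's nodes.dmp,
then filter "<id> <taxid>" lines (e.g. blastdbcmd output) by that set."""
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set


def read_children(nodes_dmp: Iterable[str]) -> Dict[str, List[str]]:
    """Build a parent -> children map from nodes.dmp lines.

    Fields are separated by "\t|\t"; field 0 is tax_id, field 1 is parent tax_id.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for line in nodes_dmp:
        fields = line.split("\t|\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        taxid, parent = fields[0].strip(), fields[1].strip()
        if taxid != parent:  # the root (taxid 1) is its own parent
            children[parent].append(taxid)
    return children


def descendants(root: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return the root taxid plus every taxid below it (iterative DFS)."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def filter_ids(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the first column of each "<id> <taxid>" line whose taxid is in keep."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1] in keep:
            yield parts[0]


# Only runs as a command-line filter when given exactly two arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python filter_taxa.py nodes.dmp 1883 < ids_taxids.txt > ids.txt
    nodes_path, root_taxid = sys.argv[1], sys.argv[2]
    with open(nodes_path) as fh:
        keep_taxids = descendants(root_taxid, read_children(fh))
    for seq_id in filter_ids(sys.stdin, keep_taxids):
        print(seq_id)
```

Used in place of the awk line, the pipeline would look something like: blastdbcmd -db nr -dbtype prot -entry all -outfmt "%a %T" | python filter_taxa.py nodes.dmp 1883 | blastdbcmd -db nr -dbtype prot -entry_batch - -out streptomyces.txt (here %a prints accessions instead of GIs; adjust the database path to your setup).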

6.9 years ago

Just use a tool that indexes and parses FASTA files and supports case-insensitive regular-expression matching:

$ pip install pyfaidx
$ faidx nr.fasta --regex "(?i)streptomyces"
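If you would rather not build an index, a minimal streaming sketch of the same idea is below. It assumes nr-style FASTA headers, where the source organism appears in square brackets (e.g. "[Streptomyces coelicolor A3(2)]"), and anchors the match on "[Streptomyces" followed by a space or "]" so that headers merely mentioning the word elsewhere are not picked up. The script name filter_fasta.py is hypothetical, and merged nr entries listing several organisms are kept if any of them matches.

```python
"""Sketch: stream a FASTA file and keep records whose header names a
Streptomyces organism in square brackets."""
import re
import sys
from typing import Iterable, Iterator

# "[Streptomyces" followed by a space or "]"; case-insensitive like the regex above.
STREPTOMYCES = re.compile(r"\[Streptomyces[ \]]", re.IGNORECASE)


def filter_fasta(lines: Iterable[str], pattern: re.Pattern = STREPTOMYCES) -> Iterator[str]:
    """Yield the header and sequence lines of every record whose header matches pattern."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide once, from its header, whether to keep it.
            keep = pattern.search(line) is not None
        if keep:
            yield line


# Only runs as a command-line filter when given a FASTA path.
if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python filter_fasta.py nr.fasta > streptomyces.fasta
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(filter_fasta(fh))
```

This reads the file once, line by line, so memory use stays flat even for the full nr FASTA; the trade-off against an indexed tool is that every run rescans the whole file.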