Hi biostars!
I am using blastn on the command line to run remote searches against the refseq_rna database, but I would like to restrict the search to mammals. I have 150,000 sequences, and limiting the search to mammals should help me avoid exceeding the CPU limit. Is there a way to do this? I know one can download the GIs for a specific taxon, but I want to do this remotely and for all mammals, not just a single species.
This is what I have been working with:
blastn -db refseq_rna -query sequences.fsa -out blastn_sequences.out -remote -word_size 11 -gapopen 5 -gapextend 2 -penalty -3 -reward 2 -evalue 0.00001 -num_descriptions 3 -num_alignments 3
Any help is greatly appreciated!
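For readers looking for a concrete starting point: one way to limit a remote search by taxon is the -entrez_query option of BLAST+, which only works together with -remote. The sketch below reuses the options from the command above and adds a filter of "Mammalia[Organism]". The filter string and the RUN variable (a hypothetical dry-run switch) are my assumptions, not something from the original post, so check the command against your BLAST+ version before submitting 150,000 queries.

```shell
#!/bin/sh
# Sketch only: restrict a remote blastn search to mammals via an Entrez query.
# Assumption: "Mammalia[Organism]" is the taxonomic filter you want.
entrez_filter="Mammalia[Organism]"

# Same options as the original command, plus -entrez_query (requires -remote).
set -- blastn -db refseq_rna -query sequences.fsa -out blastn_sequences.out \
  -remote -entrez_query "$entrez_filter" \
  -word_size 11 -gapopen 5 -gapextend 2 -penalty -3 -reward 2 \
  -evalue 0.00001 -num_descriptions 3 -num_alignments 3

# Dry run by default: print the command (shown without shell quoting).
# RUN is a hypothetical switch; set RUN=1 to actually submit the search.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

If you type the command by hand, keep the Entrez query in quotes so the shell does not try to expand the square brackets.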
Thanks h.mon! Any chance you can tell me how to download the mammalian databases so I don't have to use -remote?
It says I can download databases from ftp://ftp.ncbi.nlm.nih.gov/blast/db/ or retrieve them automatically with update_blastdb.pl but I don't know how to specify only the mammalian ones. Again, thanks for the help!
Follow instructions 2 or 3 (but not 1) from "How can I download a list of IDs for all sequences from a specific organism or taxonomic group?"
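As far as I know, the preformatted refseq_rna database is not split by taxon, so the usual route is to download the whole database and restrict the search at run time. The sketch below is an alternative to the ID list approach in the linked instructions, not a copy of it: it assumes BLAST+ 2.8.1 or newer with version 5 databases, where -taxidlist is available, and it assumes get_species_taxids.sh (shipped with BLAST+, needs EDirect installed) is on your PATH. 40674 is the NCBI taxonomy ID for Mammalia; the file name mammalia.txids and the RUN switch are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: search a local copy of refseq_rna, restricted to mammals.
# Assumes BLAST+ >= 2.8.1 with a v5 database; names below are illustrative.
db="refseq_rna"
mammalia_taxid=40674            # NCBI taxonomy ID for Mammalia
taxid_file="mammalia.txids"     # hypothetical output file name

# RUN is a hypothetical switch: nothing is downloaded unless RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  # Download and unpack all volumes of the preformatted database.
  update_blastdb.pl --decompress "$db"
  # Expand the Mammalia node into species-level taxids, one per line.
  get_species_taxids.sh -t "$mammalia_taxid" > "$taxid_file"
fi

# Local search: no -remote, results limited to the listed taxids.
set -- blastn -db "$db" -query sequences.fsa -out blastn_sequences.out \
  -taxidlist "$taxid_file" \
  -word_size 11 -gapopen 5 -gapextend 2 -penalty -3 -reward 2 \
  -evalue 0.00001 -num_descriptions 3 -num_alignments 3

# Dry run by default: print the command; with RUN=1 it is executed.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

On older BLAST+ releases without -taxidlist, the same idea works with a GI or accession list passed through -gilist or -seqidlist, which is what the linked instructions produce.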