Question: Which database to BLAST against genomes?
Jacob0 wrote:

I am trying to BLAST about 50 RNA sequences against the genomes of various organisms (all large genomes: vertebrates and other animals). I am trying to do this locally because it takes an enormous amount of time otherwise.

I am doing this by running update_blastdb.pl (a script provided with BLAST+ that downloads databases locally via FTP).

The problem is that this is taking very long and the downloaded files take up a lot of space. I'm wondering if there is a more sensible way to do this, because the databases are taking up about 100 GB on my computer right now.

Here's how I am running it (if I don't adjust the timeout, it usually doesn't finish):

update_blastdb.pl --timeout 800 nt

I am also downloading:

refseq_genomic
nr

And have completed:

refseq_rna

I will want to perform these BLAST searches for multiple organisms, but each individual search will be against a single organism.

On a side note, I've noticed that some downloads only finish if I interrupt them with ^C and restart them; left alone, they never complete.

modified 6 days ago by h.mon5.7k • written 6 days ago by Jacob0

"What I am trying to do is blast about 50 RNA sequences against the genomes of various organisms (All large genomes for vertebrates, other animals)"

What is the reason for doing that? What kind of RNA sequences are these? Are you trying to identify what genome those 50 sequences are from or the actual identity of the genes?

written 6 days ago by genomax27k

The genes are all human. I'm trying to determine whether each gene has a significant match in a number of other organisms, as judged by the E-value.

written 6 days ago by Jacob0
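
The E-value check described above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: it assumes the search was run with BLAST+ tabular output (-outfmt 6, default columns), and the 1e-5 cutoff and the function names are assumptions chosen for the example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query id: best hit} keeping only hits with evalue <= max_evalue.

    max_evalue=1e-5 is an assumed cutoff, not a value from the thread.
    """
    best = {}
    for fields in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, fields))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # not significant under the chosen cutoff
        current = best.get(hit["qseqid"])
        # Keep the highest-scoring hit seen so far for each query.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def genes_without_match(query_ids, hits):
    """List the queries that have no significant hit in this organism."""
    return [q for q in query_ids if q not in hits]
```

Running best_hits once per organism's result file and collecting genes_without_match gives a simple gene-by-organism presence/absence table.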

If you want to do this as practice, then great, but NCBI/EBI has likely done this work for you. You can check NCBI's HomoloGene section for multiple alignments, or use the alignments available as a track in the UCSC Genome Browser.

modified 6 days ago • written 6 days ago by genomax27k

Thanks. Yeah, it is basically for practice right now, but I'm going to be making adjustments in the future, so I need to do it this way.

modified 5 days ago • written 5 days ago by Jacob0

Depending on your goal, there may be faster solutions than BLAST. BBMap's SendSketch tool can taxonomically classify an organism in a few seconds, depending on the genome size; it can compare your data to nt, RefSeq, and Silva for that purpose. You don't need to download any big files.

As Genomax asked... what are you trying to accomplish? Also, what kind of data do you have?

written 6 days ago by Brian Bushnell11k
h.mon5.7k wrote:

If you have just 50 sequences, BLAST them online: you can paste or upload a multi-FASTA file to NCBI web BLAST. It will be quite fast, especially if you leave the search at its defaults, and much faster than downloading the databases. Or maybe you have 50 transcriptomes instead of 50 sequences?

written 6 days ago by h.mon5.7k

I need to do it on the command line because of changes I need to make to my code in the future. Do you have any recommendations for which databases to BLAST against, or for the fastest way to do it? I'll keep using these databases over and over, so if it takes a day to download, that's not a big deal.

modified 5 days ago • written 6 days ago by Jacob0
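
Since each search targets a single organism, one possible command-line workflow (not one proposed in the thread) is to build a small nucleotide database per genome and search each one separately. The sketch below only assembles makeblastdb and blastn command lines. The file names, the organism labels, the 1e-5 E-value, and the helper names are hypothetical, and it assumes each genome is already available locally as a FASTA file.

```python
import subprocess


def makeblastdb_cmd(genome_fasta, db_name):
    """Command to build a nucleotide BLAST database from one genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl",
            "-out", db_name]


def blastn_cmd(query_fasta, db_name, out_path, evalue=1e-5):
    """Command to search the queries against one database, tabular output.

    evalue=1e-5 is an assumed cutoff, not a value from the thread.
    """
    return ["blastn", "-query", query_fasta, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]


def run_all(query_fasta, genomes, runner=subprocess.run):
    """Build one database per organism, then search it.

    genomes maps an organism label to the path of its genome FASTA
    (both hypothetical). runner is injectable so the commands can be
    inspected without BLAST+ installed.
    """
    for organism, fasta in genomes.items():
        runner(makeblastdb_cmd(fasta, organism), check=True)
        runner(blastn_cmd(query_fasta, organism, f"{organism}.tsv"),
               check=True)
```

A call such as run_all("human_genes.fa", {"mouse": "mouse_genome.fna"}) would write mouse.tsv. Whether per-genome databases are preferable to the preformatted refseq_genomic download depends on disk space and on how many organisms are involved.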