Dear all, I have downloaded a number of SRA genomic sequence datasets. Is it feasible to run BLAST against these datasets, using a set of gene sequences as queries? This would identify short reads homologous to my target genes, which could then be assembled. (Which software could I use for the assembly? That is a second question of mine.) These SRA datasets are huge and were generated by Illumina sequencing. I worry that SRA reads are too short for BLAST. Thanks!
BLAST works fine for short sequences if you tune the search. For example, reduce the word size to the minimum (-W 7; the BLAST+ option is -word_size) and, depending on the database size, raise the expectation value cutoff (-e, or -evalue in BLAST+). It is worth searching for more advice on this, e.g. on running primer searches with BLAST.
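As a rough sketch of the tuning described above, the Python snippet below only assembles a BLAST+ blastn command line; it does not run BLAST. The file and database names (genes.fa, sra_reads, hits.tsv) are hypothetical, and the word size and E-value are illustrative starting points to tune, not recommended values.

```python
# Sketch: build (but do not run) a BLAST+ blastn command tuned for short alignments.
# Assumptions: BLAST+ is installed; "genes.fa", "sra_reads" and "hits.tsv" are
# hypothetical names; word size and E-value are starting points to tune.
import shlex


def short_read_blastn_cmd(query, db, out, task="blastn", word_size=7,
                          evalue=10.0, threads=1):
    """Return a blastn argument list with a small word size and a chosen E-value."""
    return [
        "blastn",
        "-task", task,                 # "blastn" or "blastn-short", depending on query length
        "-query", query,
        "-db", db,
        "-word_size", str(word_size),  # BLAST+ equivalent of legacy "-W"
        "-evalue", str(evalue),        # BLAST+ equivalent of legacy "-e"
        "-outfmt", "6",                # tabular output, one hit per line
        "-num_threads", str(threads),
        "-out", out,
    ]


cmd = short_read_blastn_cmd("genes.fa", "sra_reads", "hits.tsv", evalue=100)
print(shlex.join(cmd))
```

Once the settings look right, the list could be passed to subprocess.run(cmd, check=True).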
Note, however, that the computing resources needed for these alignments will be many orders of magnitude higher than those needed by a dedicated short-read aligner.
Run some tests first: measure the runtimes and check whether you have the computational capacity to perform the full searches.
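One way to do such tests is to time BLAST on a small subset of the data and extrapolate. The sketch below assumes runtime grows roughly linearly with the number of sequences searched, which is only an approximation; the pilot numbers are invented for illustration.

```python
# Sketch: extrapolate total BLAST wall-clock time from a small pilot run.
# Assumption: runtime scales roughly linearly with the number of sequences
# searched; the pilot figures below are made up for illustration.

def estimate_hours(pilot_seqs, pilot_seconds, total_seqs, parallel_jobs=1):
    """Linear extrapolation of wall-clock hours for the full dataset."""
    seconds_per_seq = pilot_seconds / pilot_seqs
    total_seconds = seconds_per_seq * total_seqs / parallel_jobs
    return total_seconds / 3600


# A pilot of 100,000 reads took 600 s; the full run has 200 million reads, 16 jobs.
print(f"{estimate_hours(100_000, 600, 200_000_000, parallel_jobs=16):.1f} h")  # prints "20.8 h"
```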
You can definitely run BLAST against SRA data. If you want to use the official BLAST tools and databases, you can download them from here.
The online BLAST website automatically uses different parameters for short sequences; for example, for sequences shorter than about 50 bases it runs "blastn-short" instead of "blastn" to do the alignment. You could check the read lengths in your SRA files (e.g. manually, or with a tool such as FastQC) and then decide whether to run "blastn", "blastn-short", or one of the other BLAST programs.
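If you prefer to check read lengths with a script, a minimal Python sketch could read a FASTQ file and apply the ~50-base cut-off mentioned above. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and uses the median length; both the cut-off and the use of the median are simplifications you may want to change.

```python
# Sketch: read-length check on FASTQ data to choose between blastn and blastn-short.
# Assumptions: 4-line FASTQ records; the ~50-base cut-off from the post; median length.
import io
import statistics


def read_lengths(handle):
    """Yield the sequence length of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def choose_task(lengths, short_cutoff=50):
    """Return 'blastn-short' if the median read length is below the cut-off."""
    return "blastn-short" if statistics.median(lengths) < short_cutoff else "blastn"


fastq = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\n" + "ACGT" * 12 + "\n+\n" + "I" * 48 + "\n"
)
lengths = list(read_lengths(fastq))
print(lengths, choose_task(lengths))  # prints "[10, 48] blastn-short"
```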
If you have a small amount of data, e.g. a few hundred sequences, you can run the BLAST programs in "remote" mode (the -remote option in BLAST+), where they use the online databases instead of requiring you to download them to your local machine. If you have a large amount of data to align, however, it is probably better to download the BLAST databases locally (the main databases are about 400 GB uncompressed).
You can also use the BLAST programs to align against your own databases instead of the official BLAST databases, but you first need to convert your sequences into BLAST database format (with makeblastdb in BLAST+).
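For the original question, one option is to make the reads themselves the database and use the gene sequences as queries. In BLAST+ the formatting step is done with makeblastdb, which takes FASTA input. The sketch below converts 4-line FASTQ records to FASTA and builds the makeblastdb argument list; the names reads.fa and sra_reads are hypothetical.

```python
# Sketch: convert SRA reads (FASTQ) to FASTA and build a makeblastdb command.
# Assumptions: 4-line FASTQ records; "reads.fa" and "sra_reads" are hypothetical
# names; makeblastdb ships with BLAST+ and is not executed here.
import io


def fastq_to_fasta(src, dst):
    """Write header and sequence of each 4-line FASTQ record as FASTA; return count."""
    n = 0
    for i, line in enumerate(src):
        if i % 4 == 0:    # "@id ..." header line becomes ">id ..."
            dst.write(">" + line[1:].rstrip("\n") + "\n")
        elif i % 4 == 1:  # sequence line is copied as-is
            dst.write(line.rstrip("\n") + "\n")
            n += 1
    return n


def makeblastdb_cmd(fasta, db_name):
    """Argument list for building a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


out = io.StringIO()
count = fastq_to_fasta(io.StringIO("@SRR1.1 len=8\nACGTACGT\n+\nIIIIIIII\n"), out)
print(count, repr(out.getvalue()), makeblastdb_cmd("reads.fa", "sra_reads"))
```

The genes would then be searched with something like blastn -query genes.fa -db sra_reads.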
Be aware that BLAST runs extremely slowly, especially blastn and blastn-short, which are the most accurate variants. Also, if you download and compile the tools yourself, the multi-threaded mode might not work (it did not work for me). If you have a multi-core computer, you may find it useful to run several BLAST processes in parallel on different subsets of the sequences, e.g. with a small script, GNU Parallel, or a similar tool. You could also spread the work across several computers. I recently ran BLAST on about 400 computers in parallel, because it was running so slowly that the job would otherwise have taken years to complete.
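The "small script" approach could look roughly like the Python sketch below: split the query FASTA into chunks and launch one single-threaded blastn per chunk via concurrent.futures. It assumes blastn is on your PATH, that a database exists under whatever name you pass as db, and that no FASTA header contains a ">" character; it is a sketch, not a tested pipeline.

```python
# Sketch: run several single-threaded blastn processes in parallel on query chunks.
# Assumptions: BLAST+ "blastn" on PATH; the database name passed as `db` exists;
# FASTA headers contain no ">" characters. Chunk files go to a temp directory.
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(text, n_chunks):
    """Distribute FASTA records round-robin into at most n_chunks strings."""
    records = [">" + r for r in text.split(">") if r.strip()]
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return ["".join(c) for c in chunks if c]


def run_blast_chunk(chunk_path, db, out_path):
    """Run one blastn process on a single chunk and wait for it to finish."""
    cmd = ["blastn", "-query", str(chunk_path), "-db", db,
           "-outfmt", "6", "-out", str(out_path)]
    subprocess.run(cmd, check=True)
    return out_path


def parallel_blast(fasta_text, db, jobs=4):
    """Write chunks to disk, run blastn on each in parallel, return output paths."""
    workdir = Path(tempfile.mkdtemp())
    futures = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for k, chunk in enumerate(split_fasta(fasta_text, jobs)):
            query = workdir / f"chunk{k}.fa"
            query.write_text(chunk)
            futures.append(pool.submit(run_blast_chunk, query, db,
                                       workdir / f"chunk{k}.tsv"))
    return [f.result() for f in futures]  # re-raises any blastn failure
```

Threads are sufficient here because each worker just waits on an external process; the per-chunk tabular outputs can be concatenated afterwards.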
If you don't specifically need BLAST and just need to do the alignment, consider a faster alignment tool such as Bowtie2 (as others have said).
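If you go the Bowtie2 route, one possible workflow is to index your gene sequences and align the reads against them, keeping only the reads that map so they can be passed to an assembler. The sketch below only builds the command lines. genes.fa, reads.fastq, genes_idx and the output names are hypothetical, it assumes single-end reads, and --local is a judgment call for reads that only partly overlap a gene.

```python
# Sketch: Bowtie2 command lines to pull out reads that align to a set of genes.
# Assumptions: single-end reads; "genes.fa", "reads.fastq", "genes_idx",
# "matched_reads.fq" and "hits.sam" are hypothetical names; Bowtie2 on PATH.
import shlex


def bowtie2_cmds(genes_fa, reads_fq, index="genes_idx", threads=4):
    """Return (index-building command, alignment command) as argument lists."""
    build = ["bowtie2-build", genes_fa, index]
    align = [
        "bowtie2", "--local",        # local mode soft-clips reads overhanging gene ends
        "-p", str(threads),
        "-x", index,
        "-U", reads_fq,
        "--no-unal",                 # omit unaligned reads from the SAM output
        "--al", "matched_reads.fq",  # write aligned reads to a separate file
        "-S", "hits.sam",
    ]
    return build, align


for cmd in bowtie2_cmds("genes.fa", "reads.fastq"):
    print(shlex.join(cmd))
```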
PS. For the assembly, you could try Trinity.