Question: data from SRA against blast databases method.
gravatar for arronar
4.9 years ago by
arronar160 wrote:


I want to find a specific gene inside SRA files and I am wondering if i am using a correct way to align those sequences because its the first time that i am "playing" with something like that.

So, here is what i've done till now.

Download .sra files from NCBI and using fastq-dump program , convert .sra files into .fastq using :

fastq --split-spot 

Then those .fastq files convert them into .fasta using :

awk 'NR % 4 == 1 || NR % 4 == 2' myfile.fastq | sed -e 's/@/>/' > myfile.fasta

Finally, convert .fasta into blast readable database using :

makeblastdb -in myfile.fasta -dbtype nucl -out myNewdb

Now having ready the Database i can run a blast using :

blastn -query query.fa -db myNewdb -task blastn -dust no -outfmt 7 -num_alignments 2 -num_descriptions 2 nucl, prot

So, is this approach right/correct? Or i have to try it somehow different ?

Thanks in advance.

blast alignment next-gen • 2.9k views
ADD COMMENTlink modified 4.9 years ago by piet1.6k • written 4.9 years ago by arronar160
gravatar for Devon Ryan
4.9 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

That's not a great idea, you'll likely get a lot of false hits (not to mention that it seems a bit silly to blast a whole gene against a bunch of short reads). The normal process would be to extract the reads in fastq format from the SRA file and then map those to the genome containing the gene you're interested in (not that I can think of a reason to look for a single gene inside a HTS experiment, unless you're interested in variants only in it and there's no VCF file available).

ADD COMMENTlink written 4.9 years ago by Devon Ryan89k
gravatar for mikhail.shugay
4.9 years ago by
Czech Republic, Brno, CEITEC
mikhail.shugay3.3k wrote:

Have you tried the SRA-BLAST tool?

PS If you're going to do blast search directly, you should consider filtering redundant sequences from fastq and place their counts to header. Or use this family of tools: For high over-sequencing this could speed-up a lot

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by mikhail.shugay3.3k
gravatar for piet
4.9 years ago by
planet earth
piet1.6k wrote:

In principal, your approach is right/correct, but you will need a powerfull filter to deal with the blast output and to make sense of it efficiently. How long are your query sequences? For short queries the size of a single gene I would convert the blast output into a multi sequence alignment and inspect that multi sequence alignment with a viewer like clustalx. Unfortunately I am not aware of a common tool for doing such a conversion. But you can get nearly the same result if you use bwa instead of blast and tablet as your multi sequence viewer.

Let's say, you have fully sequenced clinical bacterial isolate and you want to find out, if this isolate is carrying the antibiotic resistance gene oxa40. This gene is located on a mobil genomic element and only present in some isolates. You just download an arbitrary oxa40 sequence from Genbank (size is about 900 nt) and map all your reads to this sequence with bwa. Then you convert the output from SAM to BAM and view it with tablet. In tablet you will immediately recognize, if the oxa40 gene is present at all and also if your reads show any significant SNP with respect to the query sequence.

Downside is that bwa is much slower than blast, since blast uses an indexed database of your reads while bwa cycles through all your reads sequentially. But for bacterial genomes bwa is presumably fast enough to be useful. I have been wondering for some while if there exists a tool to convert blast output to SAM format. This should be doable in principle, but you have to cope with all the hairy details of the SAM format.

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by piet1.6k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1057 users visited in the last hour