Question: how to locate a gene sequence among fastq files containing short reads
jerrybug109 (United States) wrote, 4.3 years ago:


I've got a dozen different strains of bacteria whose whole genomes we've sequenced (we have paired-end reads, forward and reverse, for each strain). I want to locate a specific housekeeping gene in each strain.

Could I convert the FASTQ files to FASTA, build a BLAST database from the short reads, and then BLAST the query gene sequence against it? Or would I need to assemble each genome first, build the database from the assemblies, and then BLAST the query gene sequence against those?

Would appreciate your input, thanks :-)

Tags: genome, blast, bioinformatics, ncbi

Don't do any of that... yet. Build a small "genome" from the gene(s) you need (use the known sequences, or pick examples from related strains) and then align your reads to it with BBMap. Depending on how similar the "different" strains in your pool are, there is some risk that reads will multi-map. It sounds like you just want to see whether a specific gene is present, so go ahead and use the option ambig=all with BBMap to let reads map at all possible locations.

You could also try using BBSplit to bin the reads if you have the reference genomes for these strains.
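As a rough sketch of the BBMap suggestion, the Python snippet below only builds the command lines (it does not run them), so you can inspect them first. The file names (gene.fa, strain01_R1.fastq and so on) and the strain list are made-up placeholders; ref=, in=, in2=, outm=, ambig= and nodisk= are standard bbmap.sh parameters, but check them against your installed version.

```python
import shlex


def bbmap_command(gene_fasta, reads_r1, reads_r2, out_sam):
    """Build a bbmap.sh command mapping one strain's paired reads to the gene 'genome'."""
    return [
        "bbmap.sh",
        f"ref={gene_fasta}",   # small reference holding only the gene(s) of interest
        f"in={reads_r1}",      # forward reads
        f"in2={reads_r2}",     # reverse reads
        f"outm={out_sam}",     # outm= writes only the reads that mapped
        "ambig=all",           # report multi-mapping reads at every location
        "nodisk=t",            # build the tiny index in memory, not on disk
    ]


if __name__ == "__main__":
    # Hypothetical strain names; replace with your own sample IDs.
    strains = ["strain01", "strain02"]
    for strain in strains:
        cmd = bbmap_command(
            "gene.fa",
            f"{strain}_R1.fastq",
            f"{strain}_R2.fastq",
            f"{strain}_gene_hits.sam",
        )
        # Print a copy-pasteable shell command for each strain.
        print(shlex.join(cmd))
```

If the printed commands look right, they can be pasted into a shell or passed to subprocess.run(cmd, check=True).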

(comment written 4.3 years ago by genomax)
piet (planet earth) wrote, 4.3 years ago:

Using BLAST that way is very inefficient. If you are impatient for quick results and you already have a sequence of the housekeeping gene from the same species, then you can take that sequence as the reference and map all your reads to it with 'bwa mem'. If you can afford to wait about 5 minutes longer, you should assemble your reads with SPAdes.
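To make the two routes concrete, here is a minimal Python sketch that assembles the command lines for both: indexing the gene and mapping reads with 'bwa mem', or assembling with SPAdes. All file and directory names are hypothetical placeholders, and the thread count is an arbitrary assumption. Note that 'bwa mem' writes SAM to standard output, so the output has to be redirected to a file when the command is actually run.

```python
import shlex


def bwa_index_command(ref_fasta):
    """Index the reference (here: the housekeeping gene sequence)."""
    return ["bwa", "index", ref_fasta]


def bwa_mem_command(ref_fasta, reads_r1, reads_r2, threads=4):
    """Map paired-end reads to the reference; SAM goes to stdout."""
    return ["bwa", "mem", "-t", str(threads), ref_fasta, reads_r1, reads_r2]


def spades_command(reads_r1, reads_r2, out_dir):
    """Assemble one isolate's paired-end reads with SPAdes."""
    return ["spades.py", "-1", reads_r1, "-2", reads_r2, "-o", out_dir]


if __name__ == "__main__":
    # Placeholder names for a single isolate.
    r1, r2 = "strain01_R1.fastq", "strain01_R2.fastq"
    print(shlex.join(bwa_index_command("gene.fa")))
    # Quick route: map the reads straight to the gene.
    print(shlex.join(bwa_mem_command("gene.fa", r1, r2)) + " > strain01_vs_gene.sam")
    # Slower route: assemble first.
    print(shlex.join(spades_command(r1, r2, "strain01_spades")))
```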

After assembly, there is still no need to BLAST. It is much easier to map the contigs to the sequence of the housekeeping gene with 'bwa mem'. You can even feed the contigs from all of your isolates into 'bwa mem' in a single run, and (after converting the SAM output to BAM, e.g. with samtools) you will get a nice little BAM file showing a multiple sequence alignment of all the isolates across the housekeeping gene. However, if it really is a housekeeping gene, then it will be present in all of the isolates.
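One practical detail when pooling contigs from several isolates: assemblers tend to reuse contig names such as NODE_1 in every assembly, so it helps to prefix each header with the isolate name before mapping. The sketch below is one possible way to do that in Python; the isolate names, the "|" separator and the output file names are my own assumptions, and it relies on SPAdes writing its contigs to contigs.fasta inside the output directory.

```python
from pathlib import Path


def tag_contigs(isolate, fasta_text):
    """Prefix every FASTA header with the isolate name; sequence lines are kept as-is."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(f">{isolate}|{line[1:]}")
        elif line:
            out.append(line)
    return "\n".join(out) + "\n" if out else ""


def pool_contigs(contig_files, pooled_path):
    """Write the tagged contigs of all isolates into one FASTA file.

    contig_files maps isolate name -> path of that isolate's contigs FASTA.
    """
    with open(pooled_path, "w") as pooled:
        for isolate, path in contig_files.items():
            pooled.write(tag_contigs(isolate, Path(path).read_text()))


if __name__ == "__main__":
    # Hypothetical layout: one SPAdes output directory per isolate.
    isolates = ["strain01", "strain02"]
    files = {name: Path(f"{name}_spades") / "contigs.fasta" for name in isolates}
    if all(p.exists() for p in files.values()):
        pool_contigs(files, "all_contigs.fasta")
    # Commands to run afterwards (bwa mem prints SAM to stdout):
    print("bwa index gene.fa")
    print("bwa mem gene.fa all_contigs.fasta > contigs_vs_gene.sam")
    print("samtools sort -o contigs_vs_gene.bam contigs_vs_gene.sam")
    print("samtools index contigs_vs_gene.bam")
```

With tagged headers, each alignment in the resulting BAM file can be traced back to its isolate.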


That's a really good idea!

(reply written 4.3 years ago by pjmaguire)