Question: Blastn - unexpected behaviour
11 months ago, bvm wrote:

I ran into some unexpected behaviour in blastn. I extracted gene sequences from a genome, built a BLAST database from them, and blasted the reference genome against that database. Many of the extracted genes do not appear in the BLAST result, although they are certainly present in the genome (they were extracted from it). What could be the cause? Some details (I cannot upload the whole files):

My command is:

blastn -query GCF_000005845.2_ASM584v2_genomic.fna -db MG1655_genes -outfmt 6

The fasta file is downloaded from

The database was built from the genes extracted using the feature table belonging to the assembly above.

One gene missing from the BLAST result is, for example, aaaD. However, if only this gene is blasted, it is found as expected.
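One way to list every gene that is missing, rather than spotting them one by one, is to compare the gene IDs against the subject column of the tabular output. This is only a sketch: the function name missing_genes and both file names are made up for illustration, and it assumes the genome was the query, so the gene IDs sit in column 2 (sseqid) of the -outfmt 6 output.

```shell
# Print gene IDs that never appear as a subject (column 2) in BLAST -outfmt 6 output.
# Usage: missing_genes <gene_ids.txt> <blast_hits.tsv>
# gene_ids.txt (hypothetical): one gene ID per line, spelled as in the database FASTA headers.
missing_genes() {
    ids_file=$1
    hits_file=$2
    subjects_tmp="$hits_file.subjects.tmp"

    # Collect the distinct subject IDs that received at least one hit.
    cut -f2 "$hits_file" | LC_ALL=C sort -u > "$subjects_tmp"

    # comm -23 keeps lines that occur only in the first (sorted) input.
    LC_ALL=C sort -u "$ids_file" | LC_ALL=C comm -23 - "$subjects_tmp"

    rm -f "$subjects_tmp"
}
```

If the searches are run the other way round (genes as query), the gene IDs are in column 1 and the cut field would need to change accordingly.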


I think you should blast the genes against the genome: index the genome first, then blast your genes (in FASTA format) against it. This makes more sense, and it also gives you the location of each gene.
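A rough sketch of that workflow, assuming BLAST+ is installed. The function name blast_genes_vs_genome, the database name derived from the genome file, and the gene FASTA file name in the usage comment are placeholders, not anything taken from the original post.

```shell
# Build a nucleotide BLAST database from a genome, then search a gene FASTA against it.
# Usage: blast_genes_vs_genome <genome.fna> <genes.fna> <hits.tsv>
blast_genes_vs_genome() {
    genome_fasta=$1
    genes_fasta=$2
    out_tsv=$3
    # Hypothetical database name: the genome file name without its extension, plus "_db".
    db_name="${genome_fasta%.*}_db"

    # Index the genome as a nucleotide database.
    makeblastdb -in "$genome_fasta" -dbtype nucl -out "$db_name" || return 1

    # Genes are the query; columns 9 and 10 of the tabular output (sstart, send)
    # then give each gene's location on the genome.
    blastn -query "$genes_fasta" -db "$db_name" -outfmt 6 -out "$out_tsv"
}

# Example call (MG1655_genes.fna is an assumed file name):
# blast_genes_vs_genome GCF_000005845.2_ASM584v2_genomic.fna MG1655_genes.fna hits.tsv
```

With the genes as query, each gene is its own query sequence, so a per-query limit on reported targets is much less likely to hide results.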

written 11 months ago by gb

Using a genome as search query against a list of genes is probably not a great idea (unless you have a specific reason for it). Have you considered doing the search in reverse?

Also try adding -task blastn to your command line to see if it makes a difference. The default task for blastn is megablast.

written 11 months ago by genomax

I thought of this approach because the goal is to find the same genes in a lot of genomes, but you're right that it should be done the other way round. I'll use the set of genes as query and the specific genomes as subjects.

Adding -task blastn did make a difference, but still not all genes appeared in the result.

written 11 months ago by bvm

11 months ago, bvm wrote:

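After doing some research, I found the answer to my question. The default value of max_target_seqs is 500, and the limit applies per query sequence. Since the genome is the query here, at most 500 genes from the database are reported for it. After raising max_target_seqs to an arbitrarily high value, all genes are shown.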

Hence I used

blastn -query GCF_000005845.2_ASM584v2_genomic.fna -db MG1655_genes -outfmt 6 -max_target_seqs 100000000

to obtain all genes.
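To check whether a result was cut off by this limit, one option is to count the distinct subjects reported for each query in the tabular output. This is just a sketch: the function name subjects_per_query and the file name in the usage comment are invented for illustration, and it assumes standard -outfmt 6 columns (query ID in column 1, subject ID in column 2).

```shell
# Count distinct subject IDs per query in BLAST -outfmt 6 output.
# Prints: <query_id><TAB><number_of_distinct_subjects>
# Usage: subjects_per_query <blast_hits.tsv>
subjects_per_query() {
    awk -F'\t' '
        # Count each (query, subject) pair only the first time it is seen.
        !seen[$1 FS $2]++ { n[$1]++ }
        END { for (q in n) print q "\t" n[q] }
    ' "$1"
}

# Example call (hits.tsv is an assumed file name):
# subjects_per_query hits.tsv
```

A count that lands exactly on the max_target_seqs value in use (500 by default) is a hint that hits were probably dropped for that query.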
