Question: Blastn - unexpected behaviour
bvm0 wrote (6 days ago):

I ran into unexpected behaviour in blastn. After extracting gene sequences from a genome, building a BLAST database from them, and blasting the genome back against that database, many of the extracted genes are missing from the BLAST results, even though they are certainly present in the genome (they were extracted from it). What could be the cause? Some details (I cannot upload the complete files):

My command is:

blastn -query GCF_000005845.2_ASM584v2_genomic.fna -db MG1655_genes -outfmt 6

The FASTA file was downloaded from https://www.ncbi.nlm.nih.gov/genome/167?genome_assembly_id=161521.

The database was built from the genes extracted using the feature table belonging to the assembly above.

One gene missing from the BLAST output is, for example, aaaD. However, when I blast only this gene, it is found, as expected.
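
For anyone wanting to reproduce the check, here is a minimal sketch of how the missing genes could be listed from the tabular output. It assumes the subject IDs in column 2 of the `-outfmt 6` table match the first word of each FASTA header in the gene file. The function name `missing_genes` and the file names in the usage line are placeholders, not files from this thread.

```shell
# missing_genes GENES_FASTA BLAST_TSV
# Prints every gene ID from GENES_FASTA that never appears as a subject
# (column 2, sseqid) in the tab-separated BLAST table BLAST_TSV.
missing_genes() {
  awk -F'\t' '
    # First file (the BLAST table): remember every subject ID seen.
    FILENAME == ARGV[1] { seen[$2] = 1; next }
    # Second file (the gene FASTA): report header IDs that were never seen.
    /^>/ {
      id = substr($1, 2)          # drop the leading ">"
      sub(/[ \t].*/, "", id)      # keep only the first word of the header
      if (!(id in seen)) print id
    }
  ' "$2" "$1"
}

# Usage (placeholder file names):
# missing_genes MG1655_genes.fna blast_hits.tsv
```

If the database was built in a way that rewrites the sequence IDs, column 2 will not match the FASTA headers and the comparison would need adjusting.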


I think you should blast the genes against the genome: index the genome first, then blast your genes in FASTA format against it. This makes more sense, and you also get the location of each gene.

written 6 days ago by gb830

Using a genome as search query against a list of genes is probably not a great idea (unless you have a specific reason for it). Have you considered doing the search in reverse?

Also try adding -task blastn to your command line to see if it makes a difference. The default task is megablast.
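
The reverse search suggested in these comments could look roughly like the sketch below. The `run` and `reverse_search` helpers, the gene FASTA name and the database name are all hypothetical; only the genome file name comes from the question. By default the helper just prints the commands so they can be checked first; setting `RUN=1` executes them.

```shell
# Hypothetical helper: print the command unless RUN=1 is set, in which case run it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# reverse_search GENOME_FASTA GENES_FASTA DB_NAME
reverse_search() {
  # 1. Index the genome as a nucleotide BLAST database.
  run makeblastdb -in "$1" -dbtype nucl -out "$3"
  # 2. Query the genes against the genome; -task blastn replaces the default megablast.
  run blastn -task blastn -query "$2" -db "$3" -outfmt 6
}

# Example (MG1655_genes.fna and MG1655_genome are placeholder names):
# RUN=1 reverse_search GCF_000005845.2_ASM584v2_genomic.fna MG1655_genes.fna MG1655_genome
```

In this orientation each gene is its own query, and with the default `-outfmt 6` columns the position of a hit on the genome is given by the subject start and end (columns 9 and 10).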

written 6 days ago by genomax70k

I chose this approach because the goal is to find the same genes in many genomes, but you're right that it should be done the other way around: I'll use the set of genes as the query and the individual genomes as subjects.

Adding -task blastn did make a difference, but still not all genes appeared in the output.

written 6 days ago by bvm0

bvm0 wrote (6 days ago):

After doing some research, I found the answer to my question. The default value of max_target_seqs is 500, so at most 500 database sequences (here, genes) are reported per query sequence. When max_target_seqs is raised to a very large value, all genes are shown.

Hence I used

blastn -query GCF_000005845.2_ASM584v2_genomic.fna -db MG1655_genes -outfmt 6 -max_target_seqs 100000000

to obtain all genes.
