Question: blastn for short sequences
2
gravatar for User6891
4.6 years ago by
User6891250
Europe
User6891250 wrote:

 

Hi everyone,

I was trying to use blast to map short sequences (between 15 - 30 nucleotides long) to the human reference genome

I have blast locally installed:

these are the options that I used:

blastn -dust no -db ucsc.h19.fasta -outfmt 7 -word_size 7 -evalue 1000 -perc_identity 100 -ungapped -query .. -out ...

I still get way to many matches for my input, while there is in fact only one match in the whole genome. What I see in the output is the good match as the first line, which is off course already good. 

However then there are like 1000 matches following, where the percentage identity is 100, but they are never exactly the same match as my query sequence. Lets say my query sequence is 20 bp long, I only want to get the match where the alignment length is 20 and the q.start = 1 & q.end is 20. And this is only the case for the first line in my output. The rest of the lines never has an alignment length of 20. Is there a way to only output the matches where the alignment length is the same as the length of my query?

I can always filter my output afterwards, but I have a lot of sequences to blast, so I would like to get it as clean as possible from the beginning.

 

 

 

blast • 7.5k views
ADD COMMENTlink modified 4.6 years ago by Siva1.6k • written 4.6 years ago by User6891250
3
gravatar for Siva
4.6 years ago by
Siva1.6k
United States
Siva1.6k wrote:

You can try the -task option and set it to blastn-short  which is optimized for sequences less than 30 nucleotides.

It uses slightly different default values for some options:

-word_size = 7 (which you used)

-reward = 1

-penalty = -3

For controlling the BLAST hits, can you try the option -qcov_hsp_perc and set it to 100?

From blatn -help

-qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp

 

 

 

ADD COMMENTlink written 4.6 years ago by Siva1.6k

Thanks this -qcov_hsp_perc <Real, 0..100> might do the trick, but it's only present starting from version 2.2.30 released in october 2014. So I will have to ask for un update on our servers

ADD REPLYlink written 4.6 years ago by User6891250
1
gravatar for Cytosine
4.6 years ago by
Cytosine440
Ljubljana, Slovenia
Cytosine440 wrote:

A good question that I would also like to know the answer for :)

I usually use the -max_target_seqs switch, with a low value (1-3), to reduce the number of hits and parse those out myself. Often the first hit is also the best alignment, though I double check the interesting hits for true positives.

Use with caution :)

ADD COMMENTlink written 4.6 years ago by Cytosine440
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 839 users visited in the last hour