Beginner Blasting Short Sequences
12.0 years ago
Richard ▴ 590

Hi All, I have a small db of about 100 sequences, each shorter than 100 bp.

I have some really short queries (5-30bp).

When I run blastn with -query myquery -subject mysubject, I don't get any matches. I tried reducing the word size using the -word_size option, but still didn't get any matches, even though I know they are there.

I know blast is overpowered for my little task, but I am using blast in my program for some other tasks, so it would be easiest if I could just finish it off without using another program.

Also, I am using Biopython to parse the BLAST XML output, so if other programs are suggested, please recommend something with a Python parser.

thanks!

blastn • 2.5k views

It's not so much that BLAST is "overpowered" in this case, more "inappropriate". Just because you can force it to do short query alignment doesn't mean that's a good idea. Suggest you investigate other alignment tools.

12.0 years ago
Michael 54k

Use ssearch36 (Smith-Waterman exact local alignments, part of the FASTA tools) to get all possible matches. It is feasible with your small dataset. With the -m BB option it can mimic BLAST text output (not XML, afaik) quite well, which should be good enough for a BLAST parser in the Bio* libraries; it worked with BioPerl at least. Alternatively, you could use a FASTA-output format parser with the standard ssearch output format. Using a short-read/heuristic aligner is not a good option in your case: there is no reason to sacrifice accuracy for performance when the number of sequences is very small.
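
To illustrate why an exhaustive search is feasible at this scale, here is a minimal pure-Python sketch of Smith-Waterman local alignment. It is not what ssearch36 or water actually run: the function name `smith_waterman`, the linear gap penalty, and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration only.

```python
def smith_waterman(query, subject, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned query, aligned subject, 0-based subject start).

    Illustrative scoring with a linear gap penalty; real tools use
    affine gaps and tuned scoring matrices.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    # H[i][j] = best local alignment score ending at query[i-1], subject[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align the two characters
                          H[i - 1][j] + gap,     # gap in subject
                          H[i][j - 1] + gap)     # gap in query
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_pos
    aligned_q, aligned_s = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == subject[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_s)), j
```

With about 100 subjects under 100 bp and queries of 5-30 bp, each comparison fills a table of at most a few thousand cells, so looping over every query/subject pair stays cheap even in plain Python.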


You could also use "water" from the EMBOSS package (another Smith-Waterman algorithm implementation). Totally agree on avoiding the heuristics for data sets of this size.


This seems to be the right way to go. BLAST is in fact inappropriate due to its heuristics and the small sequence lengths. I haven't used FASTA tools but EMBOSS programs are well documented and so water should be good for beginners.

12.0 years ago
Lee Katz ★ 3.1k

I guess if they are so short, you could just use Python to detect a 100% match in your database, i.e. a substring in the database equal to your query. Another option is to use a short-read mapper like bowtie2, SHRiMP, or BWA and then parse the output SAM or BAM files.
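
The exact-match idea could look roughly like the sketch below. It assumes the database has already been loaded into a dict mapping sequence name to sequence string; the function names `reverse_complement` and `find_exact_hits` are made up for this example, and searching the reverse strand is an extra assumption about what counts as a match.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only; other characters pass through)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_exact_hits(query, db):
    """Find every exact occurrence of query in db.

    db is a dict of {name: sequence}. Returns a list of
    (name, strand, 0-based start) tuples, including overlapping hits.
    A palindromic query is reported once per strand.
    """
    hits = []
    query = query.upper()
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for name, seq in db.items():
            seq = seq.upper()
            start = seq.find(q)
            while start != -1:
                hits.append((name, strand, start))
                # Step by one so overlapping occurrences are also found.
                start = seq.find(q, start + 1)
    return hits


db = {"s1": "ACGTTGCA", "s2": "GGGTGCAACC"}
print(find_exact_hits("TTGCA", db))
```

This only finds 100% matches; any mismatch or indel in the query needs a real aligner.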
