Aligning to ultra-short reference sequence
5.9 years ago
spaladug ▴ 10

I have a set of subject sequences (25 to 40 bp long). My query sequences are 200-300 bp long. I am trying to find the locations of the short subject sequences within the long query sequences. This is the exact inverse of the usual alignment problem, where short reads are aligned to a long reference sequence. I have tried creating a BLAST database from the short subject sequences (25 to 40 bp) and querying it with the long (200-300 bp) sequences. I am getting good results, but the sensitivity is not that great. I have tried bowtie2 and bwa-mem, but the results are even worse. Does anyone know how to solve this problem other than by doing a global-local alignment of every subject sequence against every query sequence? Any help is appreciated.
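For readers unfamiliar with the term, the brute-force global-local (semi-global) alignment mentioned above can be sketched roughly as follows. This is only a minimal illustration, not the poster's code: the function name `best_fit` and the unit edit costs are assumptions, and a real run would repeat this for every subject/query pair, which is exactly the cost the question hopes to avoid.

```python
from typing import Tuple


def best_fit(short: str, long_seq: str) -> Tuple[int, int]:
    """Semi-global edit distance: all of `short` must align, but it may
    start and end anywhere inside `long_seq` at no cost.

    Returns (edit_distance, end), where `end` is the exclusive 0-based end
    of the best placement in `long_seq`. Unit costs for mismatches and
    indels are an assumption made for this sketch.
    """
    # Row 0 is all zeros: skipping a prefix of the long sequence is free.
    prev = [0] * (len(long_seq) + 1)
    for i, a in enumerate(short, start=1):
        # Column 0: the first i bases of `short` aligned to nothing.
        curr = [i] + [0] * len(long_seq)
        for j, b in enumerate(long_seq, start=1):
            curr[j] = min(
                prev[j - 1] + (a != b),  # match or mismatch
                prev[j] + 1,             # base of `short` left unaligned
                curr[j - 1] + 1,         # extra base in `long_seq`
            )
        prev = curr
    # Free end: take the cheapest cell of the last row (leftmost on ties).
    best_end = min(range(len(prev)), key=prev.__getitem__)
    return prev[best_end], best_end


if __name__ == "__main__":
    print(best_fit("GATTACA", "CCCGATTACACCC"))
```

The work per pair is proportional to the product of the two lengths, so the total cost grows with the number of subject sequences times the number of query sequences.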

alignment • 1.6k views
5.9 years ago
d-cameron ★ 2.9k

I am getting good results, but the sensitivity is not that great.

How do you know this? How dissimilar do your sequences need to be before you consider a hit to be a false positive?

I have a set of subject sequences (25 to 40 bp long). My query sequences are 200-300 bp long. I have tried bowtie2 and bwa-mem, but the results are even worse.

You can swap your query and subjects and align the 25-40 bp sequences against a 'reference' of 200-300 bp sequences. Short-read aligners generally perform better when the shorter sequence is aligned to the longer one, as they penalise partial matches. You'll need to explicitly enable multi-mapping alignment, as you definitely want to report all alignment positions of your 25-40 bp sequences.
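To make the idea concrete, here is a minimal Python sketch of the swapped setup: the long sequences play the role of the reference, each short sequence is scanned across them on both strands, and every position within a mismatch threshold is reported rather than only the best one. It is not a substitute for a real aligner. The names `hamming_hits` and `find_all_hits`, the `max_mismatches` threshold, and the mismatch-only (no indel) model are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming_hits(short: str, long_seq: str,
                 max_mismatches: int) -> Iterator[Tuple[int, int]]:
    """Yield (0-based start, mismatches) for every window of `long_seq`
    that matches `short` with at most `max_mismatches` substitutions."""
    k = len(short)
    for start in range(len(long_seq) - k + 1):
        mismatches = 0
        for a, b in zip(short, long_seq[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too divergent, give up on this window
        else:
            yield start, mismatches


def find_all_hits(shorts: Dict[str, str], longs: Dict[str, str],
                  max_mismatches: int = 2
                  ) -> List[Tuple[str, str, str, int, int]]:
    """Report every placement (multi-mapping) of each short sequence.

    Each hit is (short_id, long_id, strand, start, mismatches).
    """
    hits = []
    for short_id, short in shorts.items():
        probes = (("+", short), ("-", reverse_complement(short)))
        for strand, probe in probes:
            for long_id, long_seq in longs.items():
                for start, mm in hamming_hits(probe, long_seq, max_mismatches):
                    hits.append((short_id, long_id, strand, start, mm))
    return hits


if __name__ == "__main__":
    shorts = {"s1": "GATTACAGGC"}
    longs = {"q1": "TTTTTGATTACAGGCAAAAAGATTACAGCCTT"}
    for hit in find_all_hits(shorts, longs, max_mismatches=1):
        print(hit)
```

Because every window is checked, sensitivity here is set entirely by the threshold you choose, which ties back to the question of how dissimilar a hit may be before it counts as a false positive.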
