Could anyone suggest the easiest way to perform pairwise sequence alignment where the query sequences are sequencing reads obtained from NGS and the subject sequence is a constant part expected to be found in the reads? I need the score and the alignment length as parameters to screen and filter on. I would like to run the Smith-Waterman algorithm. I tried using Python, but it got complicated: I had trouble finding the EMBOSS suite (which I finally downloaded), and now I cannot execute it from Python (I tried WaterCommandline). I am not even sure it will give me a score and an alignment length if I do manage to run it. I am now looking into R to see whether it is possible there, iterating over all the reads against the subject sequence. After the alignment, I would like to splice the query sequence and store it in a file in FASTA format. Any ideas on how to do this conveniently would be appreciated.
Question: Pairwise sequence alignment
jitendoshi2006 • 0 wrote:
modified 4.0 years ago by ALchEmiXt • 1.9k • written 4.0 years ago by jitendoshi2006 • 0
fishgolden • 450 wrote:
I think the easiest way to run Smith-Waterman is SSEARCH from the FASTA package. Please check: https://github.com/wrpearson/fasta36/releases
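To make concrete what "score" and "alignment length" mean for a local alignment, here is a minimal pure-Python sketch of Smith-Waterman. It is not SSEARCH or EMBOSS water: it assumes a linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative choices rather than any tool's defaults. The function name and return layout are hypothetical. It is also far too slow for millions of reads, so treat it as an explanation rather than a replacement for a real tool.

```python
def smith_waterman(query, subject, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (illustrative scoring).

    Returns (score, alignment_length, query_start, query_end), where the
    query coordinates are 0-based and end-exclusive.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], subject[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero,
    # counting one alignment column per step.
    i, j = best_pos
    q_end = i
    length = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == subject[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1      # match or mismatch column
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1                   # gap in the subject
        else:
            j -= 1                   # gap in the query
        length += 1
    return best, length, i, q_end


# Made-up read containing the made-up constant part "ACGTACGT"
print(smith_waterman("TTTTACGTACGTGGGG", "ACGTACGT"))  # (16, 8, 4, 12)
```

The returned query coordinates are what you would later use to cut the constant part out of the read.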
Carlo Yague ♦ 5.7k wrote:
Hi,
I used blast+ for a similar task and it worked fairly well. You'll need to convert the .bam/.sam file to a FASTA file first, but the results in tabular format are really easy to filter by score or alignment length.
I'm not sure I understand what you mean by "splicing" the query.
Best,
Carlo
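As a rough illustration of the filtering step described above, the sketch below reads BLAST+ tabular output and keeps hits above a minimum alignment length and bit score. It assumes the default 12-column layout of `-outfmt 6`; the function name, the thresholds, and the two example rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_length, min_bitscore):
    """Yield hits (as dicts) whose alignment length and bit score pass the cutoffs."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if int(row["length"]) >= min_length and float(row["bitscore"]) >= min_bitscore:
            yield row


# Two made-up hits standing in for a real results file
example = io.StringIO(
    "read1\tconst\t100.000\t20\t0\t0\t5\t24\t1\t20\t1e-08\t37.4\n"
    "read2\tconst\t90.000\t10\t1\t0\t1\t10\t1\t10\t0.5\t15.0\n"
)
for hit in filter_hits(example, min_length=18, min_bitscore=30):
    # qstart/qend are 1-based positions of the match within the read
    print(hit["qseqid"], hit["qstart"], hit["qend"])
```

With a real file you would pass `open("hits.tsv")` (a hypothetical file name) instead of the `StringIO` object.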
It is unclear what exactly you are trying to do. Calling SNPs? Instead of doing pairwise alignments, one option would be to do NGS alignments as usual, pull out the reads in the region you are interested in, convert them to FASTA format, and then do a multiple sequence alignment (MSA).
Firstly, I am searching for a sequence (the subject) that I know is going to be present in the reads. I am interested in the reads that contain the subject sequence. I thought the best way was pairwise sequence alignment, but I cannot find a good platform for performing these alignments. I also tried Bio.pairwise2 in Python, but it does not align well. I hope I explained it well. What do you mean by NGS alignments? Are there any programs to do so?
Then, I remove the subject sequence part from the reads. Additionally, I would like to align the remaining parts of the reads to each other so that redundant reads are counted as '1', and then align the sorted reads to the genome. I plan to use BLAT/Bowtie for the genome alignment.
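The trimming and collapsing steps described here could look roughly like the sketch below. It assumes you already have 0-based, end-exclusive coordinates of the constant part within each read (from whatever aligner you end up using), and it collapses only exactly identical sequences, not near-duplicates. All function names and the FASTA header layout are hypothetical.

```python
from collections import Counter


def trim_subject(read, q_start, q_end):
    """Remove read[q_start:q_end] and join the flanking parts.

    Assumption: both flanks are wanted. If the constant part is a
    5' adapter, keeping only read[q_end:] may be more appropriate.
    """
    return read[:q_start] + read[q_end:]


def collapse(reads):
    """Count identical sequences so each distinct read is kept once."""
    return Counter(reads)


def to_fasta(counts):
    """Format collapsed reads as FASTA, storing the copy number in the header."""
    lines = []
    for n, (seq, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">read{n}_count{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Made-up (read, start, end) triples standing in for real alignment results
aligned = [("TTTTACGTACGTGGGG", 4, 12),
           ("TTTTACGTACGTGGGG", 4, 12),
           ("CCACGTACGTAA", 2, 10)]
trimmed = [trim_subject(read, start, end) for read, start, end in aligned]
print(to_fasta(collapse(trimmed)), end="")
```

The resulting FASTA text can be written to a file and passed to the genome aligner of your choice.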
What I meant was alignment with a regular NGS data aligner like bbmap, bwa, bowtie2, etc. These are always going to be much more efficient at aligning millions of reads than BLAST or other similarity search techniques. Once you have the alignment to the "genome" (which could be your reduced region of interest, though you may get some misalignments), you can extract the reads that aligned to the region of interest.
Before you do that, it sounds to me like you are expecting duplication/redundancy in this dataset. Actually, clumpify.sh from the BBMap suite may be the best way to handle that case. Take a look at this thread for the various use cases for clumpify: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files
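For the step of extracting reads that aligned to the region of interest, a minimal sketch that works directly on SAM text is shown below. In practice a dedicated tool would normally do this. The sketch assumes uncompressed SAM lines and a 1-based, inclusive region. The function names and the example records are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by an alignment (ops M, D, N, =, X)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def reads_in_region(sam_lines, chrom, start, end):
    """Yield (name, sequence) for mapped reads overlapping chrom:start-end."""
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & 4 or rname != chrom or cigar == "*":
            continue                      # unmapped or on another reference
        aln_end = pos + reference_span(cigar) - 1
        if pos <= end and aln_end >= start:
            yield name, seq


# Made-up SAM records standing in for real aligner output
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t500\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for name, seq in reads_in_region(sam, "chr1", 105, 120):
    print(f">{name}\n{seq}")
```

Note that SAM stores the sequence of reverse-strand alignments reverse-complemented, so the output is in reference orientation rather than the original read orientation.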