Question: Pairwise sequence alignment
0
gravatar for jitendoshi2006
4.0 years ago by
jitendoshi20060 wrote:

Could anyone suggest me an easiest way to perform pairwise sequence alignment wherein my query sequence is the sequencing reads obtained from NGS and the subject sequence is a constant part expected to be found on the reads. So, i need to have a score and length of alignment as parameters to screen and filter. I would like to execute Smith-Waterman algorithm. I tried using python but it is getting complicated for me to find EMBOSS suite (which I finally downloaded) but then cannot execute it from python (tried WaterCommandline). And I am not even sure that this will give me a score and alignment length if I successfully manage to execute it. I am looking up in R now to see if it is possible and reiterate all the reads against a subject sequence. Then, I would like to splice the query sequence and store it in a file in a fasta format following the sequence alignment. Any ideas on how to conveniently do this will be appreciated.

alignment • 2.1k views
ADD COMMENTlink modified 4.0 years ago by ALchEmiXt1.9k • written 4.0 years ago by jitendoshi20060

It is unclear what you are exactly trying to do? Calling SNP? Instead of doing pairwise alignments one option could be to do NGS alignments as usual and then pull the reads out in the region you are interested in followed by converting them to fasta format and then do a multiple sequence alignment (MSA).

ADD REPLYlink written 4.0 years ago by GenoMax96k

Firstly, I am searching for the sequence (subject) that I know is going to be present in the reads. I am interested in the reads that possess the subject sequence. I thought the best way was to do pairwise sequence alignment but I do not find a good platform for for performing these alignments. I also tried bio.pairwise2 in python but it does not align well. I hope i explained it well. What do you mean by NGS alignments? Are there any programs to do so?

Then, I remove the subject sequence part from the reads. Additionally, I would like to align the remaining part of reads to each other so that the redundant reads can be counted as '1' and then align the sorted reads to the genome. I plan to use BLAT/Bowtie for genome alignment

ADD REPLYlink written 4.0 years ago by jitendoshi20060
2

What I meant was alignment with a regular NGS data aligner like bbmap, bwa, bowtie2 etc. These are always going to be very efficient aligning millions of reads rather than blast other similarity search techniques. Once you have the alignment to "genome" (which could be your reduced region of interest though you may get some misalignments) you can extract reads that aligned to the region of interest.

Before you do that it sounds to me like you are expecting duplication/redundancy in this dataset. Actually clumpify.sh from BBMap suite may be the best way to handle that case. Take a look at this thread for the various use cases for clumpify: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by GenoMax96k
1
gravatar for fishgolden
4.0 years ago by
fishgolden450
fishgolden450 wrote:

I think the easiest way to perform Smith-Waterman is SSEARCH in FASTA package. Please check. https://github.com/wrpearson/fasta36/releases

ADD COMMENTlink written 4.0 years ago by fishgolden450
1
gravatar for ALchEmiXt
4.0 years ago by
ALchEmiXt1.9k
The Netherlands
ALchEmiXt1.9k wrote:

MUMmer is fast and fit for purpose. http://mummer.sourceforge.net/

ADD COMMENTlink written 4.0 years ago by ALchEmiXt1.9k
0
gravatar for Carlo Yague
4.0 years ago by
Carlo Yague5.7k
Canada
Carlo Yague5.7k wrote:

Hi,

I used blast+ for a similar task and it worked fairly well. You'll need to convert the .bam/.sam file to a fasta file first but the results in tabulated format are really easy to filter based on score or alignment length.

Not sure to understand what you mean by "splicing" the query.

Best,

Carlo

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Carlo Yague5.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2245 users visited in the last hour
_