Question: Multiple Sequence Alignment Python
2.7 years ago by
NIT Calicut
Manu Madhavan20 wrote:

Hi, I am currently working on classifying coding and non-coding RNAs based on sequence features, and I want to try an MSA score as a feature for sequence classification.

  • Please help me get an MSA score (using ClustalW) for a collection of sequences with Biopython.
  • How can I get a profile sequence after running MSA on the training sequences?
  • Please also suggest other features that can be derived from an MSA.
written 2.7 years ago by Manu Madhavan20

What have you tried so far?

written 2.7 years ago by Joe18k

So far I have used k-mer (k-length substring) frequencies, GC content, molecular weight, ... I would like to add features from the MSA (some measure of variance from the profile sequence) to this feature set.

written 2.7 years ago by Manu Madhavan20

I believe there is a Clustal module for Python, though I've never used it myself. There should be a way to get the scores back out of it, but I can't help with the specifics. In the past I've simply parsed the STDOUT of a command-line invocation of Clustal: it prints the pairwise alignment scores for all the sequences, so you could work from that.
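A rough sketch of that parsing approach, in case it helps. It assumes the progress lines look like "Sequences (1:2) Aligned. Score:  87" and that the binary is called clustalw2 and accepts -INFILE=; both may differ on your install, so check them against your own output. The function names are made up for illustration.

```python
import re
import subprocess

# Assumed format of ClustalW's pairwise progress lines, e.g.
#   "Sequences (1:2) Aligned. Score:  87"
PAIR_SCORE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(-?\d+)")


def parse_pairwise_scores(stdout_text):
    """Return {(i, j): score} for every pairwise line found in the text."""
    scores = {}
    for match in PAIR_SCORE.finditer(stdout_text):
        i, j, score = (int(g) for g in match.groups())
        scores[(i, j)] = score
    return scores


def run_clustalw(fasta_path, executable="clustalw2"):
    """Run ClustalW on a FASTA file and return its captured STDOUT.

    The executable name is an assumption; it is often clustalw or clustalw2.
    """
    result = subprocess.run(
        [executable, f"-INFILE={fasta_path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample output, only to exercise the parser without running ClustalW.
sample = """\
Sequences (1:2) Aligned. Score:  87
Sequences (1:3) Aligned. Score:  64
Sequences (2:3) Aligned. Score:  71
"""
print(parse_pairwise_scores(sample))
```

With a real file you would call parse_pairwise_scores(run_clustalw("seqs.fasta")), where seqs.fasta is a placeholder name. The indices are the 1-based positions of the sequences in the input file.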

To build a profile, you can pass your MSA through HMMER (hmmbuild) to get a profile HMM of the alignment.
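If a full HMM is more than you need, a much simpler stand-in is a per-column frequency profile with a consensus string. This is not what HMMER does, just a plain-Python sketch; it assumes you already have the aligned sequences as equal-length strings with "-" for gaps, and the function names are illustrative.

```python
from collections import Counter


def column_profile(aligned_seqs):
    """Per-column symbol frequencies for equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column)
        profile.append({sym: n / total for sym, n in counts.items()})
    return profile


def consensus(profile):
    """Most frequent symbol per column; ties go to the symbol seen first."""
    return "".join(max(col, key=col.get) for col in profile)


# Toy alignment, made up for illustration; gaps are counted like any symbol.
alignment = ["ACG-U", "ACGAU", "AUGAU"]
profile = column_profile(alignment)
print(consensus(profile))
```

Each entry of profile is a dict of symbol frequencies for one alignment column, so you could derive a "distance from profile" feature by comparing an aligned sequence to those frequencies column by column.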

As for other features, I can't think of anything specifically that would be valid for all sequences beyond their pairwise agreements. Others may know more...
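One way to turn the pairwise agreements into a per-sequence feature is to average each sequence's score against all the others. A minimal sketch, assuming the scores are already in a dict keyed by (i, j) index pairs; the numbers below are made up.

```python
from collections import defaultdict


def mean_pairwise_score(pair_scores):
    """Map each sequence index to the mean of its scores against all others."""
    per_seq = defaultdict(list)
    for (i, j), score in pair_scores.items():
        # Each pairwise score counts towards both sequences in the pair.
        per_seq[i].append(score)
        per_seq[j].append(score)
    return {idx: sum(vals) / len(vals) for idx, vals in per_seq.items()}


pair_scores = {(1, 2): 87, (1, 3): 64, (2, 3): 71}  # made-up scores
features = mean_pairwise_score(pair_scores)
print(features)
```

A sequence with a low mean score is a poor match to the rest of the set, which may or may not be informative for your classes.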

written 2.7 years ago by Joe18k