Question: Multiple Sequence Alignment Python
13 months ago by Manu Madhavan (NIT Calicut) wrote:

Hi, I am currently working on classifying coding/non-coding RNAs based on sequence features. I want to try using the MSA score as a feature for sequence classification.

  • Please help me get an MSA score (using ClustalW) for a collection of sequences, using Biopython.
  • How can I get a profile sequence after an MSA of the training sequences?
  • Please also suggest other features that can be derived from an MSA.
written 13 months ago by Manu Madhavan

What have you tried so far?

written 13 months ago by jrj.healey

I have used k-mer (length-k substring) frequencies, GC content, molecular weight, etc. I would like to add features derived from the MSA (e.g. some measure of variance from the profile sequence) to this feature set.
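For reference, the k-mer frequency and GC-content features mentioned above could be computed along these lines. This is a minimal sketch, assuming plain RNA strings over the ACGU alphabet (T is converted to U, and any k-mer containing other characters such as N is ignored); the function names are illustrative, not from any library.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every possible k-mer over ACGU (assumed alphabet)."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering over all 4**k k-mers so every sequence yields the same vector length
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    total = sum(counts[km] for km in kmers) or 1  # ignore ambiguous k-mers; avoid /0
    return {km: counts[km] / total for km in kmers}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

Keeping all 4**k k-mers in a fixed order (rather than only the ones observed) means each sequence maps to a feature vector of identical length, which most classifiers expect.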

written 13 months ago by Manu Madhavan

I believe there is a Clustal module for Python, though I've never personally used it. There should be a way to get the scores out of it (though I can't help you with the specifics). In the past, I've simply parsed the STDOUT of a command-line ClustalW run. It prints the pairwise alignment scores for all the sequences, so you could work with those.
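The "parse STDOUT" approach could look roughly like this. It is a sketch under two assumptions: that ClustalW reports pairwise scores in lines like `Sequences (1:2) Aligned. Score:  84` (check this against your own version's output), and that the executable is called `clustalw2` (on some installs it is `clustalw`). It calls the program directly with `subprocess` rather than relying on any Biopython wrapper.

```python
import re
import subprocess

# Assumed format of ClustalW's pairwise-stage stdout lines, e.g.
#   Sequences (1:2) Aligned. Score:  84
# Verify against the output of your installed ClustalW version.
PAIR_SCORE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*([\d.]+)")


def run_clustalw(fasta_path: str, exe: str = "clustalw2") -> str:
    """Run ClustalW on a FASTA file and return its stdout.

    The executable name ("clustalw2" vs "clustalw") is an assumption about your install.
    """
    result = subprocess.run(
        [exe, f"-INFILE={fasta_path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_pairwise_scores(stdout: str) -> dict[tuple[int, int], float]:
    """Map (i, j) pairs of 1-based sequence indices to their pairwise scores."""
    return {
        (int(i), int(j)): float(score)
        for i, j, score in PAIR_SCORE.findall(stdout)
    }
```

Usage would be something like `parse_pairwise_scores(run_clustalw("training.fasta"))`, where `training.fasta` is a placeholder for your own input file. The indices follow the order of sequences in the input FASTA.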

To build a profile, you can pass your MSA through HMMER (hmmbuild) to get a profile HMM of the alignment.
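If a full HMM is more than you need, a much simpler alternative (not what HMMER does) is a per-column frequency profile with a majority-rule consensus. The sketch below assumes the MSA is already loaded as a list of equal-length aligned strings with `-` as the gap character; `profile_agreement` is a hypothetical "variance from profile" feature, and a new sequence would first have to be aligned to the same columns before it can be scored.

```python
from collections import Counter


def column_profile(aligned: list[str]) -> list[dict[str, float]]:
    """Per-column residue frequencies (gaps included) for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned)
        profile.append({res: n / len(aligned) for res, n in counts.items()})
    return profile


def consensus(profile: list[dict[str, float]], gap: str = "-") -> str:
    """Most frequent residue per column; columns where the gap is most frequent are dropped."""
    out = []
    for freqs in profile:
        best = max(freqs, key=freqs.get)  # ties go to the residue seen first in the column
        if best != gap:
            out.append(best)
    return "".join(out)


def profile_agreement(seq_aligned: str, profile: list[dict[str, float]]) -> float:
    """Mean profile frequency of the residue this aligned sequence has at each column."""
    if len(seq_aligned) != len(profile):
        raise ValueError("sequence must be aligned to the profile's columns")
    return sum(col.get(r.upper(), 0.0) for r, col in zip(seq_aligned, profile)) / len(profile)
```

A score near 1.0 means the sequence mostly carries the dominant residue in each column; lower values indicate more deviation from the training profile.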

As for other features, I can't think of anything specific that would be valid for all sequences beyond their pairwise agreement. Others may know more...
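As one way to turn "pairwise agreement" into per-sequence features straight from the MSA, the sketch below computes percent identity over gap-free columns, each sequence's mean identity to the rest, and the Shannon entropy of a column as a conservation measure. It assumes equal-length aligned strings with `-` as gap; all names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_identity_per_sequence(aligned: list[str]) -> list[float]:
    """For each sequence, its average pairwise identity to every other sequence."""
    n = len(aligned)
    if n < 2:
        return [0.0] * n
    sums = [0.0] * n
    for i, j in combinations(range(n), 2):
        pid = pairwise_identity(aligned[i], aligned[j])
        sums[i] += pid
        sums[j] += pid
    return [s / (n - 1) for s in sums]


def column_entropy(aligned: list[str], col: int) -> float:
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    counts = Counter(s[col].upper() for s in aligned)
    total = len(aligned)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```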

written 13 months ago by jrj.healey