Multiple Sequence Alignment Python
6.0 years ago

Hi,

I am currently working on classifying coding and non-coding RNAs based on sequence features, and I want to try an MSA score as a feature for sequence classification.

  • Please help me get an MSA score (using ClustalW) for a collection of sequences with Biopython.
  • How can I get a profile sequence after running an MSA on the training sequences?
  • Please also suggest other features that can be derived from an MSA.
multiple-sequence-alignment ClustalW Biopython • 2.9k views

What have you tried so far?


So far I have used k-mer (length-k substring) frequencies, GC content, molecular weight, and similar features. I would like to add features derived from an MSA (for example, some measure of variance from the profile sequence) to this feature set.
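For reference, the k-mer and GC-content features mentioned above could be computed roughly like this. This is only a minimal sketch: the function names `gc_content` and `kmer_frequencies` are made up for illustration, the alphabet is assumed to be RNA (`ACGU`, with `T` converted to `U`), and k-mers containing other symbols such as `N` are simply ignored.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Relative frequency of every possible k-mer over the given alphabet.

    Assumption: DNA input is converted to RNA (T -> U), and k-mers that
    contain symbols outside the alphabet are not counted.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}
```

Each sequence then yields a fixed-length vector (4**k k-mer frequencies plus the GC fraction), which can be concatenated with any MSA-derived features.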


I believe there is a Clustal module for Python, though I have never used it myself. There should be a way to get the scores back out of it, but I can't help with the specifics. In the past, I have simply parsed the STDOUT of a command-line invocation of Clustal. It prints the pairwise alignment scores for all the sequences, so you could work from those.
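The STDOUT-parsing approach could look something like the sketch below. It assumes the pairwise score lines have the form `Sequences (1:2) Aligned. Score:  87`, which is what ClustalW 2.x typically prints; check the output of your own version before relying on it. The executable name `clustalw2` and the helper names `parse_pairwise_scores` and `run_clustalw` are assumptions for illustration.

```python
import re
import subprocess

# Assumed line format (typical of ClustalW 2.x STDOUT):
#   Sequences (1:2) Aligned. Score:  87
PAIR_SCORE_RE = re.compile(
    r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(-?\d+(?:\.\d+)?)"
)


def parse_pairwise_scores(stdout_text):
    """Return {(i, j): score} for every pairwise score line in the text."""
    return {
        (int(i), int(j)): float(score)
        for i, j, score in PAIR_SCORE_RE.findall(stdout_text)
    }


def run_clustalw(fasta_path, executable="clustalw2"):
    """Run ClustalW on a FASTA file and return its captured STDOUT.

    Assumption: the binary is on PATH under the name given in `executable`.
    """
    result = subprocess.run(
        [executable, f"-INFILE={fasta_path}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage would be along the lines of `scores = parse_pairwise_scores(run_clustalw("train.fasta"))`, where `train.fasta` is a placeholder file name. The sequence indices in the keys follow the input order of the FASTA file.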

For building a profile sequence, you can pass your MSA through HMMER to get a profile HMM of the alignment.
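If a full profile HMM is more than you need, a much simpler stand-in is a per-column frequency profile with a consensus sequence. To be clear, this is not what HMMER produces; it is only a rough sketch of the "variance from a profile sequence" idea. It assumes the alignment is already available as equal-length strings with `-` for gaps, and that any query sequence has been aligned to the same columns. All function names are hypothetical.

```python
from collections import Counter


def column_profile(aligned_seqs):
    """Per-column symbol frequencies (gaps included) for aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned_seqs)
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(symbol.upper() for symbol in column)
        profile.append({symbol: count / n for symbol, count in counts.items()})
    return profile


def consensus(profile):
    """Most frequent symbol per column; ties go to the symbol that sorts first."""
    return "".join(max(sorted(col), key=col.get) for col in profile)


def mismatch_fraction(aligned_seq, profile):
    """Fraction of columns where the sequence differs from the consensus."""
    cons = consensus(profile)
    if len(aligned_seq) != len(cons):
        raise ValueError("sequence must be aligned to the profile columns")
    mismatches = sum(a != b for a, b in zip(aligned_seq.upper(), cons))
    return mismatches / len(cons)
```

The `mismatch_fraction` value (or the per-column frequencies of the residues a sequence actually carries) can be used directly as a numeric feature.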

As for other features, I can't think of anything specific that would be valid for all sequences beyond their pairwise agreement. Others may know more...
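One way to turn pairwise agreement into a number is percent identity over the aligned columns. The sketch below is only one possible definition: it skips columns where either sequence has a gap, which is an assumption and not the only convention in use. The function names are made up for illustration.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical positions among columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_identity(aligned_seqs):
    """Average identity over all unordered pairs of aligned sequences."""
    scores = [pairwise_identity(a, b) for a, b in combinations(aligned_seqs, 2)]
    return sum(scores) / len(scores) if scores else 0.0
```

For classification, the mean identity of one sequence against all training sequences of each class would give one feature per class.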
