Multiple Sequence Alignment Python
3.0 years ago

Hi, I am currently working on classifying coding/non-coding RNAs based on sequence features. I want to try the MSA score as a feature for sequence classification.

• Please help me get an MSA score (using ClustalW) for a collection of sequences, using Biopython.
• How can I get a profile sequence after an MSA of the training sequences?
• Please also suggest other features that can be derived from an MSA.
Biopython Multiple Sequence Alignment ClustalW

What have you tried so far?


I have used k-mer (k-length substring) frequencies, GC content, molecular weight, etc. I would like to add features derived from the MSA (e.g. some measure of variance from the profile sequence) to this feature set.
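For reference, the k-mer and GC features mentioned above can be computed along these lines. This is a minimal pure-Python sketch, not the poster's actual code. It assumes RNA input over the alphabet `ACGU`, converts any `T` to `U`, and skips windows containing other characters (e.g. `N`). Returning every possible k-mer in a fixed order gives a fixed-length vector for a classifier.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Relative frequency of every possible k-mer over `alphabet`.

    Assumption: RNA input; T is converted to U, and windows containing
    characters outside the alphabet (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    allowed = set(alphabet)
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= allowed)
    total = sum(counts.values())
    # Fixed key order (itertools.product) so every sequence yields the same vector layout.
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in ("".join(p) for p in product(alphabet, repeat=k))
    }


def gc_content(seq):
    """Fraction of G + C in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

Biopython also provides `Bio.SeqUtils.gc_fraction` if you prefer a library helper for GC content.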


I believe there is a Clustal module for Python, though I've never personally used it. There should be a way to get the scores out of it (though I can't help you with the specifics). In the past, I've simply parsed the STDOUT output of running Clustal from the command line. It prints the pairwise alignment scores for all the sequences, so you could work with those.
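A rough sketch of the "parse STDOUT" approach follows. It assumes the pairwise stage of ClustalW prints lines of the form `Sequences (1:2) Aligned. Score:  85`, so check the regex against the output of your own ClustalW version. The executable name `clustalw2` is also an assumption; on some systems it is `clustalw`. The helper `mean_score_per_sequence` is a hypothetical way to turn the pairwise scores into one number per sequence that you could use as a feature.

```python
import re
import subprocess
from collections import defaultdict

# Assumed ClustalW pairwise-stage line format, e.g. "Sequences (1:2) Aligned. Score:  85".
PAIR_LINE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*([\d.]+)")


def run_clustalw(fasta_path, exe="clustalw2"):
    """Run ClustalW on a FASTA file and return its STDOUT as text.

    Assumption: the binary is on PATH under the name given by `exe`.
    """
    result = subprocess.run(
        [exe, f"-INFILE={fasta_path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_pairwise_scores(stdout_text):
    """Return {(i, j): score} for every pairwise-score line found."""
    scores = {}
    for match in PAIR_LINE.finditer(stdout_text):
        i, j, score = match.groups()
        scores[(int(i), int(j))] = float(score)
    return scores


def mean_score_per_sequence(scores):
    """Hypothetical feature: average of each sequence's pairwise scores."""
    per_seq = defaultdict(list)
    for (i, j), score in scores.items():
        per_seq[i].append(score)
        per_seq[j].append(score)
    return {seq: sum(vals) / len(vals) for seq, vals in per_seq.items()}
```

Usage would be `mean_score_per_sequence(parse_pairwise_scores(run_clustalw("train.fasta")))`, where the indices refer to the order of the sequences in the input file.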

For building a profile, you can pass your MSA through HMMER (`hmmbuild`) to get a profile HMM of the sequences.
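If a full HMM is more than you need, a simpler column-frequency profile and consensus can be built directly from the aligned rows. This is a swapped-in, simpler technique, not what HMMER does. The sketch assumes a ClustalW `.aln` file read with Biopython's `AlignIO.read(path, "clustal")`. The "identity to consensus" value is a hypothetical feature, and it only makes sense if the sequence being scored has already been aligned into the same columns (for example with `hmmalign` against the HMM above).

```python
from collections import Counter


def load_aligned_strings(aln_path):
    """Read a ClustalW .aln file with Biopython and return the aligned rows."""
    from Bio import AlignIO  # imported here so the other helpers work without Biopython
    alignment = AlignIO.read(aln_path, "clustal")
    return [str(record.seq) for record in alignment]


def column_profile(aligned, gap="-"):
    """Per-column residue frequencies (gaps excluded) for equal-length rows."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col].upper() for row in aligned if row[col] != gap)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def consensus(profile, gap="-"):
    """Most frequent residue per column (first seen wins ties); gap if all gaps."""
    return "".join(max(col, key=col.get) if col else gap for col in profile)


def identity_to_consensus(aligned_seq, cons, gap="-"):
    """Hypothetical feature: fraction of non-gap consensus columns matched."""
    pairs = [(a.upper(), c) for a, c in zip(aligned_seq, cons) if c != gap]
    if not pairs:
        return 0.0
    return sum(a == c for a, c in pairs) / len(pairs)
```

Keeping the full per-column frequencies, rather than only the consensus string, lets you later score a sequence by how typical each of its residues is at that position.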

As for other features, I can't think of anything specific that would be valid for all sequences beyond their pairwise agreement. Others may know more.