5.4 years ago
Manu Madhavan
Hi,
I am currently working on classifying coding/non-coding RNAs based on sequence features. I want to try a multiple sequence alignment (MSA) score as a feature for sequence classification.
- Please help me get the MSA score (using ClustalW) for a collection of sequences, using Biopython.
- How can I get a profile sequence after running an MSA on the training sequences?
- Please also suggest other features that can be derived from an MSA.
What have you tried so far?
I have used k-mer (k-length substring) frequencies, GC content, molecular weight, and so on. I would like to add features from the MSA (some measure of variance from the profile sequence) to this feature set.
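For anyone following along, the two simplest features mentioned above can be sketched roughly like this. This is only an illustration: the function names are made up for this post, T is folded into U, and windows containing symbols outside ACGU (such as N) are simply ignored, which is an assumption you may want to change.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Relative k-mer frequencies as a fixed-order feature vector.

    The vector has len(alphabet)**k entries, ordered as itertools.product
    generates them, so every sequence maps to the same feature layout.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of alphabet symbols contribute to the total.
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]
```

With k=3 this gives 64 features per sequence, which can be concatenated with GC content and any MSA-derived values.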
I believe there is a Clustal module for Python, though I've never used it myself. There should be a way to get the scores back out of it, but I can't help with the specifics. In the past, I've simply parsed the STDOUT of a command-line invocation of Clustal. It prints the pairwise alignment scores for all the sequences, so you could work with those.
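A rough sketch of the STDOUT-parsing approach, in case it helps. It assumes the pairwise stage prints lines of the form "Sequences (1:2) Aligned. Score:  87", which is what ClustalW 2.x does as far as I know; check the output of your own version. The executable name `clustalw2` and the helper names are assumptions for illustration.

```python
import re
import subprocess

# Assumed line format: "Sequences (1:2) Aligned. Score:  87"
PAIR_SCORE = re.compile(
    r"Sequences\s*\((\d+):(\d+)\)\s*Aligned\.\s*Score:\s*(-?\d+)"
)


def parse_pairwise_scores(stdout_text):
    """Map (i, j) sequence indices (1-based, input order) to pairwise scores."""
    return {
        (int(i), int(j)): int(score)
        for i, j, score in PAIR_SCORE.findall(stdout_text)
    }


def run_clustalw(fasta_path, executable="clustalw2"):
    """Run ClustalW on a FASTA file and return the parsed pairwise scores.

    The executable name is an assumption; adjust it to your installation.
    """
    result = subprocess.run(
        [executable, "-INFILE=" + fasta_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pairwise_scores(result.stdout)
```

The indices refer to the order of the sequences in the input file, so keep a list of sequence IDs in that order if you need to map scores back to names.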
For building a profile sequence, you can pass your MSA through hmmer to get an HMM of the sequence.
As for other features, I can't think of anything specifically that would be valid for all sequences beyond their pairwise agreements. Others may know more...
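If a full HMM is more than you need, a plain per-column frequency profile is a simpler stand-in (this is not what hmmer produces, just a lighter alternative). The sketch below works on aligned sequences as plain strings of equal length; with Biopython you could load them with `AlignIO.read(path, "clustal")` and take `str(record.seq)` for each record. The function names are made up for this post, and gaps are counted as an ordinary symbol, which is an assumption you may want to revisit.

```python
import math
from collections import Counter


def column_profile(aligned):
    """Per-column symbol frequencies for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({sym: n / len(aligned) for sym, n in counts.items()})
    return profile


def consensus(profile):
    """Most frequent symbol in each column (gaps included)."""
    return "".join(max(col, key=col.get) for col in profile)


def column_entropy(col):
    """Shannon entropy (bits) of one column; 0 means fully conserved."""
    return -sum(p * math.log2(p) for p in col.values() if p > 0)


def profile_agreement(aligned_seq, profile):
    """Mean profile frequency of the symbols a sequence actually carries.

    Values near 1 mean the sequence follows the profile closely; lower
    values mean it deviates, which could serve as a 'variance' feature.
    """
    return sum(
        col.get(sym, 0.0) for sym, col in zip(aligned_seq, profile)
    ) / len(profile)
```

Mean column entropy over the alignment and each sequence's `profile_agreement` value are two candidate MSA-derived features. Note that a new sequence has to be aligned to the training alignment before it can be scored against the profile.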