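I'm trying to edit an MSA (Multiple Sequence Alignment) file generated by ClustalW, using Biopython, so that every sequence is trimmed before the position where the consensus sequence (ITS_primer_fw in the example) starts. In the example, xxx stands for other bases that are not relevant here.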
Here's the example I/O. Input:

    ITS_primer_fw           --------------------------------CGCGTCCACTMTCCAGTT
    RBL67ITS_full_sequence  CCACCCCAACAAGGGCGGCCACGCGGTCCGCTCGCGTCCACTCTCCAGTTxxxxxxxxxxxxxxxx
    PRL2010                 ACACCCCCGAAAGGGCGTCC------CCTGCTCGCGTCCACTATCCAGTTxxxxxxxxxxxxxxxx
    BBF32_3                 ACACACCCACAAGGGCGAGCAGGCG----GCTCGCGTCCACTATCCAGTTxxxxxxxxxxxxxx
    BBFCG32                 CAACACCACACCGGGCGAGCGGG-------CTCGCGTCCACTGTCGAGTTxxxxxxxxxxxxxxxx

Expected output:

    ITS_primer_fw           CGCGTCCACTMTCCAGTT
    RBL67ITS_full_sequence  CGCGTCCACTCTCCAGTTxxxxxxxxxxxxxxxxxxxx
    PRL2010                 CGCGTCCACTATCCAGTTxxxxxxxxxxxxxxxxxxxxx
    BBF32_3                 CGCGTCCACTATCCAGTTxxxxxxxxxxxxxxxxxxx
    BBFCG32                 CGCGTCCACTGTCGAGTTxxxxxxxxxxxxxxxxxxxx
The documentation for AlignIO only describes a way to extract sequences by treating the alignment as an array. In this example
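    from Bio import AlignIO

    # input_file is the path to the ClustalW alignment
    align = AlignIO.read(input_file, "clustal")
    # keep all rows (:) and every column from index 20 onwards
    sub_alignment = align[:, 20:]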
I was able to extract a sub-alignment made of all the sequences (:) starting from column index 20 (0-based, so the 21st column). I'm looking for a way to replace the 20 in the example with the position of the first nucleotide of the consensus sequence.
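To make the request concrete, here is a minimal sketch of one possible approach, not a tested solution: find the first non-gap column in the consensus row and use it as the slice start. The helper names `first_non_gap` and `trim_to_reference`, the `output_file` argument, and the assumptions that gaps are written as `-` and that the consensus row's id is `ITS_primer_fw` are all illustrative.

```python
def first_non_gap(aligned_seq, gap="-"):
    """Return the 0-based index of the first non-gap character."""
    for index, char in enumerate(aligned_seq):
        if char != gap:
            return index
    raise ValueError("sequence contains only gap characters")


def trim_to_reference(input_file, output_file, ref_id="ITS_primer_fw"):
    """Drop every alignment column before the reference row's first base."""
    # Imported here so first_non_gap stays usable without Biopython installed.
    from Bio import AlignIO

    align = AlignIO.read(input_file, "clustal")
    # Locate the reference row by its record id (assumed to be unique).
    ref = next(rec for rec in align if rec.id == ref_id)
    start = first_non_gap(str(ref.seq))
    # Same slicing as in the question, with the computed start instead of 20.
    sub_alignment = align[:, start:]
    AlignIO.write(sub_alignment, output_file, "clustal")
    return start


# Stand-alone check on a primer row with 32 leading gaps.
primer_row = "-" * 32 + "CGCGTCCACTMTCCAGTT"
start = first_non_gap(primer_row)  # start == 32
```

Calling `trim_to_reference("alignment.aln", "trimmed.aln")` (both file names are placeholders) would then write the trimmed alignment in Clustal format.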
Answers that use command-line software to do this trimming easily are also welcome. It would be great if the solution were written in Python or ran on UNIX.