Question

Pairwise Sequence Alignments with Numbered Antibody Sequences

7 months ago

jsweet6 • 0

Hi All,

I have been using Biopython to explore the diversity of some of my antibody sequences using pairwise alignment. However, because insertions and deletions occur at well-established positions in antibody amino acid sequences, there are numbering schemes that allow residues to be compared like-for-like.

I have provided a couple of examples below that have been numbered and aligned to the Chothia scheme. Missing positions are padded with dashes, so the sequences are all the same length.

seq1 = "QVQLVQSGAEVKKPGASVKVSCKASGYTFTV--FYIFWVRQAPGQGPEWMGWINP--NSGGTSYAQNFQGRVTMTRDTSVSTAYMELSRLTSDDTAVYFCARGRRGLITEF--------DYWGQGTLVTVSS"

seq2 = "QVQLVESGGGLVKPGGSLRLSCAASGFTFSD--YYMSWIRQAPGKGLEWVSYISS--SGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARIAAAGKN----------DYWGQGTLVTVSS"

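Using Biopython's Align module, the aligner inserts additional gaps into the sequences, like so:

from Bio import Align
aligner = Align.PairwiseAligner()  # the constructor takes settings, not sequences
alignments = aligner.align(seq1, seq2)
print(alignments[0])

The resulting alignment looks like this: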

QVQLVQ-SGAE---VKKPGA-SVKV---SCKA-SGYTFTV-----FYIF---WV-RQAPGQ-GP-EWMGW---INP----NSGGTS--Y-AQNFQ----GRV-TMT--RDTSVST-A----YMEL---SRLTS---DDTAVYF-CARGRRGLITEF----------------DYWGQGTLVTVSS

QVQLV-ESG--GGLVK-PG-GS---LRLSC-AASG--FT-FSD---Y--YMSW-IRQAPG-KG-LEW---VSYI--SS---SG--STIYYA----DSVKGR-FT--ISRD-----NAKNSLY--LQMNS-L--RAED-TAVY-YCAR-----I---AAAGKN----------DYWGQGTLVTVSS

But I would like to obtain a pairwise alignment score using the original gaps in the sequences. How can this be achieved? Or is there another program that may help? Would a simple Hamming distance be sufficient?

Best, James

sequence alignment antibody • 394 views

updated 7 months ago by Jesse ▴ 740 • written 7 months ago by jsweet6 • 0

If you already have the alignment the way you want it, you could just score it yourself directly in Python with whatever rules you have in mind. A straight Hamming distance could be a simple one-liner like sum(x != y for x, y in zip(seq1, seq2)). That said, what's your end goal? There are a bunch of specialized immune receptor programs that might end up being a better fit than plain Python/Biopython, and there are some related posts here; for example, see this recent answer about sequence annotation and all those competing numbering schemes.
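To make the "score it yourself" idea a bit more concrete, here is a minimal sketch in plain Python. It is not a Biopython API: the function name `score_prealigned` and its `match`, `mismatch`, and `gap_penalty` parameters are made up for illustration, and the default values are arbitrary placeholders rather than recommended scores. It assumes both sequences are already padded to the same length by the numbering scheme, and it skips columns where both sequences have a dash, since that position is absent from both.

```python
def score_prealigned(a, b, gap="-", match=1, mismatch=0, gap_penalty=0):
    """Score two sequences that are ALREADY aligned (same length, '-' padding).

    The scoring values are illustrative assumptions, not established defaults.
    """
    if len(a) != len(b):
        raise ValueError("pre-aligned sequences must have the same length")

    score = 0      # running alignment score under the chosen rules
    hamming = 0    # number of compared columns that differ
    compared = 0   # columns where at least one sequence has a residue

    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # position missing in both sequences: ignore it
        compared += 1
        if x == gap or y == gap:
            score += gap_penalty  # residue opposite a gap
            hamming += 1
        elif x == y:
            score += match
        else:
            score += mismatch
            hamming += 1

    identity = (compared - hamming) / compared if compared else 0.0
    return {"score": score, "hamming": hamming,
            "compared": compared, "identity": identity}


# Toy example; with the real data you would pass seq1 and seq2 instead.
result = score_prealigned("AC--GT", "AG--G-")
print(result)
```

Note that this counts a residue opposite a dash as a difference, whereas the one-liner above also does so but additionally counts nothing for dash/dash columns only because they compare equal. If you want substitution-aware scoring, the `match`/`mismatch` branch is where a lookup into a matrix such as BLOSUM62 could go.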

7 months ago by Jesse ▴ 740