How to identify CDR regions in an antibody sequence
11 months ago
reany ▴ 50

I want to extract the CDR regions from an antibody sequence, or from a numbered antibody sequence. Because SCALOP misses H-CDR3, is there another tool that can identify the CDRs? After numbering the antibody sequence with ANARCI, is it correct to extract the CDRs according to the positions given in the CDR definitions?

I am looking for stand-alone tools or correct rules for extracting CDRs after numbering an antibody sequence. I am not sure the definition alone is enough to do this.

Any other suggestions would be appreciated.

CDR antobidy • 2.1k views

What species are you analyzing? IMGT is a great resource.


Sorry that the problem was not clearly described. Yes, IMGT is useful, but I am looking for stand-alone tools or correct rules for extracting CDRs after numbering an antibody sequence. I am not sure the definition alone is enough to do this.

11 months ago
Jeremy ▴ 810

I would try MIXCR. You can look for other tools on b-t.cr under the Software category. If your main focus is antibodies, you could also consider joining the AIRR Slack channel and posting a question there.

CDR H3 starts one amino acid after the conserved cysteine at the end of the VH region and ends 1 amino acid before the conserved tryptophan at the beginning of the JH region. (See Figure 5 in the paper below.)

Antibody Diversity Paper
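
As a rough illustration of the rule above, here is a minimal Python sketch that looks for the two anchor residues directly in a heavy-chain amino acid sequence. The motifs are assumptions on my part, not something stated in the answer: it treats a `Y[YF]C` motif as the end of the VH region and a `WG.G` motif as the start of the JH region, which is common in human heavy chains but not universal. The function name and the example sequence are illustrative only.

```python
import re

def find_cdrh3(vh_seq):
    """Heuristic CDR H3 finder: residues between the conserved Cys and Trp.

    Assumes (hypothetically) that the conserved Cys sits in a Y[YF]C motif
    and the conserved Trp starts a WG.G motif. Returns None if either
    anchor is missing.
    """
    # Use the last WG.G match, since CDR H3 itself may contain "WG".
    fr4_hits = list(re.finditer(r"WG.G", vh_seq))
    if not fr4_hits:
        return None
    trp_pos = fr4_hits[-1].start()

    # Use the last Y[YF]C motif upstream of the conserved Trp.
    cys_hits = list(re.finditer(r"Y[YF]C", vh_seq[:trp_pos]))
    if not cys_hits:
        return None
    cys_pos = cys_hits[-1].end() - 1

    # Start one residue after the Cys, stop one residue before the Trp.
    return vh_seq[cys_pos + 1:trp_pos]

# Illustrative heavy-chain variable region sequence.
example_vh = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKG"
    "RFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS"
)
print(find_cdrh3(example_vh))  # SRWGGDGFYAMDY
```

This will fail on sequences where the motifs are mutated or where CDR H3 happens to contain one of them, so treat the result as a starting point to check against a proper numbering tool.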


MIXCR seems suited to massively parallel sequencing data, while my input is relatively simple: for example, just the protein sequence of a light chain. I'm still going to look into MIXCR and hopefully learn something from it. Other suggestions are also useful; thanks very much.


If you can provide an example sequence, I can probably write some R code to detect CDR H3.


Thanks so much. The method for identifying CDRs is my focus. Maybe I can get the CDR regions with some simple code like [residue for residue, index in numbered_residues if CDR_start_index <= index <= CDR_end_index]
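
To make that idea concrete, here is a minimal sketch of range-based extraction from an already numbered sequence. It assumes IMGT numbering, where the CDRs are conventionally positions 27-38, 56-65, and 105-117, and it assumes the numbered sequence is a list of `((position, insertion_code), residue)` pairs with `-` for gaps. That input layout, the function name, and the toy data are assumptions for illustration; check them against what your numbering tool actually returns.

```python
# IMGT CDR boundaries (inclusive). Other schemes (Kabat, Chothia) differ.
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}

def extract_cdrs(numbered, ranges=IMGT_CDRS):
    """Return {cdr_name: sequence} from a numbered chain.

    `numbered` is assumed to be a list of ((position, insertion_code), residue)
    pairs in sequence order. Gap characters ("-") are skipped, and insertion
    codes are ignored because inserted residues share the base position.
    """
    cdrs = {}
    for name, (start, end) in ranges.items():
        cdrs[name] = "".join(
            residue
            for (position, _insertion), residue in numbered
            if start <= position <= end and residue != "-"
        )
    return cdrs

# Toy numbered fragment covering only the end of a heavy chain (made-up data).
toy_numbered = [
    ((103, " "), "Y"), ((104, " "), "C"),
    ((105, " "), "A"), ((106, " "), "R"), ((107, " "), "D"), ((108, " "), "-"),
    ((111, " "), "G"), ((111, "A"), "S"), ((112, "A"), "Y"), ((112, " "), "F"),
    ((116, " "), "D"), ((117, " "), "Y"),
    ((118, " "), "W"), ((119, " "), "G"),
]
print(extract_cdrs(toy_numbered)["CDR3"])  # ARDGSYFDY
```

Filtering on the numbered position rather than on the index in the raw sequence is what keeps this working when CDR lengths vary, since the numbering scheme absorbs insertions and gaps.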


Right. For the start, you'll want to count from the beginning, and for the end, you'll want to count backwards from the end.


I've made an app for extracting CDR H3 from an amino acid sequence:

CDRH3finder


Neat! What's your method under the hood? It looks like it starts a few AA too far in for a couple human sequences I tried, though it got the end position correct.


For human, I'm going from position 98 to -11 of the input sequence. It worked for some human sequences I found on NCBI, but my expertise is really cow antibodies. Do you know what germline VH gene your sequences were or if there were any insertions in CDR1 or CDR2? I might need to re-think the human option. Thanks for your feedback!


IMGT's IGHV4-34*02, and no insertions, but thanks to the randomness of VDJ recombination and nontemplated nucleotides you can end up with CDR3 positions all over the place even without later insertions/deletions, so you can't rely on position in the sequence alone.

It's a tricky problem, especially with just AA versus NT. igblastn requires a table of positions in each J gene to figure out the CDR3 end position (and from what reany said in another comment that's apparently not totally reliable in igblastp). I'm not sure what it's doing for the start position, but I'd bet it's looking for the conserved C late in the V gene that should come just before CDRH3. All my experience is with human and rhesus macaque so I can't say how cow might differ though.


Cows are a little simpler because in the germline sequence, the conserved cysteine is always in the same position with respect to the beginning of the VH gene and the conserved tryptophan is always in the same position with respect to the end of the JH gene, but indels could throw that off.


The AIRR folks are on Slack? I had no idea. Thanks for mentioning that.

11 months ago
Jesse ▴ 580

I generally just use IgBLAST when I need antibody sequence annotations like CDR3. There's a command-line version and you can have arbitrary species and gene references (though in that case you need to jump through some hoops and create an "auxiliary data file" to get it to report the CDR3 info). It can also give AIRR-compatible TSV output so you can extract sequences directly from the specific columns you want, like cdr3, without worrying about a particular numbering scheme (e.g. kabat) and extracting subsequences yourself, but there are a ton of columns with position info as well. The web interface uses IMGT references by default, too. IMGT's own V-QUEST tool can give similar info, but is web-only and I don't believe supports custom references. There are a bunch of tools out there for bulk data but IgBLAST scales up and down nicely to even just one or a few sequences.
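
For example, once you have AIRR-format TSV output (with IgBLAST I believe this is `-outfmt 19`, but check the documentation for your version), pulling out the CDR3 is just a matter of reading a column. The sketch below uses the `sequence_id` and `cdr3_aa` columns from the AIRR Rearrangement schema; the TSV content itself is made up for illustration and the function name is hypothetical.

```python
import csv
import io

# Made-up AIRR-style TSV content; a real file has many more columns.
airr_tsv = (
    "sequence_id\tjunction_aa\tcdr3_aa\n"
    "seq1\tCARDGSYFDYW\tARDGSYFDY\n"
    "seq2\tCSRWGGDGFYAMDYW\tSRWGGDGFYAMDY\n"
)

def read_cdr3(handle):
    """Map sequence_id -> cdr3_aa from an open AIRR TSV file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sequence_id"]: row["cdr3_aa"] for row in reader}

# With a real file, use open("output.tsv", newline="") instead of StringIO.
cdr3_by_id = read_cdr3(io.StringIO(airr_tsv))
print(cdr3_by_id["seq1"])  # ARDGSYFDY
```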

Also, I'd reiterate what Jeremy said about the junction (which includes the conserved amino acids at each end) versus the CDR3 (which leaves them out). Some texts jumble those definitions, and that's tripped me up before.
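
In code, the relationship between the two is just trimming one residue from each end. This is a small sketch with a hypothetical function name and a made-up junction, assuming the junction is given as amino acids and starts with the conserved Cys and ends with the conserved Trp (or Phe in light chains):

```python
def cdr3_from_junction(junction_aa):
    """Drop the conserved anchor residue at each end of a junction."""
    if len(junction_aa) < 2:
        raise ValueError("junction must contain both anchor residues")
    return junction_aa[1:-1]

junction = "CARDGSYFDYW"  # made-up junction: Cys + CDR3 + Trp
cdr3 = cdr3_from_junction(junction)
print(cdr3)  # ARDGSYFDY
```

For nucleotide junctions the same idea applies with three bases trimmed from each end instead of one residue.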


For igblastp, auxiliary_data is not supported for getting an accurate CDR3 alignment, although it will sometimes report H-CDR3. Is delimiting H-CDR3 a challenge for existing tools, or is there some other reason?


Oh, sorry, I haven't tried it for amino acid sequences so I don't know about igblastp's behavior. I'm a bit surprised any of these tools would have much trouble labeling CDR3 though I suppose there's less to go by with just the amino acid sequence. The IgBLAST docs say igblastp doesn't search D and J which matches what you're saying. Do you have nucleotide sequence you could use too, or just amino acid?


Only the protein sequence is available. By changing some settings, I finally got SCALOP to report H-CDR3, although the result may be inaccurate. Thanks anyway.
