Question

Tool: MiTCR: A Software Tool for Analyzing T-Cell Receptor Sequencing Data

4

11.9 years ago

mikhail.shugay 3.5k

Dear fellow Bioinformaticians,

Allow me to introduce MiTCR, a recently published software tool for T-cell receptor sequencing data analysis. It allows extremely fast and accurate processing of T-cell receptor repertoire high-throughput sequencing data.

Main features include:

Extraction of complementarity determining region 3 (CDR3) sequence
Determination of variable (V), joining (J) and diversity (D) alleles according to IMGT nomenclature
Providing data on insertions and deletions produced by V(D)J recombination
Error correction algorithm that eliminates the majority of spurious clonotypes generated by PCR and NGS errors
Low system requirements, runs fast on commodity hardware
Well-documented open-source software written in Java

For more information please visit http://mitcr.milaboratory.com

analysis sequencing ngs • 13k views

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 11.9 years ago by mikhail.shugay 3.5k

1

Just an observation: always use the name of the tool in any announcement, mention, etc. It helps establish context. I will edit the title to adhere to this.

ADD REPLY • link 11.9 years ago by Istvan Albert 102k

0

Thanks a lot for the correction.

ADD REPLY • link 11.9 years ago by mikhail.shugay 3.5k

1

Hi, thank you for introducing MiTCR. I analyzed TCR data with MiTCR and found it very powerful and useful. However, I have now run into a problem: when I input a TCR FASTQ file, an error occurs in the analysis pipeline (java.lang.RuntimeException: Error while parsing quality). I tried switching between Phred33 and Phred64, but it did not help. I am confused by this, and I would deeply appreciate any advice you could give.

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 9.5 years ago by linzongwei850424 ▴ 10

0

Hello!

MiTCR accepts Phred quality scores in the 0-40 range. New HiSeq runs produce quality values up to Phred 50, which I'm pretty sure is the issue. You can fix those files manually by replacing all quality values above 40 with 40 (see this script for an example: https://github.com/mikessh/mageri-paper/blob/master/processing/FixQual.groovy).
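For readers who prefer not to run the Groovy script, the same idea can be sketched in a few lines of Python. This is only an illustration of capping quality values at 40, not a port of FixQual.groovy: the function names are made up for this example, and it assumes Phred+33 encoding and strict 4-line FASTQ records.

```python
import io

MAX_PHRED = 40  # highest score MiTCR accepts, according to the reply above
OFFSET = 33     # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def cap_quality_line(qual, max_phred=MAX_PHRED, offset=OFFSET):
    """Replace every quality character above max_phred with max_phred."""
    cap = chr(max_phred + offset)
    return "".join(cap if ch > cap else ch for ch in qual)


def cap_fastq(src, dst, max_phred=MAX_PHRED, offset=OFFSET):
    """Copy 4-line FASTQ records from src to dst, capping each quality line."""
    for i, line in enumerate(src):
        if i % 4 == 3:  # the fourth line of every record holds the qualities
            line = cap_quality_line(line.rstrip("\n"), max_phred, offset) + "\n"
        dst.write(line)


# Small in-memory demonstration: 'K' (Phred 42) and 'S' (Phred 50) become 'I' (Phred 40).
record = "@read1\nACGT\n+\nIIKS\n"
fixed = io.StringIO()
cap_fastq(io.StringIO(record), fixed)
print(fixed.getvalue(), end="")
```

For real data, `src` and `dst` can be file handles; gzipped FASTQ files can be opened in text mode with `gzip.open(path, "rt")` and `gzip.open(path, "wt")`.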

ADD REPLY • link 9.4 years ago by mikhail.shugay 3.5k

0

I just tried this software and found it useful. But how can I change "overrides target species"? I want to do some analysis on pig data. How can I do that?

ADD REPLY • link 11.7 years ago by kayleigh.china • 0

0

Sorry for the late reply. What kind of data are you analyzing: T-cell libraries or immunoglobulins? For analysis, MiTCR requires V/J reference sequences, which are compiled from IMGT data. Compiling them is not quite straightforward, due to the great complexity of the IMGT database organization, so currently only TRa/b for human and mouse are supported. A patch with TRgamma/delta will be available soon. Full functionality for a spectrum of species is currently being developed, and it will be available within a new tool (also supporting IGh/k/l). Still, looking at the IMGT database, I see that sequences for pig exist only partially (for IGh/k/l and TRa-J & TRd-J). If you could share your reference germline sequences for V/J segments with the conserved Cys/Phe/Trp residues marked, this could help to speed it up.

ADD REPLY • link 11.6 years ago by mikhail.shugay 3.5k

1

Hello, I want to use MiTCR to analyze my IGH data. Is it possible to integrate those references into MiTCR, or to provide a parameter for a reference file in FASTA format? Thanks for your attention!

ADD REPLY • link 11.1 years ago by xfliwz ▴ 50

0

Hello! MiTCR doesn't allow the integration of immunoglobulin loci, as its internal search algorithm is not designed to handle hypermutations and we could not guarantee optimal performance. We're now working on a software tool that can be used for high-throughput full-length antibody sequencing and has performance characteristics similar to or better than MiTCR. I will announce it upon release, which should happen in several months.

To analyze your IGH data you can use our recent MiGEC software, see this post. While its scope is a little bit different (it works with unique molecular identifier-tagged data), it provides fast IGH CDR3 extraction and V/J determination.

If you need full-length analysis with hypermutations, you can use the wrapper for the IgBlast software by NCBI, which is available here. This one is somewhat slower and less documented.

Please let me know if you have any problems or questions during the analysis. In this case please also describe your library structure.

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by mikhail.shugay 3.5k

0

Hi Mikhail - excuse my ignorance, but what is the exact interpretation of the tilde (~) character in the amino acid sequence of the CDR3 regions I extract using mitcr? All the best, A.

ADD REPLY • link 8.4 years ago by a_marion • 0

0

~ indicates a frameshift. In case of a frameshift, translation is performed in both the V -> J and J -> V directions, and the central incomplete codon is marked as ~.
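To make the notation concrete, here is a minimal Python sketch of how such a string could be produced. It is not MiTCR's actual code: the function names are hypothetical, and the rule for where the two translations meet (codons split as evenly as possible between the two ends) is an assumption made for illustration only.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate a nucleotide string whose length is a multiple of 3."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))


def translate_cdr3(nt):
    """Translate a CDR3; out-of-frame sequences get '~' for the leftover bases."""
    nt = nt.upper()
    if len(nt) % 3 == 0:
        return translate(nt)
    n_codons = len(nt) // 3
    left = (n_codons + 1) // 2   # codons read V -> J from the 5' end (assumed split)
    right = n_codons - left      # codons read J -> V from the 3' end
    head = translate(nt[:3 * left])
    tail = translate(nt[len(nt) - 3 * right:])
    return head + "~" + tail


print(translate_cdr3("TGTGCCAGCAGTTTC"))   # in frame: CASSF
print(translate_cdr3("TGTGCCAGCAGGTTTC"))  # one extra base: CAS~GF
```

In the second call the 16-nt sequence is read as TGT GCC AGC from the V side and GGT TTC from the J side, and the single base left in the middle is shown as ~.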

ADD REPLY • link 8.4 years ago by mikhail.shugay 3.5k

0

Thanks. How should we interpret this, though? I guess we don't expect a functional TCR product from a sequence that contains an incomplete codon?

ADD REPLY • link 8.4 years ago by a_marion • 0

0

Indeed, the CDR3 amino acid sequence is mostly meaningless here. However, when looking at your data tables manually, it can sometimes help to spot sequencing errors, frameshift hypermutations in the case of antibody data, etc. So consider this an aesthetic feature of the CDR3aa column.

ADD REPLY • link 8.4 years ago by mikhail.shugay 3.5k

Ram · Answer 1 · 2014-06-06

3

11.1 years ago

xfliwz ▴ 50

Hello, thanks for your response! I tried to use igblastwrp, but I am not familiar with Java. Could you package those scripts into a jar file? Thanks!

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by xfliwz ▴ 50

0

I packed the jars for different platforms, see https://github.com/mikessh/igblastwrp/releases/tag/v0.3. The readme is here. Run it as

$ java -jar IgBlastWrapper.jar -cf -l 0,1,2 -R IGH input.fastq.gz outputFilePrefix

Please let me know of any problems with the pipeline. Also keep in mind that IgBlast is relatively slow, so basically there are two cases when you want to use it: 454 and MiSeq 300bp paired-end data.

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by mikhail.shugay 3.5k

0

Hi! I want to compile reference sequences for chicken TCR to use with MiTCR or the IgBlast wrapper. Could you provide the detailed method for that and the command-line usage? Thank you!

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by lizezhong ▴ 10

0

Hello. I did not quite understand your question. There are currently insufficient references in IMGT to extract the CDR3 region for chicken TCRs; actually, only Joining segments of the TCR alpha chain are present there (http://www.imgt.org/IMGTrepertoire/index.php?section=LocusGenes&repertoire=genetable&species=Chicken&group=TRAJ). If you can provide me with a list of Variable and Joining segment sequences for the TCR chain you're interested in, I could try to compile them into a database.

The currently available references are listed here: they are the ones that have "1" in the last ("VJ") column.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by mikhail.shugay 3.5k

Ram · Answer 2 · 2014-08-23

1

10.9 years ago

lizezhong ▴ 10

Thank you very much for your reply! I am analyzing NGS sequences of chicken and duck TCRB surrounding the CDR3. I want to use MiTCR or the IgBlast wrapper to extract CDR3s. Would you please help compile the chicken and duck Variable and Joining segment sequences into a database and package it into a MiTCR jar? Besides, when I use the IgBlast wrapper, the 3' ends of some CDR3 sequences are not accurate and do not end with an F, and the number of clonotypes is smaller than the true number. Should I replace the files in data/internal_data with chicken and duck sequences? Thank you!

PS: I have sent the sequences to you by email.

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by lizezhong ▴ 10

0

Thank you one more time for your feedback. To be consistent this question has been answered in an update to this post.

ADD REPLY • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by mikhail.shugay 3.5k

Ram · Answer 3 · 2014-11-18

1

10.6 years ago

mcchance ▴ 30

Hi, thank you for introducing MiTCR. I have found it very powerful and useful. But I am analyzing TRB sequences of Macaca mulatta, a species that MiTCR doesn't support. I want to analyze the CDR3 region with MiTCR. Would you please compile the TRB genes of Macaca mulatta into MiTCR? All TRB genes are in the IMGT database.

I also learned about MiGEC through this post and found that it can analyze TRB sequences of macaque. Your library preparation method and analysis rely on unique molecular identifier tags (UMIs). Does MiGEC support data without UMIs? My library doesn't have them. Thank you.

By the way, if MiTCR can analyze macaque sequences, what are the differences in functionality and results between MiTCR and MiGEC?

Thank you.

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by mcchance ▴ 30

0

Hello!

As I've already mentioned, there is a problem with adding new species to MiTCR, as those are somewhat hard-coded into the binaries. The problem originates from the way the IMGT database is organized, i.e. all those IMGT gaps, etc., while it would be far better just to provide feature (CDR3 start, CDR2, CDR1, ...) coordinates. So adding new features requires a large amount of manual work.

Indeed, MiGEC has all species/receptor chains from IMGT for which both V and J segments are available. MiGEC can be used separately for tasks like de-multiplexing, read overlapping and CDR3 extraction/clonotype assembly. You should just use the CdrBlast module as is; check out this readme section and the command-line help by running java -jar migec.jar CdrBlast -h. MiGEC is slower than MiTCR, as it was designed to handle BCR sequences containing hypermutations, but it is still faster than any alternatives (it has no problem processing a HiSeq lane on commodity hardware). The results are highly consistent between these tools.

So you can give it a try, and tell me if everything works fine.

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by mikhail.shugay 3.5k

Ram · Answer 4 · 2015-10-20

1

9.7 years ago

Deepali Vasoya ▴ 10

Hello Mikhail,

We are sequencing TCRs of bovine samples and are interested in using MiTCR. Is it possible to replace the built-in human and mouse database with a bovine database? We are building the bovine database. One of my colleagues has done some reverse engineering of the tool and found the database. We analysed it and found the conserved Cys and Phe information a bit confusing. The Cys position is where the Cys codon starts. But looking at the J gene, we found that the position is shifted by -2 nucleotides from where the Phe codon starts. We are not sure which position the algorithm uses to mark the CDR3 flanking amino acids. If you could give us a few tips, it would be very helpful.

Thank you

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 9.7 years ago by Deepali Vasoya ▴ 10

0

Hi!

It is quite hard to re-assemble the binary database file for MiTCR, but I can easily add your references to MiGEC/CdrBlast (https://github.com/mikessh/migec, MIGEC: towards error-free profiling of immune repertoires), as they are stored in tabular format there. It has some data for Bos taurus, but it is incomplete. If you have both V and J references for your chain of interest (say, TRB), you can mail them to me (my nickname at biostars at gmail.com) and I'll incorporate them into a new release.

As far as I recall, the convention for the Cys/Phe reference points was the coordinate of the first base after the conserved Cys and the coordinate of the first base before Phe, both 0-based.
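One way to test this recollection against your own references is to check which codon each reference point selects. The Python sketch below follows the convention exactly as stated in this reply (it may not match what the binary database actually stores, as the question above suggests); the function names and the two toy sequences are made up for illustration and are not real germline references.

```python
def cys_codon(v_seq, cys_ref):
    """Codon ending right before cys_ref, the 0-based first base after Cys."""
    return v_seq[cys_ref - 3:cys_ref]


def phe_codon(j_seq, phe_ref):
    """Codon starting right after phe_ref, the 0-based first base before Phe."""
    return j_seq[phe_ref + 1:phe_ref + 4]


def check_reference_points(v_seq, cys_ref, j_seq, phe_ref):
    """Return (cys_ok, phe_ok): whether the selected codons encode Cys and Phe."""
    cys_ok = cys_codon(v_seq, cys_ref).upper() in {"TGT", "TGC"}
    phe_ok = phe_codon(j_seq, phe_ref).upper() in {"TTT", "TTC"}
    return cys_ok, phe_ok


# Toy V segment end: TAT TTC TGT GCC AGC, with the Cys codon (TGT) at bases 6..8.
v_toy = "TATTTCTGTGCCAGC"
# Toy J segment start: the Phe codon (TTT) occupies bases 12..14.
j_toy = "ACTGAAGCTTTCTTTGGA"

print(cys_codon(v_toy, 9), phe_codon(j_toy, 11))
print(check_reference_points(v_toy, 9, j_toy, 11))
```

If the check fails for a known reference, shifting the coordinate by a few bases and re-running it should reveal which convention a given database file really uses.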

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 9.7 years ago by mikhail.shugay 3.5k

Ram · Answer 5 · 2014-12-06

0

10.6 years ago

m338102001 ▴ 10

Dear mikhail.shugay:

Thank you for the introduction to MiTCR.

The MiTCR user manual shows the pipeline for single-end read analysis, as follows:

$ mitcr -pset flex in.fastq.gz result.txt

or $ mitcr <options> <input file name> <output file name>

But I want to use paired-end files (separated into R1.fastq and R2.fastq) as input. Could you please show me how to do that? As far as I know, MiTCR can also analyze Illumina output.

Thank you very much!

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by m338102001 ▴ 10

0

Hello!

The usage depends on the structure of your library:

1) Usually after de-multiplexing one gets oriented reads; in this case only the FASTQ file that contains the CDR3 should be specified.
2) In some cases one wants to overlap reads. Note that it is not the best practice when the CDR3 is spread across both reads, but with the most recent protocols (e.g. Illumina HiSeq 150+150) it is not a problem to read the entire CDR3. However, for some protocols a read-through situation can occur, so that the CDR3 is fully present in both reads. In such a case one can either overlap the reads or proceed to 3).
3) If the library is non-oriented, you can just concatenate the FASTQ files.
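The last option above, concatenating the FASTQ files, can be done with any tool; a minimal Python sketch is given here for completeness. The function name and the output file name are hypothetical, and the sketch assumes that all inputs use the same compression (all plain text or all gzip) and that each file ends with a newline.

```python
import shutil


def concatenate_fastq(inputs, output):
    """Concatenate FASTQ files byte by byte into a single output file.

    Works for plain-text files and also for gzip files, because a sequence
    of gzip members is itself a valid gzip stream. Do not mix the two.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `concatenate_fastq(["R1.fastq", "R2.fastq"], "merged.fastq")` would produce a single file that can then be passed to mitcr as the input file.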

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by mikhail.shugay 3.5k

0

Dear mikhail.shugay:

Thank you!

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by m338102001 ▴ 10