If you have non-enriched (I mean not V(D)J-enriched, after emulsion) 5' single-cell 10x data, prepared according to 10x's manual, then, very approximately, you should catch 30-50% of the T cells, with 20-30% of the cells having both TRA and TRB chains. The main factor here seems to be the quality of the size selection performed in the wet lab; depending on it, you can get significantly more or less. For B cells the fraction is somewhat higher, because immunoglobulin transcripts are expressed at higher levels. Also keep in mind that PBMCs contain ~50% T cells and ~10% B cells (again, the actual numbers in a given sample may be significantly different; these are averages). For a V(D)J-enriched library, MiXCR should reconstruct TCRs/BCRs for virtually 100% of the T/B cells.
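To make these numbers concrete, here is a small back-of-the-envelope calculation. It only multiplies out the rough averages quoted above. The function name expected_tcr_yield and the 10,000-cell input are hypothetical, and the sketch assumes (my interpretation, not stated explicitly) that the 20-30% paired TRA+TRB figure refers to all T cells.

```python
# Back-of-the-envelope TCR recovery estimate for non-enriched 5' 10x data.
# Default fractions are the rough averages quoted above; real samples may differ a lot.
# ASSUMPTION: the 20-30% paired TRA+TRB figure is read as a fraction of all T cells.

def expected_tcr_yield(n_cells, t_cell_fraction=0.5,
                       any_chain=(0.3, 0.5), paired=(0.2, 0.3)):
    """Return (T cells, (low, high) with any TCR chain, (low, high) with TRA+TRB)."""
    n_t = n_cells * t_cell_fraction
    with_chain = (n_t * any_chain[0], n_t * any_chain[1])
    with_pair = (n_t * paired[0], n_t * paired[1])
    return n_t, with_chain, with_pair


n_t, with_chain, with_pair = expected_tcr_yield(10_000)  # hypothetical 10k PBMCs
print(f"T cells: ~{n_t:.0f}")
print(f"with any TCR chain: ~{with_chain[0]:.0f}-{with_chain[1]:.0f}")
print(f"with TRA+TRB: ~{with_pair[0]:.0f}-{with_pair[1]:.0f}")
```

With these defaults, 10,000 PBMCs give roughly 5,000 T cells, of which about 1,500-2,500 would have at least one TCR chain and about 1,000-1,500 a paired TRA+TRB.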
As for bulk RNA-seq, it depends on many factors: the yield can be anywhere from 1 CDR3 per 10^5 reads to 1 CDR3 per 10^7 reads in the sample.
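The same kind of arithmetic works for bulk RNA-seq: dividing the library size by the reads-per-CDR3 range above gives the expected number of CDR3s. This is only a sketch; expected_cdr3_count and the 50-million-read library are hypothetical examples.

```python
# Expected number of CDR3s in a bulk RNA-seq library, using the rough range above
# (1 CDR3 per 1e5 reads in good cases, 1 CDR3 per 1e7 reads in poor ones).

def expected_cdr3_count(total_reads, reads_per_cdr3_best=1e5, reads_per_cdr3_worst=1e7):
    """Return (pessimistic, optimistic) expected CDR3 counts for a library."""
    return total_reads / reads_per_cdr3_worst, total_reads / reads_per_cdr3_best


low, high = expected_cdr3_count(50_000_000)  # hypothetical 50M-read library
print(f"expected CDR3s: ~{low:.0f} to ~{high:.0f}")
```

For a 50M-read library this gives a range of about 5 to 500 CDR3s, a hundredfold spread, which is why yields from bulk RNA-seq are so hard to predict in advance.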
And the last important point in this respect: the single-cell and bulk RNA-seq datasets are, obviously, prepared from different sets of cells, so cell sampling alone can make it hard to find an intersection between them. How hard depends strongly on the repertoire structure, i.e. on how many expanded clones are present in the mix.
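A simple way to see why cell sampling matters is to compute the chance that a clone at frequency f is observed in both datasets, 1 - (1 - f)^n per sample. This sketch assumes each dataset is an independent random sample of n cells from the same repertoire and ignores capture and sequencing efficiency, so real detection rates would be lower; the function names and the sample sizes 5,000 and 100,000 are hypothetical.

```python
import math

def p_detected(freq, n_cells):
    """Probability that a clone at frequency `freq` appears at least once among n_cells."""
    # 1 - (1 - f)^n, computed with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(n_cells * math.log1p(-freq))

def p_in_both(freq, n_cells_a, n_cells_b):
    """Probability that the clone is seen in both of two independent samples."""
    return p_detected(freq, n_cells_a) * p_detected(freq, n_cells_b)


# Hypothetical sizes: 5,000 cells in the single-cell run, 100,000 in the bulk sample.
for freq in (1e-2, 1e-4, 1e-5):
    p = p_in_both(freq, 5_000, 100_000)
    print(f"clone frequency {freq:g}: P(seen in both) = {p:.3f}")
```

Expanded clones (around 1% of the repertoire) are almost certain to appear in both datasets, whereas a clone at 1e-5 has only about a 3% chance, so the observable overlap is dominated by expanded clones.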
As for the comparison with TRUST and other software packages, there are several very important problems associated with the analysis of low-yield libraries like these that, if not properly accounted for, will lead to incorrect conclusions about the datasets in question.
First, there are many non-TCR/non-IG sequences that look like TCR/IG sequences. Such sequences may yield false CDR3s and, what is more dangerous, reproducible false CDR3s that look like a false overlap between samples. MiXCR was thoroughly tuned (on real and in-silico generated data) to prevent this from happening, so for RNA-seq it gives zero false CDR3s of this sort.
Second, to increase the total yield, it is beneficial to find partial sequences that cover only part of a CDR3 and assemble the whole CDR3 from such halves. This procedure must, again, be very strictly controlled: all CDR3s are built from similar parts (V, D and J genes), so a false overlap is easy to find. The resulting sequence is then a chimera that is not actually present in the sample (a false positive). This type of false positive simply inflates the apparent diversity and is not easy to spot without control datasets.
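A toy illustration of the chimera problem, using a naive suffix/prefix overlap merge. This is not MiXCR's algorithm, which is NDN-aware and considerably more involved. The two made-up fragments come from different clonotypes that happen to share a D-like germline stretch; with a short minimum overlap they are glued into a CDR3 that exists in neither clone. merge_by_overlap and all sequences here are hypothetical.

```python
def merge_by_overlap(left, right, min_overlap):
    """Naively join two partial sequences on a suffix/prefix overlap.

    Returns left + right[k:] for the longest k >= min_overlap such that the last
    k characters of `left` equal the first k characters of `right`, else None.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


# Toy, made-up fragments from two DIFFERENT clonotypes that share a D-like stretch.
shared_d = "GGGACAGGG"
left_clone_a = "TGTGCCAGCAGCTTA" + shared_d      # V end + N region of clone A + D
right_clone_b = shared_d + "TCCTACGAGCAGTACTTC"  # D + N region + J-like end of clone B

print(merge_by_overlap(left_clone_a, right_clone_b, min_overlap=8))   # chimeric join
print(merge_by_overlap(left_clone_a, right_clone_b, min_overlap=12))  # no join
```

Requiring the overlap to be longer than the shared germline segment prevents this particular false join, at the cost of assembling fewer CDR3s; a length threshold alone is a blunt tool, which is why germline-aware checks are needed in practice.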
Third, the most obvious source of false diversity is sequencing and amplification errors, which create similar CDR3s that differ by one or two substitutions or, less often, indels.
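A minimal sketch of one common generic approach to this problem: fold a low-abundance CDR3 into a much more abundant CDR3 that differs by exactly one substitution, on the assumption that the rare one is an error copy. This is not MiXCR's multi-layer error correction; the 10% abundance ratio, the Hamming-distance-1 rule, the function names and the toy counts are arbitrary assumptions, and indels are ignored.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def merge_error_variants(counts, max_ratio=0.1):
    """Fold rare 1-substitution variants into a much more abundant parent clonotype.

    counts: mapping CDR3 nucleotide sequence -> read count.
    A sequence is treated as an error copy of an already kept clonotype if they
    differ by exactly one substitution and its count is <= max_ratio * parent count.
    """
    merged = Counter()
    # Visit clonotypes from most to least abundant so parents are kept first.
    for seq in sorted(counts, key=counts.get, reverse=True):
        parent = next(
            (p for p in merged
             if len(p) == len(seq)
             and hamming(p, seq) == 1
             and counts[seq] <= max_ratio * counts[p]),
            None,
        )
        if parent is None:
            merged[seq] = counts[seq]        # keep as a separate clonotype
        else:
            merged[parent] += counts[seq]    # absorb reads of the likely error copy
    return merged


toy = {
    "TGTGCCAGCAGC": 1000,
    "TGTGCCAGCAGT": 3,    # 1 substitution away from the top clonotype
    "TGTGCCTGCAGC": 2,    # another 1-substitution variant
    "TGTGCGAGTTTC": 50,   # genuinely different clonotype
}
print(dict(merge_error_variants(toy)))
```

Here the two rare variants are absorbed into the top clonotype (1000 + 3 + 2 = 1005 reads), while the distinct clonotype with 50 reads is kept. Two neighbours with similar counts (say 100 and 80) stay separate, since neither is plausibly an error copy of the other.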
All these sources of false positives are very strictly controlled in MiXCR (by tuned aligners, NDN-aware partial-assembly algorithms and multi-layer error correction, respectively). MiXCR results have shown a high level of reliability in many studies.
Also, MiXCR supports single-cell analysis, so it makes more sense to compare data analysed with the same software.
I have now become a member there (free for two years).
How can I ask my questions there?
On the Slack channel, you can choose "#computational_questions" under "Channels" in the left sidebar. Then you can post your question just like on Biostars.