I have bulk mouse RNA-seq data from an external lab and would like to analyse the TCR-mapping reads it contains. The library preparation pipeline produced three FASTQ files: two 75 bp paired-end read files (R1 and R3) and a UMI file (R2).
I was hoping to use MixCR for this; however, the MixCR pipeline does not incorporate the UMI information. One approach would be to deduplicate the FASTQs first and then run the MixCR analysis. I've seen other posts recommending against FASTQ-level deduplication, but I feel it may be the best option here.
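To make concrete what I mean by FASTQ-level deduplication, here is a minimal Python sketch rather than a tested tool. It assumes the three files list reads in the same order, and it treats two read pairs as duplicates only when the UMI, the R1 sequence, and the R3 sequence all match exactly, so sequencing errors are not collapsed. The function names and output file names are hypothetical, and the set of seen keys is held in memory, which may be a problem for large libraries.

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from an open FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def dedup_triplets(r1_records, umi_records, r3_records):
    """Yield the first (R1, UMI, R3) triplet seen for each exact
    (UMI sequence, R1 sequence, R3 sequence) combination."""
    seen = set()
    for r1, umi, r3 in zip(r1_records, umi_records, r3_records):
        key = (umi[1], r1[1], r3[1])
        if key in seen:
            continue  # exact duplicate of an earlier triplet
        seen.add(key)
        yield r1, umi, r3


def write_record(handle, record):
    """Write one (header, sequence, quality) tuple as a FASTQ record."""
    header, seq, qual = record
    handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def dedup_files(r1_path, umi_path, r3_path, out_prefix):
    """Deduplicate three parallel FASTQ files and write three
    uncompressed outputs named after the (hypothetical) out_prefix."""
    with open_text(r1_path) as f1, open_text(umi_path) as fu, \
            open_text(r3_path) as f3, \
            open(f"{out_prefix}_R1.fastq", "w") as o1, \
            open(f"{out_prefix}_R2.fastq", "w") as ou, \
            open(f"{out_prefix}_R3.fastq", "w") as o3:
        triplets = dedup_triplets(read_fastq(f1), read_fastq(fu), read_fastq(f3))
        for r1, umi, r3 in triplets:
            write_record(o1, r1)
            write_record(ou, umi)
            write_record(o3, r3)
```

It would be called as, for example, `dedup_files("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_R3.fastq.gz", "sample_dedup")`, with the deduplicated R1 and R3 files then going into MixCR. The exact-match rule is the part I am least sure about, since it is stricter than the mapping-based correction the external lab applied.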
The external lab has provided mapped and unmapped BAM files that were corrected for duplicates during mapping. I have tried BAM-to-FASTQ conversion, as well as TCR pipelines that accept BAM as input, but with no success (the BAM files themselves seem rather non-canonical).
Any suggestions for FASTQ-level deduplication tools that will accept three FASTQs in that format, or alternative solutions to my issue, would be greatly appreciated!
Thanks in advance! Gordon