Dear Biostars community,
We are using the ClonTracer library (https://www.nature.com/articles/nm.3841) to track individual clones and subpopulations in ER-positive tumor clonality models. Briefly, we have isolated single clones from the complex cell pools. Surprisingly, some of these clones contain more than one barcode integration, which makes deconvolution of the Sanger chromatograms challenging.
The ClonTracer library tags individual cells with a 30 bp semi-random DNA barcode in which a weak base (A or T) is followed by a strong base (C or G) in an iterative manner. In the original publication, cells were transduced at a very low MOI of 0.1, which according to a Poisson distribution results in 90.48% non-transduced cells, 9.05% cells transduced with a single barcode and 0.47% cells transduced with more than one barcode. Transduced (barcoded) cells were then selected under targeted therapy, and the barcode complexity of the treated cells was investigated by next-generation sequencing (NGS) (Panel A). We also performed NGS on our complex barcoded cell populations and analysed the reads as described (https://github.com/luca8651/Barcode_analyses-python), so we know which barcodes are present in the investigated cell pools.
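For reference, the Poisson fractions quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function names are ours, and it assumes integrations per cell follow an ideal Poisson distribution with mean equal to the MOI, which may not hold exactly in practice.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability of exactly k integrations per cell at a given MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def integration_fractions(moi: float) -> dict:
    """Expected fractions of cells with 0, 1 or more than 1 barcode.

    Assumption: an ideal Poisson model with mean = MOI.
    """
    p_none = poisson_pmf(0, moi)
    p_single = poisson_pmf(1, moi)
    p_multiple = 1.0 - p_none - p_single
    return {
        "none": p_none,
        "single": p_single,
        "multiple": p_multiple,
        # Fraction of multi-barcode cells among transduced cells only,
        # i.e. among the cells that survive selection.
        "multiple_given_transduced": p_multiple / (1.0 - p_none),
    }


if __name__ == "__main__":
    for name, value in integration_fractions(0.1).items():
        print(f"{name}: {value:.2%}")
```

At an MOI of 0.1 this gives the 90.48%, 9.05% and 0.47% figures. Under the same model, the multi-barcode fraction among transduced cells alone is larger, roughly 5%, because the non-transduced majority is removed by selection.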
Next, we isolated single cells from the pools using a single-cell printer. Statistically, most of the isolated single-cell clones should contain a single barcode. We established a PCR to amplify the region containing the barcode and analysed the barcode sequences by Sanger sequencing (Panel B). Interestingly, some clones contain more than one barcode, although we used an MOI even lower than in the original paper. The example shown in Panel C contains three different barcodes, which we deconvoluted manually using SnapGene Viewer for visualization and the list of possible barcodes obtained from the NGS. However, this manual analysis is very tedious and error-prone.
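To make clear what we currently do by hand, here is a minimal Python sketch of the matching step. It assumes the mixed peaks have already been called as an IUPAC ambiguity string covering exactly the barcode region (for example W for A/T, S for C/G); reading the .ab1 trace and calling secondary peaks is not covered. All function names are hypothetical, and the 6 bp barcodes in the example are toy data, not real ClonTracer barcodes.

```python
import re

# IUPAC ambiguity codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Weak base (A/T) followed by strong base (C/G), repeated.
WS_PATTERN = re.compile(r"(?:[AT][CG])+")


def is_ws_barcode(seq: str, length: int = 30) -> bool:
    """Check that a sequence has the expected length and WS repeat structure."""
    return len(seq) == length and WS_PATTERN.fullmatch(seq) is not None


def compatible(barcode: str, mixed_call: str) -> bool:
    """True if every base of the barcode was observed at its position."""
    return len(barcode) == len(mixed_call) and all(
        base in IUPAC[code] for base, code in zip(barcode, mixed_call)
    )


def candidate_barcodes(mixed_call: str, expected: list) -> list:
    """Expected (NGS-derived) barcodes that are consistent with the trace."""
    return [bc for bc in expected if compatible(bc, mixed_call)]


def explains_all_peaks(mixed_call: str, barcodes: list) -> bool:
    """True if the barcodes together account for every observed base."""
    return all(
        {bc[i] for bc in barcodes} == set(IUPAC[code])
        for i, code in enumerate(mixed_call)
    )


if __name__ == "__main__":
    expected = ["ACTGAC", "TCAGTG", "AGTCAG", "TGACTC"]  # toy 6 bp barcodes
    mixed = "WCWGWS"  # toy mixed call from a clone with two barcodes
    hits = candidate_barcodes(mixed, expected)
    print(hits, explains_all_peaks(mixed, hits))
```

With this simple filter, a barcode that happens to be a position-wise mixture of the true barcodes would also be reported as a candidate, and relative peak heights are ignored, so the output is a shortlist rather than a final answer.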
My main questions are the following: 1) Is there a straightforward explanation for why some clones contain more than one barcode, although we used an MOI even lower than in the original ClonTracer paper? 2) Given this, are there any tools or pipelines that take the Sanger sequencing data (.ab1) and the list of expected barcodes from the NGS (.txt) as input and list the identified barcodes as output?
Please excuse any naïve questions; this is my first time analyzing this kind of complex data, and any feedback would be appreciated!
Thank you in advance, Lukas