I will be using Drop-seq to prepare cDNA libraries for thousands of single cells, followed by single-end sequencing on a HiSeq2500 (read length = 100 bases). This will involve the addition of a unique 12 nucleotide cell barcode to the 5' end of all reads originating from the same cell (so bases 1-12 of each read). Unique molecular identifiers will also be used (they will be bases 13-20 of the reads). I won't be able to use the pipeline designed by the creators of drop-seq because they require paired-end sequencing. My problem is with demultiplexing: I am using randomly generated cell barcodes which will be supplied in excess, so I have no way of knowing beforehand which barcodes were actually used. Because of this, I am unable to supply the barcode sequence information that most scripts out there require as input. As someone with minimal computational/bioinformatics skills, I am not comfortable writing custom scripts to fit my needs.
Does anyone know of any scripts/packages that I would be able to use to demultiplex my data based on the in-line cell barcodes? I am planning on using the python package UMI tools (https://github.com/CGATOxford/UMI-tools) to extract the UMIs and deduplicate reads, but I have not been able to find any information on how to separate the reads from different cells into distinct fastq files.
Thanks in advance for any recommendations!