I am working with FASTQ files from a single-cell RNA-seq (MARS-seq) dataset published on GEO (GSE98969). My first goal is to get an expression matrix for each sample, i.e. a matrix U with n rows and m columns, where rows represent genes and columns represent cells. Entry U_ij contains the number of UMIs from gene i that were found in cell j.
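Written as a formula, with each distinct UMI counted once per gene and cell (here gene(r), cell(r) and umi(r) are shorthand for the gene a mapped read r is assigned to, its cell barcode and its UMI):

```latex
U_{ij} = \left|\left\{\, u \;:\; \exists\, r \text{ mapped with } \mathrm{gene}(r) = i,\ \mathrm{cell}(r) = j,\ \mathrm{umi}(r) = u \,\right\}\right|
```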
The problem is that each FASTQ file contains over 300 cells. The reads are barcoded for cell and for molecule (RMT, random molecular tag) in the read header, i.e. line 1 of each FASTQ record.
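A minimal sketch of pulling both tags out of the header line of every 4-line FASTQ record. The header layout is an assumption: it supposes a hypothetical read name of the form `@READID_CELLBARCODE_RMT` (tags as the last two underscore-separated fields), so the split has to be adapted to the actual layout of the GSE98969 files.

```python
from typing import Iterable, Iterator


def parse_header(header: str) -> tuple[str, str, str]:
    """Split a header of the HYPOTHETICAL form '@READID_CELLBARCODE_RMT'."""
    # Keep only the read name (drop any comment after whitespace) and the '@'
    name = header.strip().split()[0].lstrip("@")
    # rsplit so that underscores inside the read ID itself are preserved
    read_id, cell_barcode, rmt = name.rsplit("_", 2)
    return read_id, cell_barcode, rmt


def iter_read_tags(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, cell_barcode, rmt) for each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # the header is the first line of every record
            yield parse_header(line)
```

For example, `parse_header("@SRR123.1_AACGTTGC_GATCAG length=60")` returns `("SRR123.1", "AACGTTGC", "GATCAG")`. Passing an open file handle to `iter_read_tags` streams the tags without loading the whole file.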
I use TopHat2 for alignment. I thought about doing the following:
After alignment, for each BAM file, I will assign the mapped reads to cells according to their cell barcodes and count each RMT only once per gene in each cell.
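The counting step of this plan can be sketched in plain Python. It assumes each mapped read has already been reduced to a `(cell_barcode, rmt, gene)` tuple; assigning reads to genes (e.g. by overlapping alignments with a gene annotation) is a separate, assumed upstream step not shown here. RMTs are collapsed by exact match only, so sequencing errors in the RMT itself would still inflate counts.

```python
from collections import defaultdict
from typing import Iterable


def count_umis(reads: Iterable[tuple[str, str, str]]):
    """Build a genes x cells matrix of distinct RMT counts.

    `reads` yields (cell_barcode, rmt, gene) for each mapped,
    gene-assigned read. Returns (genes, cells, matrix) where
    matrix[i][j] is the number of distinct RMTs for genes[i] in cells[j].
    """
    # For every (gene, cell) pair keep the set of distinct RMTs seen,
    # so PCR duplicates carrying the same RMT are counted once
    umis: dict[tuple[str, str], set[str]] = defaultdict(set)
    for cell, rmt, gene in reads:
        umis[(gene, cell)].add(rmt)

    # Fix a row order (genes) and a column order (cells)
    genes = sorted({g for g, _ in umis})
    cells = sorted({c for _, c in umis})
    g_idx = {g: i for i, g in enumerate(genes)}
    c_idx = {c: j for j, c in enumerate(cells)}

    # Dense matrix; pairs never observed stay at zero
    matrix = [[0] * len(cells) for _ in genes]
    for (gene, cell), rmts in umis.items():
        matrix[g_idx[gene]][c_idx[cell]] = len(rmts)
    return genes, cells, matrix
```

With reads `("AAA", "r1", "GeneA")` twice, `("AAA", "r2", "GeneA")`, `("CCC", "r1", "GeneB")` and `("CCC", "r1", "GeneA")`, the result is genes `["GeneA", "GeneB"]`, cells `["AAA", "CCC"]` and matrix `[[2, 1], [0, 1]]`. For 300+ cells and tens of thousands of genes, a sparse representation would be more economical than this dense list of lists.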
Is there a publicly available tool you use for this task, i.e. one that is aware of cell barcodes and UMIs when generating the expression matrix?