I am trying to process single-cell data from a public dataset containing over 6000 single-end, 51 bp FASTQ files. Each FASTQ file represents a single cell, and every read contains a 9-11 bp UMI, leaving 40-42 bp for mapping.
I have used UMI-tools to extract the UMI sequence from every read in every file.
Is there an efficient way to handle such a large number of files with STAR or kallisto? I suspect the process would be faster if I had fewer, larger FASTQ files in which each read carries a barcode identifying its cell. Is that right?
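To make the merging idea concrete, here is a minimal sketch of what I have in mind. It assumes that UMI-tools has left each read name in the form `@NAME_UMI`, and that the part of the file name before the first dot can serve as a cell label. The function names (`open_fastq`, `tag_read_name`, `merge_fastqs`) and the `NAME_CELL_UMI` layout are my own illustration, not something I have confirmed the downstream tools expect.

```python
import gzip
from pathlib import Path


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def tag_read_name(header, cell_id):
    """Insert a cell label before the UMI: @NAME_UMI -> @NAME_CELL_UMI.

    Assumes the UMI is the last underscore-separated field of the read name.
    If the name has no underscore, the cell label is simply appended.
    """
    name, sep, rest = header.rstrip("\n").partition(" ")
    prefix, _, umi = name.rpartition("_")
    if not prefix:
        return f"{name}_{cell_id}{sep}{rest}\n"
    return f"{prefix}_{cell_id}_{umi}{sep}{rest}\n"


def merge_fastqs(paths, out_path):
    """Concatenate per-cell FASTQ files into one file, tagging every read.

    The cell label is taken from the file name up to the first dot
    (a hypothetical naming scheme). Returns the number of reads written.
    """
    n_reads = 0
    with open_fastq(out_path, "wt") as out:
        for p in paths:
            cell_id = Path(p).name.split(".")[0]
            with open_fastq(p) as fh:
                while True:
                    header = fh.readline()
                    if not header:
                        break
                    seq = fh.readline()
                    plus = fh.readline()
                    qual = fh.readline()
                    out.write(tag_read_name(header, cell_id))
                    out.write(seq)
                    out.write(plus)
                    out.write(qual)
                    n_reads += 1
    return n_reads
```

Calling `merge_fastqs(sorted(Path("fastq_dir").glob("*.fastq.gz")), "merged.fastq.gz")` (with `fastq_dir` as a placeholder) would then give one large file. Cell labels containing underscores would make the read name ambiguous, so they would need cleaning up first.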