2.2 years ago
coffeyrt • 0


Multithreaded barcode counter originally written for DNA-encoded libraries (DEL). I have since expanded it to handle other data types, such as sequencing data from high-throughput CRISPR screens, and it has worked well on every test dataset I've tried. For comparison, the group I was working with took 2 weeks to do the same analysis, while this algorithm took a little over an hour. I've also heard of other DEL groups taking 18-24 hours for analysis.

I thought I would share it with the community, and any feedback or suggestions are welcome. I'm currently working on incorporating sequencing quality scores for filtering.

It works best with uncompressed (not gzipped) FASTQ files. I tested it on a few gzipped FASTQ files, and although it worked well on almost all of them, it stopped early on one test file and I could not track down why. Therefore, gzipped input is still experimental.
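For readers wondering what a tool like this does under the hood, here is a minimal Python sketch. It is not the code from this post. It assumes the barcode sits at a fixed offset (`start`) with a fixed `length` in each read, detects gzip input by the `.gz` extension, and applies an optional mean Phred+33 quality cutoff over the barcode bases as a stand-in for the quality-score filtering described above as in progress. All function and parameter names (`count_barcodes`, `read_fastq`, `min_q`, `chunk_size`) are hypothetical.

```python
import gzip
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def open_fastq(path):
    # Assumption: gzipped input is recognized by the ".gz" extension.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # Python's gzip reads every member of multi-member files
    return open(path, "r")


def read_fastq(handle):
    """Yield (sequence, quality) pairs, one per 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        # Fail loudly on a truncated or malformed record instead of silently stopping.
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"Malformed or truncated FASTQ record: {header.strip()!r}")
        yield seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def count_chunk(records, start, length, min_q):
    """Count barcodes in one chunk of reads; runs inside a worker thread."""
    counts = Counter()
    end = start + length
    for seq, qual in records:
        barcode = seq[start:end]
        if len(barcode) < length:
            continue  # read too short to contain the full barcode
        if min_q > 0 and mean_phred(qual[start:end]) < min_q:
            continue  # hypothetical quality filter on the barcode bases only
        counts[barcode] += 1
    return counts


def count_barcodes(path, start, length, min_q=0.0, workers=4, chunk_size=100_000):
    """Split reads into chunks, count each chunk in a pool, then merge the Counters."""
    total = Counter()
    with open_fastq(path) as fh, ThreadPoolExecutor(max_workers=workers) as pool:
        records = read_fastq(fh)
        futures = []
        while True:
            chunk = list(islice(records, chunk_size))
            if not chunk:
                break
            futures.append(pool.submit(count_chunk, chunk, start, length, min_q))
        for future in futures:
            total.update(future.result())  # Counter.update adds counts
    return total
```

Example: `count_barcodes("reads.fastq", start=0, length=8, min_q=20, workers=8)` returns a `Counter` mapping each 8-base barcode to its read count. In standard CPython the GIL limits how much CPU-bound work threads can parallelize, so swapping in `ProcessPoolExecutor` (with module-level functions and an `if __name__ == "__main__":` guard) is the usual route to a real speedup. The strict record check in `read_fastq` is one way to surface an "it stopped early" problem as an explicit error. Since some decompression code reads only the first member of a concatenated gzip file, that may also be worth checking on the problem file.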

