Question

Count expression of 10x scRNA-seq after UMI extraction

0

20 months ago

tomas4482 ▴ 390

Cell Ranger is the most widely used software for quantifying gene expression in 10x single-cell libraries. However, most of my data do not follow the standard file naming (e.g. _R1_L001.fastq and _R2_L001.fastq), and R1 and R2 have the same read length (which is common in many studies). I therefore extracted the cell barcodes and UMIs with umi_tools whitelist and umi_tools extract, and then aligned the R2 FASTQ to the reference genome.

According to the UMI-tools tutorial, I need to use featureCounts to assign reads to genes and then count UMIs per cell with umi_tools count.

But this approach has two drawbacks. First, umi_tools does not support multithreading, which makes it VERY time-consuming. Second, it is a storage killer: featureCounts has to write a new BAM file with an additional tag, which I then have to sort (also time-consuming when there are many BAM files) before passing the sorted BAM to umi_tools count, which finally generates the count matrix. The whole pipeline triples the disk space occupied by BAM files. That is a disaster for me, and I really need to save some space.
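For reference, the three steps described above can be sketched as the command lines below, assembled in Python so they can be inspected before anything is run. This is only a sketch that follows the flags used in the UMI-tools single-cell tutorial; the function name, file names, and `prefix` argument are hypothetical, and it assumes featureCounts writes `<input>.featureCounts.bam` into the working directory.

```python
import shlex


def umi_count_commands(aligned_bam, gtf, prefix, threads=4):
    """Build (but do not run) the featureCounts -> samtools -> umi_tools commands."""
    # Assumption: featureCounts names its tagged BAM "<input>.featureCounts.bam".
    assigned_bam = f"{aligned_bam}.featureCounts.bam"
    sorted_bam = f"{prefix}.assigned_sorted.bam"
    return [
        # 1. Assign reads to genes; -R BAM writes a copy of the BAM with XS/XT tags.
        ["featureCounts", "-a", gtf, "-o", f"{prefix}.gene_assigned",
         "-R", "BAM", aligned_bam, "-T", str(threads)],
        # 2. Sort and index the tagged BAM (a third BAM on disk).
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, assigned_bam],
        ["samtools", "index", sorted_bam],
        # 3. Count UMIs per gene and per cell.
        ["umi_tools", "count", "--per-gene", "--gene-tag=XT",
         "--assigned-status-tag=XS", "--per-cell",
         "-I", sorted_bam, "-S", f"{prefix}.counts.tsv.gz"],
    ]


if __name__ == "__main__":
    for cmd in umi_count_commands("sample.bam", "genes.gtf", "sample"):
        print(shlex.join(cmd))
```

The aligned BAM, the tagged BAM from featureCounts, and the sorted BAM all exist at once, which is where the threefold storage comes from; deleting the unsorted tagged BAM right after sorting would reduce that to two copies.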

Is there a faster and more convenient way to quantify gene expression in my case? Any hint would be much appreciated. Thanks.

featureCount subread scRNA umitools • 977 views

ADD COMMENT • link 20 months ago by tomas4482 ▴ 390

0

Don’t reinvent the wheel. You would need to build an entire custom pipeline. Either rename the fastq files (e.g. symlink them first, then rename) or use any other pipeline such as salmon-alevin.

ADD REPLY • link 20 months ago by ATpoint 82k

0

Thanks for your kind advice. I was wondering whether I could reuse the aligned BAM files. It appears that a better option is to rename the original FASTQ files and run cellranger.

ADD REPLY • link 20 months ago by tomas4482 ▴ 390

0

Probably yes. Be sure to do the renaming via a script so that it is reproducible, and track which file gets which name in some kind of log file.
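A minimal sketch of such a renaming script, using symlinks so the original files stay untouched. It assumes gzipped input files, a single lane, and the Cell Ranger naming pattern `<sample>_S1_L001_R1_001.fastq.gz`; the function name, the `pairs` structure, and the log layout are hypothetical choices for illustration.

```python
import csv
import os
from pathlib import Path


def link_for_cellranger(pairs, out_dir, log_path):
    """Symlink FASTQ pairs to Cell Ranger style names and log the mapping.

    pairs: iterable of (sample_name, r1_path, r2_path); inputs assumed gzipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for sample, r1, r2 in pairs:
        for read, src in (("R1", r1), ("R2", r2)):
            src = Path(src).resolve()
            # Assumption: one lane per sample, so S1 and L001 are fixed.
            dest = out_dir / f"{sample}_S1_L001_{read}_001.fastq.gz"
            os.symlink(src, dest)
            rows.append((str(src), str(dest)))
    # Tab-separated log recording which original file got which name.
    with open(log_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["original", "linked_as"])
        writer.writerows(rows)
    return rows
```

Keeping the log next to the symlink directory makes it possible to trace every Cell Ranger input back to its original file later.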

ADD REPLY • link 20 months ago by ATpoint 82k

score 0 · Answer 1 · 2022-08-20

0

20 months ago

swbarnes2 14k

Have you tried just running your files as they are through cellranger?

ADD COMMENT • link 20 months ago by swbarnes2 14k

0

No, I didn't. I built another pipeline that extracts the cell barcodes and UMIs from the original FASTQ files and aligns the extracted FASTQ as single-end reads. Clearly cellranger does everything for you and is highly recommended as the solution: it automatically performs extraction and alignment on the FASTQ files. That's why I was wondering whether I could do something with these aligned BAM files, so that I could skip the extraction and alignment steps and save a lot of computational resources.

ADD REPLY • link 20 months ago by tomas4482 ▴ 390