Question: How to change the cellranger multimapping algorithm
changxu.fan wrote (6 weeks ago):

Hi~ I'm currently using 10x cellranger to analyse single-cell RNA-seq data. According to its algorithm, reads that map confidently to exons of more than one gene are discarded. However, some paralogous genes in the genome are largely identical, so all the reads for those genes are discarded. I was therefore wondering whether there is a way to change the algorithm to count the first (or a random) confident alignment. Unfortunately, I wasn't able to find the file that implements the algorithm. Any hints would be appreciated. Thanks, everyone!
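To make the trade-off concrete, here is a minimal Python sketch of the discard rule described above next to the two alternatives the question proposes. It is not cellranger's code: `assign_read` and `count_genes` are hypothetical helpers, and each read is assumed to arrive as a list of the genes it aligns to confidently.

```python
import random
from collections import Counter


def assign_read(gene_hits, strategy="discard", rng=None):
    """Return the gene a read is counted toward, or None if it is dropped.

    gene_hits: genes the read aligns to confidently (hypothetical input format).
    strategy: "discard" (drop multi-gene reads, as described for cellranger),
              "first" (keep the first listed gene) or "random" (pick one at random).
    """
    genes = list(dict.fromkeys(gene_hits))  # drop duplicate gene IDs, keep order
    if not genes:
        return None  # read did not map to any gene
    if len(genes) == 1:
        return genes[0]  # unambiguous: every strategy counts it
    if strategy == "discard":
        return None
    if strategy == "first":
        return genes[0]
    if strategy == "random":
        return (rng or random.Random()).choice(genes)
    raise ValueError(f"unknown strategy: {strategy!r}")


def count_genes(reads, strategy="discard", seed=0):
    """Tally reads per gene under the chosen multi-gene strategy."""
    rng = random.Random(seed)  # seeded so "random" runs are reproducible
    counts = Counter()
    for hits in reads:
        gene = assign_read(hits, strategy, rng)
        if gene is not None:
            counts[gene] += 1
    return counts


reads = [["GeneA"], ["GeneA", "GeneB"], ["GeneB"]]
print(count_genes(reads, "discard"))  # Counter({'GeneA': 1, 'GeneB': 1})
print(count_genes(reads, "first"))    # Counter({'GeneA': 2, 'GeneB': 1})
```

Taking the first or a random gene keeps the reads, but it cannot tell which paralog a read actually came from, so the extra counts land on an arbitrary gene.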

Rob wrote (6 weeks ago):

The cellranger UMI deduplication algorithm does not handle reads that map to multiple genes, and there is no "easy" way to handle this situation within cellranger. You may be interested in taking a look at our quantification tool, alevin, which we designed, in part, to help deal with these cases. In addition to providing a method for handling reads that map to multiple genes, it is much faster than cellranger.


Thanks a lot! Those reads previously discarded by cellranger are showing up in my analysis now! It's an amazing tool.

Reply by changxu.fan, 6 weeks ago

May I ask why the Alevin output files (quant_mat.csv) contain numbers with decimals? Do the numbers still represent the number of transcripts detected? I plan to use Seurat for downstream data cleaning and clustering, but I'm not sure whether I should still perform all the normalization, etc., as I normally would for data generated by cellranger count. Thank you so much!

Reply by changxu.fan, 18 days ago

There's a tutorial on how to use alevin with Seurat here. The fractional values are not due to any normalization; they arise because it is sometimes impossible to resolve gene-ambiguous UMIs deterministically (based on parsimony). In those cases, alevin resolves these UMIs probabilistically.
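The reply does not spell out alevin's probabilistic step, but a generic expectation-maximization (EM) scheme shows how fractional counts can arise: UMIs compatible with several genes are shared out in proportion to each gene's estimated abundance, which is itself re-estimated from those shares. This is a minimal Python sketch under that assumption, not alevin's implementation; `em_gene_counts` and its input format are hypothetical, and UMI deduplication is assumed to have happened already.

```python
def em_gene_counts(umi_classes, n_iter=200):
    """Split gene-ambiguous UMIs across genes with a simple EM.

    umi_classes: list of (genes, n_umis), where genes is the tuple of genes a
    group of UMIs is compatible with (hypothetical, already-deduplicated input).
    Returns a dict of (possibly fractional) UMI counts per gene.
    """
    genes = sorted({g for gs, _ in umi_classes for g in gs})
    if not genes:
        return {}
    abundance = {g: 1.0 / len(genes) for g in genes}  # uniform starting guess
    counts = {}
    for _ in range(n_iter):
        # E-step: share each class's UMIs in proportion to current abundances
        counts = {g: 0.0 for g in genes}
        for gs, n in umi_classes:
            total = sum(abundance[g] for g in gs)
            for g in gs:
                counts[g] += n * abundance[g] / total
        # M-step: re-estimate relative abundances from the expected counts
        grand_total = sum(counts.values())
        abundance = {g: c / grand_total for g, c in counts.items()}
    return counts


# 5 UMIs unique to A, 2 unique to B, 3 compatible with both
print(em_gene_counts([(("A",), 5), (("B",), 2), (("A", "B"), 3)]))
# converges to about {'A': 7.14, 'B': 2.86}
```

Here the 3 shared UMIs end up split roughly 2.14 / 0.86 toward the gene with more unique evidence, so the per-gene totals are fractional even though no normalization was applied.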

Reply by Rob, 17 days ago

I'm really sorry to bother you again, but we recently got some "paired-end" sequencing data generated with the 5' capture protocol, so both R1 and R2 contain more than 150 bp. I was wondering whether alevin can handle this. Thank you so much, Fan

Reply by changxu.fan, 12 days ago

thus both R1 and R2 contains more than 150 bp

What exactly does that mean? Do you have reads longer than 150 bp for both R1 and R2?

Reply by genomax, 12 days ago

Usually, with 3' capture, R1 is used only for the cell barcode and UMI, so we sequence only 26 bp. With 5' capture, however, R1 contains the barcode + UMI followed by useful (transcript) sequence.
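The difference between the two R1 layouts can be sketched in a few lines of Python. This assumes the 26 bp R1 is a 16 bp cell barcode followed by a 10 bp UMI (the 10x v2 layout); `split_r1` is a hypothetical helper, not part of alevin or cellranger, and it does not trim anything (such as a template-switch sequence) that may sit at the start of the leftover 5' sequence.

```python
def split_r1(seq, bc_len=16, umi_len=10):
    """Split a 10x-style R1 into (cell barcode, UMI, remaining sequence).

    The default lengths assume a 16 bp barcode + 10 bp UMI (26 bp total);
    other chemistries may use different lengths.
    """
    prefix = bc_len + umi_len
    if len(seq) < prefix:
        raise ValueError(f"R1 is {len(seq)} bp, shorter than barcode + UMI ({prefix} bp)")
    return seq[:bc_len], seq[bc_len:prefix], seq[prefix:]


# 3' data: R1 is just barcode + UMI, so nothing is left over
r1_3p = "ACGTACGTACGTACGT" + "TTTTTGGGGG"
print(split_r1(r1_3p))  # ('ACGTACGTACGTACGT', 'TTTTTGGGGG', '')

# 5' data: everything after the first 26 bp is extra sequence from the molecule
print(split_r1(r1_3p + "GATTACA")[2])  # GATTACA
```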

Reply by changxu.fan, 12 days ago
kristoffer.vittingseerup wrote (6 weeks ago):

I don't think cellranger can do this, but the tool Alevin (GitHub, bioRxiv paper) does support multi-mapping reads/UMIs, since it builds on Salmon's quantification. Because it builds on Salmon, the quantifications will also be more accurate (and much faster to compute).
