Question

Cellranger count creating an empty matrix.mtx after adding 'N' and '#' to R1 UMI and quality respectively to account for 9bp UMI.

15 months ago

JHorder • 0

First post, apologies if any mistakes!

I have posted this to the cellranger Github as it is a specific question, but thought it might be of interest here also?

I am currently analysing public scRNA-seq datasets from raw reads.

For one public dataset, the R1 read length is 25 bases. The library was generated with an older version of the chemistry. Recent versions of cellranger throw an error on these reads because R1 is too short (the minimum is 26 bases).

To handle this, an 'N' was appended to each R1 read, as described in this 10XGenomics page, via a sed '2~4s/$/N/' command.

This paper describes appending an 'A' to the UMI to handle the same problem.

This raised a second error with the message "Sequence and quality length mismatch", which was resolved by appending a '#' to each quality line to match the 'N' appended to the sequence.

This brought the R1 read length to 26.
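
For reference, both edits can be made in one pass so the sequence and quality lines never go out of step. This is only a sketch: it uses awk rather than the GNU-only `2~4` sed address, the function name `pad_r1` and the file names are hypothetical, and it assumes strict four-line FASTQ records.

```shell
# Append one base ('N') to every sequence line and one quality
# character ('#') to every quality line of a 4-line-per-record FASTQ.
# pad_r1 is a hypothetical helper name; it reads stdin and writes stdout.
pad_r1() {
  awk '
    NR % 4 == 2 { print $0 "N"; next }   # sequence line
    NR % 4 == 0 { print $0 "#"; next }   # quality line
    { print }                            # header and "+" lines unchanged
  '
}

# Example usage (file names are placeholders):
#   zcat sample_R1.fastq.gz | pad_r1 | gzip > padded/sample_R1.fastq.gz
```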

Example:

Prior to UMI length adjustment:

@ST-K00127:336:HVYF2BBXX:1:1101:23744:1244 1:N:0:GAGNATNT
GGGCACTTCTTGNATTTCTGTTTTC
+
AAFFFJJJJJJJ#FFAJJFJAJJFJ

Post UMI length adjustment:

@ST-K00127:336:HVYF2BBXX:1:1101:23744:1244 1:N:0:GAGNATNT
GGGCACTTCTTGNATTTCTGTTTTCN
+
AAFFFJJJJJJJ#FFAJJFJAJJFJ#
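
A quick way to confirm that the whole file looks like the record above, rather than only the first read, is to tabulate R1 lengths and count records whose quality string is a different length from the sequence. The helper name `r1_length_report` is hypothetical, and the sketch again assumes four-line records.

```shell
# Tabulate sequence lengths and count records whose quality string
# differs in length from its sequence.
# r1_length_report is a hypothetical helper; it reads FASTQ on stdin.
r1_length_report() {
  awk '
    NR % 4 == 2 { seqlen = length($0); n[seqlen]++ }      # sequence line
    NR % 4 == 0 && length($0) != seqlen { bad++ }         # quality line
    END {
      for (len in n) printf "length=%d reads=%d\n", len, n[len]
      printf "mismatched=%d\n", bad + 0
    }
  '
}

# Example usage (file name is a placeholder):
#   zcat padded/sample_R1.fastq.gz | r1_length_report
```

After padding, the expectation would be a single `length=26` line and `mismatched=0`.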

Cellranger count ran and completed with a success message.

However, the matrix.mtx.gz files in both the filtered_feature_bc_matrix and raw_feature_bc_matrix directories are empty apart from the header:

Example:

%%MatrixMarket matrix coordinate integer general
%metadata_json: {"software_version": "Cell Ranger 4", "format_version": 2}
36601 551223 0
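
In the MatrixMarket coordinate format, the first non-comment line gives the number of rows, columns and stored entries, so the line above reads as 36601 features, 551223 barcodes and 0 non-zero counts. A small sketch for pulling that line out of any matrix file follows; `mtx_summary` is a hypothetical helper name and the path in the usage comment is a placeholder.

```shell
# Print the dimension line of a MatrixMarket file in labelled form.
# mtx_summary is a hypothetical helper; it reads the uncompressed
# matrix on stdin.
mtx_summary() {
  awk '
    /^%/ { next }   # skip the banner and metadata comment lines
    { printf "features=%s barcodes=%s nonzero_entries=%s\n", $1, $2, $3; exit }
  '
}

# Example usage (path is a placeholder):
#   zcat outs/raw_feature_bc_matrix/matrix.mtx.gz | mtx_summary
```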

I'm assuming this is due to the handling of the UMI.
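
One way to probe the UMI hypothesis: after padding, every UMI ends in 'N' with quality '#' (Phred 2 under the usual Phred+33 encoding). If cellranger treats UMIs that contain an N or a very low-quality base as invalid (an assumption to verify against the 10x documentation, not something established in this thread), then every read would be discarded before counting. The sketch below only measures how many UMIs contain an N; `umi_n_report` is a hypothetical helper, and it assumes a 16-base cell barcode followed by the UMI.

```shell
# Count reads whose UMI (everything after the cell barcode) contains 'N'.
# umi_n_report is a hypothetical helper; it reads R1 FASTQ on stdin.
# Assumption: the first 16 bases are the cell barcode.
umi_n_report() {
  awk -v bc_len=16 '
    NR % 4 == 2 {
      total++
      if (index(substr($0, bc_len + 1), "N") > 0) with_n++
    }
    END { printf "reads=%d umi_with_N=%d\n", total, with_n }
  '
}

# Example usage (file name is a placeholder):
#   zcat padded/sample_R1.fastq.gz | umi_n_report
```

Running it on the original and the padded R1 files would show whether the padding takes the fraction of N-containing UMIs from near zero to all reads.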

My other thought is that no cells are being detected because of a problem with the cell barcodes, but that can be checked against the barcode whitelist.
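
The whitelist check could be sketched as follows. It assumes the whitelist is a plain-text file with one barcode per line and that the cell barcode is the first 16 bases of R1; `barcode_match_rate` is a hypothetical helper and the file names in the usage comment are placeholders.

```shell
# Count R1 reads whose first 16 bases appear in a barcode whitelist.
# barcode_match_rate is a hypothetical helper: $1 is the whitelist
# (one barcode per line, uncompressed); R1 FASTQ is read from stdin.
barcode_match_rate() {
  awk -v wl="$1" -v bc_len=16 '
    BEGIN {
      # Load the whitelist into an associative array.
      while ((getline line < wl) > 0) whitelist[line] = 1
      close(wl)
    }
    NR % 4 == 2 {                                   # sequence lines only
      total++
      if (substr($0, 1, bc_len) in whitelist) hits++
    }
    END { printf "reads=%d whitelisted=%d\n", total, hits }
  '
}

# Example usage (file names are placeholders):
#   zcat sample_R1.fastq.gz | head -n 4000000 | barcode_match_rate whitelist.txt
```

A high whitelisted fraction would point away from the barcodes and back toward the UMI handling.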

Has anyone come across a way to solve this? I would prefer to not use an older cellranger version if possible just to keep all analyses uniform.

Thanks!

$ cellranger --version
cellranger cellranger-7.0.1

scRNA-seq UMI cellranger aligning • 756 views

ADD COMMENT • link 15 months ago by JHorder • 0

I am not surprised that messing with raw data results in nonsense results. What chemistry is this dataset? V1 had 24bp R1 if I recall correctly.

ADD REPLY • link 15 months ago by ATpoint 82k

Neither am I, which is why I'm surprised to have seen it suggested as a workaround! It is V2 according to the publication, which is meant to have a 10-base UMI along with a 16-base cell barcode (26-base R1). However, according to the 10X post linked in the original question, 'Depending on the complexity of the library and study aims, shorter UMI lengths e.g. 9 bases, may be conceptually acceptable', which seems to be what happened here.

ADD REPLY • link 15 months ago by JHorder • 0