Cell Ranger count produces an empty matrix.mtx after padding R1 with 'N' (sequence) and '#' (quality) to compensate for a 9 bp UMI
15 months ago
JHorder • 0

First post, apologies for any mistakes!

I have also posted this on the cellranger GitHub as it is a specific question, but thought it might be of interest here too.


I am currently analysing public scRNA-seq datasets from raw reads.

For one public dataset, the R1 read length is 25 bases. The library was generated with an older version of the chemistry. More recent versions of cellranger raise an error because the read is too short (the minimum is 26).

To handle this, an 'N' was appended to each R1 read, as described in this 10x Genomics page, using a sed '2~4s/$/N/' command.

This paper describes appending an 'A' to the UMI to handle the same problem.

Cellranger then raised a second error with the message "Sequence and quality length mismatch". This was dealt with by appending a '#' to each quality line so that its length matches the padded sequence.

This gave an R1 read length of 26.

Example:

Prior to UMI length adjustment:

@ST-K00127:336:HVYF2BBXX:1:1101:23744:1244 1:N:0:GAGNATNT
GGGCACTTCTTGNATTTCTGTTTTC
+
AAFFFJJJJJJJ#FFAJJFJAJJFJ

Post UMI length adjustment:

@ST-K00127:336:HVYF2BBXX:1:1101:23744:1244 1:N:0:GAGNATNT
GGGCACTTCTTGNATTTCTGTTTTCN
+
AAFFFJJJJJJJ#FFAJJFJAJJFJ#
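
For reference, the padding described above can be sketched in Python so that sequence and quality are extended in the same pass and cannot drift out of sync. This is only a minimal sketch, not the method from the 10x page: the function name `pad_fastq` and its parameters are made up for illustration, and it assumes strict four-line FASTQ records (no wrapped sequences).

```python
def pad_fastq(lines, target_len=26, pad_base="N", pad_qual="#"):
    """Yield FASTQ lines with sequence and quality padded to target_len.

    Assumes every record is exactly four lines:
    header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        field = i % 4  # 0 = header, 1 = sequence, 2 = '+', 3 = quality
        if field == 1:
            # Pad the sequence on the right; reads already long enough are unchanged.
            line = line.ljust(target_len, pad_base)
        elif field == 3:
            # Pad the quality string with the matching number of characters.
            line = line.ljust(target_len, pad_qual)
        yield line


# The record shown above, before adjustment.
record = [
    "@ST-K00127:336:HVYF2BBXX:1:1101:23744:1244 1:N:0:GAGNATNT",
    "GGGCACTTCTTGNATTTCTGTTTTC",
    "+",
    "AAFFFJJJJJJJ#FFAJJFJAJJFJ",
]
padded = list(pad_fastq(record))
print("\n".join(padded))
```

For a real file, the same generator could be fed from `gzip.open(path, "rt")` and written back out line by line.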

Cellranger count ran and completed with a success message.

However, the matrix.mtx.gz files in both filtered_features_bc_matrix and raw_features_bc_matrix are empty apart from the header:

Example:

%%MatrixMarket matrix coordinate integer general
%metadata_json: {"software_version": "Cell Ranger 4", "format_version": 2}
36601 551223 0
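
To make the header explicit: the first non-comment line of a MatrixMarket coordinate file gives rows, columns and the number of stored (non-zero) entries. In Cell Ranger output the rows are features and the columns are barcodes, so the line above means 36601 features, 551223 barcodes and 0 counts. A minimal Python sketch for checking this (the helper name `read_mtx_shape` is made up for illustration):

```python
def read_mtx_shape(lines):
    """Return (rows, cols, entries) from the size line of a MatrixMarket file."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and '%' comment/banner lines.
        if not line or line.startswith("%"):
            continue
        rows, cols, entries = (int(x) for x in line.split())
        return rows, cols, entries
    raise ValueError("no size line found")


header = [
    "%%MatrixMarket matrix coordinate integer general",
    '%metadata_json: {"software_version": "Cell Ranger 4", "format_version": 2}',
    "36601 551223 0",
]
n_features, n_barcodes, n_entries = read_mtx_shape(header)
print(n_features, n_barcodes, n_entries)
```

On the real output, the lines could come from `gzip.open("matrix.mtx.gz", "rt")` instead of the hard-coded list.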

I'm assuming this is due to the handling of the UMI.

My other thought is that no cells are being detected because of a problem with the cell barcodes, but this can be checked against the barcode whitelist.
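
A rough way to do that whitelist check, sketched in Python. It assumes the barcode is the first 16 bases of R1 and that the whitelist is a plain set of barcode strings; the function name `whitelist_fraction` and the toy barcodes below are invented for illustration. It only counts exact matches, so barcodes containing an 'N' (like the one in the example read) count as misses here even if an aligner could correct them.

```python
def whitelist_fraction(r1_sequences, whitelist, barcode_len=16):
    """Fraction of R1 reads whose first barcode_len bases are in the whitelist."""
    total = 0
    hits = 0
    for seq in r1_sequences:
        total += 1
        if seq[:barcode_len] in whitelist:
            hits += 1
    # Avoid dividing by zero on an empty input.
    return hits / total if total else 0.0


# Toy data only: these are not real 10x barcodes.
toy_whitelist = {"AAAACCCCGGGGTTTT", "TTTTGGGGCCCCAAAA"}
toy_reads = [
    "AAAACCCCGGGGTTTT" + "ACGTACGTAN",  # barcode on the toy whitelist
    "TTTTGGGGCCCCAAAA" + "ACGTACGTAN",  # barcode on the toy whitelist
    "GGGCACTTCTTGNATTTCTGTTTTCN",        # padded example read; barcode contains N
]
fraction = whitelist_fraction(toy_reads, toy_whitelist)
print(f"{fraction:.2f} of reads have a whitelisted barcode")
```

If most reads in a sample of the real R1 file match the whitelist for the chemistry, the barcodes are probably fine and the UMI handling becomes the more likely suspect.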

Has anyone come across a way to solve this? I would prefer not to use an older cellranger version if possible, just to keep all analyses uniform.

Thanks!

$ cellranger --version
cellranger cellranger-7.0.1
scRNA-seq UMI cellranger aligning • 756 views

I am not surprised that messing with raw data produces nonsense results. What chemistry is this dataset? V1 had 24bp R1 if I recall correctly.


Neither am I, which is why I'm surprised to have seen it suggested as a workaround! It is V2 according to the publication, which is meant to have a 10-base UMI along with a 16-base cell barcode (26-base R1). However, according to the 10x post linked in the original question, 'Depending on the complexity of the library and study aims, shorter UMI lengths e.g. 9 bases, may be conceptually acceptable', which seems to be what happened here.
