error while running cellranger: an extremely low rate of correct barcodes was observed for all the candidate chemistry choices for the input
8 months ago
jude • 0

Hi, I'm quite new to scRNA-seq, and I got the error below while trying to run cellranger (version 7.2.0) on public human brain cortex data.

[error] Pipestance failed. Error log at:

Log message:
An extremely low rate of correct barcodes was observed for all the candidate chemistry choices for the input: Sample FL in "path/to/FL/fastq/files". Please check your input data.
- 0.1% for chemistry SC3Pv3
- 0.1% for chemistry SC3Pv3HT
- 0.0% for chemistry SC5P-PE
- 0.0% for chemistry SC3Pv2
- 0.0% for chemistry SC3Pv3LT

Waiting 6 seconds for UI to do final refresh.
Pipestance failed. Use --noexit option to keep UI running after failure.

The public data I used is PRJNA491456 (I included only the paired-end data in my analysis). I'm sure it is all single-cell RNA-seq data, and fairly sure it is all 10x data (the metadata says it was prepared on an Illumina HiSeq 4000). I've searched for posts related to my problem, but in most of them the cause was a different library preparation method, and this data is not multiome data. When I ran cellranger on another public 10x 3' dataset sequenced on Illumina, it worked well, so I think this problem must be related to library prep, and finding the appropriate chemistry (if this data is not from 10x) should solve it. Any help will be appreciated!

+) The input data format is as below:

+) Each run seems to come from an individual sample (every run has a different sample code), but where runs originate from the same brain region, I grouped them together myself and used them as one input for cellranger.

scRNA-seq cellranger FASTQ • 2.1k views
8 months ago
GenoMax 144k

I picked a random sample that had brain structure "PONS" from all samples in this bioproject accession. Looking at the description for the sample on GEO this is not 10x data. It seems to be some custom prep. So it is not surprising that you are getting an error trying to process the data as 10x.
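As I understand it, the percentages in that error reflect how many reads carry a barcode that appears in the whitelist of each candidate chemistry. A minimal sketch of that kind of check is below. It is not Cell Ranger's actual implementation; the function name `barcode_match_rate`, the toy whitelist, and the toy reads are all invented for illustration.

```python
def barcode_match_rate(reads, whitelist, cb_start=0, cb_len=16):
    """Return the fraction of reads whose barcode slice is in the whitelist.

    cb_start is 0-based. The defaults mimic a 16 bp barcode at the start
    of the read; they are assumptions for this sketch only.
    """
    whitelist = set(whitelist)
    total = 0
    hits = 0
    for seq in reads:
        total += 1
        if seq[cb_start:cb_start + cb_len] in whitelist:
            hits += 1
    return hits / total if total else 0.0


# Toy data, invented for illustration (not a real 10x whitelist).
toy_whitelist = {"AAACCCAAGAAACACT", "AAACCCAAGAAACCAT"}
toy_reads = [
    "AAACCCAAGAAACACTGGGTTTCCCAAA",  # first 16 bases are in the whitelist
    "TTTTGGGGCCCCAAAATTTTGGGGCCCC",  # not in the whitelist
    "AAACCCAAGAAACCATACGTACGTACGT",  # first 16 bases are in the whitelist
    "GATTACAGATTACAGATTACAGATTACA",  # not in the whitelist
]

print(f"{barcode_match_rate(toy_reads, toy_whitelist):.1%}")  # prints 50.0%
```

When the library is not a 10x prep, the bases at the expected barcode position are essentially arbitrary sequence, so a rate near 0% for every chemistry is what you would expect.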

Check the data processing information on this page:


Hello, first, thank you so much for your help! But I have some more questions:

  • How did you know that this is non-10x data just by looking at the data processing part? It says:

For data from Illumina, Illumina CASAVA version 1.8 were used to the basecalling. Read 2 was used to obtain the cell barcodes to further split the reads according to the cell IDs(barcode) and the same time recorded the UMI sequences. Read 1 was picked in each cell and these raw reads were trimmed to remove TSO or polyA sequence. Adaptor contamination and low-quality reads were discarded from the trimmed Read1 raw data. TopHat(version 2.0.14) with default settings were used for sequence alignment and uniquely mapped reads were kept. Uniquely mapped reads were counted by HTSeq package (version 0.6.0), in which reads with same UMI info were assigned as “1”. Finally, for each given individual cell, cell-gene matrices with UMI counts values were generated.

I think the relevant part is everything before 'TopHat(version 2.0.14)...'. As far as I know, the 10x 3' library prep also has cell barcodes in Read 2 and TSO and poly(A) sequence in Read 1, so which part made you think that this is non-10x data?

  • If this is not 10x data, then I think I should use an alternative mapping tool (maybe STARsolo or something), because it seems cellranger does not support non-10x data as input. What worries me is that I have to merge two different datasets (one is this, and the other is 10x 3' v3 data mapped by cellranger). Do you think merging two datasets that were mapped to the reference genome with different methods will interfere with the biological interpretation?



With 10x technology, Read 1 consists of cell barcode + UMI (26 or 28 bp) and Read 2 is the RNA read. In this case they are clearly saying that the cell barcodes and UMIs are in Read 2. You will need to look at the structure of Read 2 in this data to discern where the cell barcodes and UMIs are. The methods section of the publication associated with the data should have the relevant details.
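To make the read layout concrete, here is a small sketch, assuming plain-text 4-line FASTQ records. The 10x 3' layouts are 16 bp barcode + 10 bp UMI (v2, 26 bp) or 16 bp barcode + 12 bp UMI (v3, 28 bp) at the start of Read 1. The helper names (`read_sequences`, `split_barcode_read`, `top_prefixes`) and the toy records are made up for illustration, and the layout of this particular dataset is not known here; it has to come from the paper.

```python
from collections import Counter
from itertools import islice


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for line in islice(fastq_lines, 1, None, 4):
        yield line.strip()


def split_barcode_read(seq, cb_len, umi_len, cb_start=0):
    """Split a barcode read into (cell barcode, UMI); cb_start is 0-based."""
    umi_start = cb_start + cb_len
    return seq[cb_start:umi_start], seq[umi_start:umi_start + umi_len]


def top_prefixes(seqs, length, n=5):
    """Most common read prefixes. A barcode read tends to show a limited
    set of prefixes that recur often; a cDNA read tends to be far more diverse."""
    return Counter(s[:length] for s in seqs).most_common(n)


# 10x 3' v3 layout on an invented 28 bp Read 1: 16 bp barcode + 12 bp UMI.
print(split_barcode_read("AAACCCAAGAAACACTGGGTTTCCCAAA", cb_len=16, umi_len=12))

# Toy FASTQ records, invented for illustration.
toy_fastq = [
    "@read1\n", "ACGTACGTTTTTAAAA\n", "+\n", "IIIIIIIIIIIIIIII\n",
    "@read2\n", "ACGTACGTGGGGCCCC\n", "+\n", "IIIIIIIIIIIIIIII\n",
    "@read3\n", "TTGGCCAAACGTACGT\n", "+\n", "IIIIIIIIIIIIIIII\n",
]
toy_seqs = list(read_sequences(toy_fastq))
print(top_prefixes(toy_seqs, length=8, n=1))
```

On real data you would pass an open file handle (for example from `gzip.open(path, "rt")`) to `read_sequences` and look at the first few thousand reads of Read 2.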

Since they used TopHat you can use STAR instead. But if you are able to figure out where the barcodes+UMI are then you could use STARsolo.
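Once the barcode and UMI positions are known, a STARsolo run could be assembled roughly as below. This is only a sketch: the STAR options are real STARsolo parameters, but the positions (8 bp barcode at base 1, 8 bp UMI at base 9), the genome directory, and the file names are placeholders, not values taken from this dataset. Note that STARsolo positions are 1-based, the cDNA read is listed first in `--readFilesIn`, and `--soloBarcodeReadLength 0` turns off the barcode read length check, which is needed when the barcode read is longer than barcode + UMI.

```python
import shlex


def starsolo_command(genome_dir, cdna_fastq, barcode_fastq,
                     cb_start, cb_len, umi_start, umi_len,
                     whitelist="None"):
    """Build a STARsolo argument list for a simple barcode + UMI layout.

    whitelist="None" tells STARsolo to accept all cell barcodes; pass a
    file path instead if the protocol publishes a barcode list.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", cdna_fastq, barcode_fastq,  # cDNA read first
        "--readFilesCommand", "zcat",                # inputs are gzipped
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--soloCBstart", str(cb_start),              # 1-based positions
        "--soloCBlen", str(cb_len),
        "--soloUMIstart", str(umi_start),
        "--soloUMIlen", str(umi_len),
        "--soloBarcodeReadLength", "0",              # do not check read length
    ]


# Placeholder paths and positions: replace with values from the paper's methods.
cmd = starsolo_command(
    genome_dir="star_index",
    cdna_fastq="sample_R1.fastq.gz",     # Read 1 is the cDNA read in this dataset
    barcode_fastq="sample_R2.fastq.gz",  # Read 2 carries barcode + UMI
    cb_start=1, cb_len=8, umi_start=9, umi_len=8,
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`.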

I have to merge two different datasets

That may be tricky at best, especially if different parts of the RNA are sampled (10x mainly does 3'-end for scRNA-seq). There will be batch effects that you would likely not be able to address.


With 10x technology, Read 1 consists of cell barcode + UMI (26 or 28 bp) and Read 2 is the RNA read. In this case they are clearly saying that the cell barcodes and UMIs are in Read 2.

Oh, then I was confused. Thank you for the clarification.

We enriched the 3′ end of mRNAs using streptavidin-modified beads (Thermo Fisher Scientific, 65002) and did further library construction using the KAPA Hyper Prep Kits for Illumina (KAPA, KK8506). Each cell was sequenced for 0.5-giga bases of 150-bp paired-end reads on an Illumina platform with Hiseq4000.

I checked the methods part of the paper, and it seems this data was also generated by 3'-end sequencing. So maybe I can first try the downstream analysis after merging the two datasets, and if that fails, try the same thing without merging... This dataset kinda annoys me.

Anyway, thank you so much for your help! Your answers really helped me a lot!

