Tutorial: STARsolo config for 10x Chromium v1, v2, v3
3.6 years ago

Prompted by another question: RNA-Seq Cell Barcode Whitelist 10X

I am adding this for the benefit of others because, as far as I have found, no other resource states the following information clearly.

These values are useful for configuring STARsolo when re-aligning 10x Chromium FASTQs.

10x v1

  • Whitelist: 737K-april-2014_rc.txt
  • CB length: 14
  • UMI start: 15
  • UMI length: 10 (courtesy of ATpoint)

10x v2

  • Whitelist: 737K-august-2016.txt
  • CB length: 16
  • UMI start: 17
  • UMI length: 10

10x v3

  • Whitelist: 3M-Feb_2018_V3.txt
  • CB length: 16
  • UMI start: 17
  • UMI length: 12
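
In all three configurations the UMI starts immediately after the cell barcode (UMI start = CB length + 1), so the barcode read must hold at least UMI start + UMI length - 1 bases. A minimal shell sketch of that arithmetic, using only the values listed above; the helper name barcode_read_len is hypothetical:

```shell
# Minimal sketch: expected barcode-read length per chemistry.
# barcode_read_len is a hypothetical helper name.

# Expected length = UMI start + UMI length - 1 (positions are 1-based).
barcode_read_len() {
  umi_start=$1
  umi_len=$2
  echo $(( umi_start + umi_len - 1 ))
}

# chemistry, CB length, UMI start, UMI length (values from the lists above)
while read -r chem cb_len umi_start umi_len; do
  # Sanity check: the UMI should begin right after the cell barcode.
  if [ "$umi_start" -ne $(( cb_len + 1 )) ]; then
    echo "$chem: UMI does not start directly after the CB" >&2
  fi
  echo "$chem: $(barcode_read_len "$umi_start" "$umi_len") bp"
done <<EOF
v1 14 15 10
v2 16 17 10
v3 16 17 12
EOF
```

This prints 24 bp for v1, 26 bp for v2 and 28 bp for v3. By default STARsolo expects the barcode read length to equal CB length + UMI length (controlled by --soloBarcodeReadLength), which is why the read-length problems in the replies below come up.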

As per ATpoint, whitelists are available from: https://github.com/10XGenomics/cellranger/tree/master/lib/python/cellranger/barcodes

These are passed to STAR as:

  STAR \
    ...
    --soloType CB_UMI_Simple \
    --readFilesIn [cDNA FASTQ] [CB+UMI FASTQ] \
    --soloCBwhitelist [whitelist] \
    --soloCBlen [CB length] \
    --soloUMIstart [UMI start] \
    --soloUMIlen [UMI length] \
    ...
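
When scripting several runs, the barcode options can be generated from the chemistry name. This is a minimal sketch under two assumptions: the helper solo_flags is hypothetical (it is not part of STAR), and the whitelists have been decompressed into the working directory under the file names listed above:

```shell
# Minimal sketch: expand a chemistry name into the STARsolo barcode options.
# solo_flags is a hypothetical helper; whitelist files are assumed to be
# uncompressed and in the current directory.
solo_flags() {
  case "$1" in
    v1) wl=737K-april-2014_rc.txt; cb=14; us=15; ul=10 ;;
    v2) wl=737K-august-2016.txt;   cb=16; us=17; ul=10 ;;
    v3) wl=3M-Feb_2018_V3.txt;     cb=16; us=17; ul=12 ;;
    *)  echo "unknown chemistry: $1" >&2; return 1 ;;
  esac
  # printf (not echo) so the leading "--" is never taken as an option.
  printf '%s\n' "--soloType CB_UMI_Simple --soloCBwhitelist $wl --soloCBstart 1 --soloCBlen $cb --soloUMIstart $us --soloUMIlen $ul"
}

solo_flags v3
```

Usage, with the other options elided as above: STAR ... $(solo_flags v3). The unquoted $(...) relies on word splitting, which is safe here only because none of the values contain spaces.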

STARsolo can also be run with --soloCBwhitelist None when no whitelist is available; in that case all cell barcodes are accepted.

Kevin

Chromium 10x STARsolo STAR

Here is a link to the old v1 chemistry datasheet. If I understand it correctly, what is now called the "UMI" was 10 bp long and was called a "randomer" back in the day.

Old post, but I'm hoping someone can help. I was sent some 10x v2 data. Read 1 in the FASTQ is a full 150 bp, so STARsolo tells me the barcode sequence is too long. The Read 1 sequences all look like this:

GCTGAATAGGTGCTAGCGAACTGCGGTTTGTAGATTAAGAATGAAAAAAAAAAAAAAAAAAAAAAAA...
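
Before trimming, it can help to confirm that every Read 1 in the file has the same length. A minimal awk-based sketch that tallies sequence lengths, assuming the standard four-line FASTQ layout; the helper name read_length_counts and the file name are hypothetical:

```shell
# Minimal sketch: tally read lengths in a FASTQ read from stdin.
# Assumes 4-line records, so sequences are on lines 2, 6, 10, ...
# read_length_counts is a hypothetical helper name.
read_length_counts() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) print len, n[len] }' | sort -n
}

# Gzipped input (hypothetical file name):
#   gzip -dc R1.fastq.gz | read_length_counts
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n' | read_length_counts
```

Each output line is a read length followed by its count, so the demo prints "10 2". For an untrimmed v2 Read 1 like the one above one would expect a single 150 bp line, and after trimming a single 26 bp line.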

You will need to trim this read to a suitable length (26 bp for v2) if STARsolo does not accept a 150 bp read. As you can see, the remainder of the read is just the polyA tail. You can hard-trim reads using reformat.sh from BBMap or any other trimming tool.
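
If BBMap is not at hand, the same hard trim can be sketched with plain awk, swapped in here for reformat.sh: keep the first n bases of each sequence and quality line and pass the header and "+" lines through. This assumes four-line FASTQ records; the helper name hardtrim_fastq and the file names are hypothetical:

```shell
# Minimal sketch of a hard trim with awk (swapped in for reformat.sh):
# keep the first n bases of every sequence and quality line, pass the
# header and "+" lines through unchanged. Assumes 4-line FASTQ records.
# hardtrim_fastq is a hypothetical helper name.
hardtrim_fastq() {
  awk -v n="$1" '
    NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next }
    { print }
  '
}

# For 10x v2, n = 26 (16 bp CB + 10 bp UMI). File names are hypothetical:
#   gzip -dc R1.fastq.gz | hardtrim_fastq 26 | gzip > R1.trimmed.fastq.gz
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n' | hardtrim_fastq 4
```

Trimming the quality line to the same length as the sequence keeps each record valid. Alternatively, STAR's --soloBarcodeReadLength 0 disables the barcode read length check, which may make trimming unnecessary; check the STAR manual for your version.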

I got 10x data sequenced on a NextSeq. It uses the Chromium Next GEM Single Cell 3' GEM, Library Kit v3.1, but the R1 reads have 27 bases (e.g. CCTTTCAGTCGCATCGGAACCCACTGC). The whitelist (3M-Feb_2018_V3.txt) contains entries such as AAACCCAAGAAACACT. I also tried the v2 settings and running without a whitelist, but it still does not work. Any suggestions as to what may be wrong with the barcode specifications and read length?

It looks like your sequencing facility sequenced R1 one base short: v3 expects 28 bases (16 bp CB + 12 bp UMI). You could try specifying --soloBarcodeReadLength 27.
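
To see what the missing base does, a minimal sketch that cuts a barcode read at 1-based, STARsolo-style coordinates and reports when the UMI runs off the end of the read, applied to the 27-base read from the question. The helper split_cb_umi is hypothetical, not part of STAR:

```shell
# Minimal sketch: cut a barcode read into CB and UMI using 1-based,
# STARsolo-style coordinates, and flag a UMI that runs off the end of the read.
# split_cb_umi is a hypothetical helper, not part of STAR.
split_cb_umi() {
  read_seq=$1 cb_len=$2 umi_start=$3 umi_len=$4
  umi_end=$(( umi_start + umi_len - 1 ))
  cb=$(printf '%s\n' "$read_seq" | cut -c "1-$cb_len")
  umi=$(printf '%s\n' "$read_seq" | cut -c "$umi_start-$umi_end")
  echo "CB  $cb"
  echo "UMI $umi"
  if [ "${#umi}" -lt "$umi_len" ]; then
    echo "read is ${#read_seq} bp but needs $umi_end bp: UMI has ${#umi} of $umi_len bases" >&2
    return 1
  fi
}

# The 27-base R1 from the question, with the v3 values (CB 16, UMI start 17, UMI 12):
split_cb_umi CCTTTCAGTCGCATCGGAACCCACTGC 16 17 12 || echo "barcode read too short for these settings"
```

The cell barcode (CCTTTCAGTCGCATCG) is complete, but only 11 of the 12 UMI bases are present (GAACCCACTGC).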

Unfortunately, https://singlecell.usegalaxy.eu/ will not let me fine-tune such parameters.

Looks like you will have to run this on the command line in that case.
