Question

STARsolo parameters for aligning 10x Chromium single-cell RNA-seq data with 150 bp R1 reads

1

2.1 years ago

bp22 ▴ 80

Hi all,

I have some 10x v3 single-cell RNA-seq FASTQ files that I am trying to map to the human genome using the STAR aligner. However, I am getting the following error and hope that some of you can help:

EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 150 not equal to expected 28

I have checked the FASTQ file for Read 1 and see that the reads are the full 150 bp long. For instance, one of the reads is:

"Read ID=@A00551:244:HFHKLDSX2:1:1101:1488:1063 1 N 0 Sequence=GAGGCAAGTGGCAGATCGTTTCAACATTGTTCCTGCGCAACACAGAATAGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
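A quick way to confirm the R1 length distribution, rather than inspecting single reads, is to tally sequence lengths with awk. The sketch below is only an illustration: the demo file and the function name read_length_counts are made up here, and in practice the real R1 would be streamed in with gunzip -c.

```shell
# Hypothetical demo FASTQ; for real data, stream R1 with: gunzip -c R1.fastq.gz
demo_fastq=$(mktemp)
printf '@read1\nACGTACGTAC\n+\nFFFFFFFFFF\n@read2\nACGTACGTAC\n+\nFFFFFFFFFF\n@read3\nACGTA\n+\nFFFFF\n' > "$demo_fastq"

# Print "length count" for every read length seen.
# The sequence is line 2 of each 4-line FASTQ record.
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

read_length_counts "$demo_fastq"
```

If every R1 read reports 150, the run was sequenced with a full-length Read 1 and only the first 28 bases carry the 10x v3 barcode and UMI.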

As per STAR, the solution hint provided is the following:

"SOLUTION: make sure that the barcode read is the last file in --readFilesIn , and check that it has the correct formatting If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength"

My question is: (a) do I need to trim the R1 reads to a length of 28 bp before alignment, or (b) should I just set the --soloBarcodeReadLength option in STAR to 150?

In Cell Ranger, the length of the R1 reads is not a problem, as there are options to specify trimming, or the tool just ignores the rest of the read.

Any help is appreciated.

Thank you.

10X STARsolo alignment chromium STAR • 3.1k views

ADD COMMENT • link 2.1 years ago by bp22 ▴ 80

0

What command are you running? You probably didn't set the barcode position and length arguments.

ADD REPLY • link 2.1 years ago by rpolicastro 13k

0

Hi rpolicastro

I am running the following:

STAR --outSAMattributes All --outSAMtype BAM Unsorted --quantMode GeneCounts --readFilesCommand gunzip -c --runThreadN $NCPU --sjdbGTFfile $GTFFILE --outReadsUnmapped Fastx --outMultimapperOrder Random --genomeDir $GENOMEDIR --readFilesIn ${INPUTDIR}/${OUTPREFIX}_L001_R2_001.fastq.gz ${INPUTDIR}/${OUTPREFIX}_L001_R1_001.fastq.gz --outFileNamePrefix $OUTPREFIX --soloType CB_UMI_Simple --soloCBwhitelist $WHITELIST --soloUMIlen 12 --soloCBlen 16 --soloUMIstart 17

Thank you.

ADD REPLY • link 2.1 years ago by bp22 ▴ 80

0

See: STARsolo config for 10x Chromium v1, v2, v3

A poster in that thread indicated that setting the proper length did not seem to work, so you can simply hard trim Read 1 to the correct length.

ADD REPLY • link 2.1 years ago by GenoMax 142k

0

Hi GenoMax

Thanks! I tried setting the --soloBarcodeReadLength option in STAR to 150 and there was no problem then. Mapping completed successfully.
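For reference, the barcode geometry behind the error can be written out explicitly. This is a minimal sketch, not the exact command used above: the variable names are made up for illustration, and it uses --soloBarcodeReadLength 0, which the STAR manual describes as "not defined, do not check", as an alternative to passing the literal read length of 150.

```shell
# 10x Chromium v3 geometry: 16 bp cell barcode followed by a 12 bp UMI.
CB_LEN=16
UMI_LEN=12
R1_LEN=150   # observed Read 1 length in this data set

# STAR expects the barcode read to be CB + UMI long unless told otherwise.
BARCODE_LEN=$((CB_LEN + UMI_LEN))

if [ "$R1_LEN" -ne "$BARCODE_LEN" ]; then
    # 0 = do not check the barcode read length
    BARCODE_READ_LENGTH=0
else
    # 1 = STAR default: length must equal soloCBlen + soloUMIlen
    BARCODE_READ_LENGTH=1
fi

# Solo-specific arguments only; append these to the rest of the STAR command.
SOLO_ARGS="--soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen $CB_LEN --soloUMIstart $((CB_LEN + 1)) --soloUMIlen $UMI_LEN --soloBarcodeReadLength $BARCODE_READ_LENGTH"
echo "$SOLO_ARGS"
```

The cDNA read (R2) still has to come first and the barcode read (R1) last in --readFilesIn, as in the command above.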

The final log output is as follows and indicates a good percentage of uniquely mapped reads.

                      Number of input reads |       81652310
                  Average input read length |       150
                                UNIQUE READS:
               Uniquely mapped reads number |       73571734
                    Uniquely mapped reads % |       90.10%
                      Average mapped length |       146.91
                   Number of splices: Total |       29594463
        Number of splices: Annotated (sjdb) |       29052571
                   Number of splices: GT/AG |       29204232
                   Number of splices: GC/AG |       99944
                   Number of splices: AT/AC |       9797
           Number of splices: Non-canonical |       280490
                  Mismatch rate per base, % |       0.38%
                     Deletion rate per base |       0.01%
                    Deletion average length |       1.62
                    Insertion rate per base |       0.01%
                   Insertion average length |       1.59
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       3534767
         % of reads mapped to multiple loci |       4.33%
    Number of reads mapped to too many loci |       30684
         % of reads mapped to too many loci |       0.04%
                              UNMAPPED READS:

Moreover, the solo output summary (image not shown) seems to be comparable to a Cell Ranger output summary for a sample that was mapped earlier. The only significant difference that I see is in the Q30 Bases in Barcode statistic, which is low in the STARsolo run compared with the Cell Ranger run (~97.5%).

Can you please suggest some trimming tool that is appropriate in case I want to hard trim R1 reads?

Thank you.

ADD REPLY • link 2.1 years ago by bp22 ▴ 80

0

Hmm. You obviously don't have 150 bp barcodes but if you did get good alignments then I suppose STAR behaved like cellranger in ignoring the rest of the read.

"Can you please suggest some trimming tool that is appropriate in case I want to hard trim R1 reads?"

reformat.sh from the BBMap suite will work. Use the forcetrimright=NN option, which trims everything to the right of 0-based position NN (so forcetrimright=27 keeps the first 28 bases).
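If BBMap is not at hand, the same hard trim can be sketched with plain awk. The function name hard_trim_fastq and the demo read are made up for illustration; the reformat.sh call in the comment is an assumed equivalent based on the BBTools option description and has not been run here.

```shell
# Hypothetical 40 bp demo read: 16 bp barcode + 12 bp UMI + 12 bp poly-A tail.
demo_r1=$(mktemp)
printf '@read1\nGAGGCAAGTGGCAGATCGTTTCAACATTAAAAAAAAAAAA\n+\nFFFFFFFFFFFFFFFFFFFFFFFFFFFF############\n' > "$demo_r1"

# Keep only the first $2 bases of every sequence and quality line.
# In a 4-line FASTQ record these are lines 2 and 4, i.e. the even lines.
hard_trim_fastq() {
    awk -v keep="$2" 'NR % 2 == 0 { print substr($0, 1, keep); next }
                      { print }' "$1"
}

hard_trim_fastq "$demo_r1" 28

# Assumed BBMap equivalent (forcetrimright is a 0-based position):
#   reformat.sh in=R1.fastq.gz out=R1.trimmed.fastq.gz forcetrimright=27
```

For gzipped input, decompress to a file or adapt the function to read from standard input, then compress the result again.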

ADD REPLY • link 2.1 years ago by GenoMax 142k

1

Thanks for your suggestion. Yes, it is true that the barcodes for v3 are not 150 bp, and I think this is why the value of the 'Q30 Bases in Barcode' statistic is low in the STARsolo run. FastQC on R1 showed that the quality was poor after 28 bp. Also, I am not sure how this will affect the counts matrix and the downstream analysis with Seurat.
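To illustrate that reasoning, the sketch below compares the Q30 fraction over the first 28 bases with the Q30 fraction over the whole read. It assumes Phred+33 quality encoding and assumes that the low statistic comes from the full read being counted; the function name q30_summary and the demo read are hypothetical.

```shell
# Hypothetical demo read: 28 high-quality bases (F = Q37) then 12 poor ones (# = Q2).
demo_r1=$(mktemp)
printf '@read1\nGAGGCAAGTGGCAGATCGTTTCAACATTAAAAAAAAAAAA\n+\nFFFFFFFFFFFFFFFFFFFFFFFFFFFF############\n' > "$demo_r1"

# Print two percentages: Q30 bases within the first $2 positions, and Q30 bases overall.
q30_summary() {
    awk -v bc="$2" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # ASCII lookup table
        NR % 4 == 0 {                                                   # quality line
            n = length($0)
            for (j = 1; j <= n; j++) {
                q = ord[substr($0, j, 1)] - 33                          # Phred+33 decoding
                if (j <= bc) { bc_tot++; if (q >= 30) bc_q30++ }
                all_tot++; if (q >= 30) all_q30++
            }
        }
        END { printf "%.1f %.1f\n", 100 * bc_q30 / bc_tot, 100 * all_q30 / all_tot }
    ' "$1"
}

q30_summary "$demo_r1" 28   # prints: 100.0 70.0
```

On real data, a large gap between the two numbers would support the idea that the low-quality tail of R1, rather than the barcode itself, is pulling the statistic down.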

ADD REPLY • link 2.1 years ago by bp22 ▴ 80

0

I think the problem might have just been that they forgot to specify the cell barcode start position argument --soloCBstart. You theoretically (and practically in my experience) shouldn't have to trim the R1 read.

ADD REPLY • link 2.1 years ago by rpolicastro 13k