SmartSeq2 with STARsolo gives only one cell
12 months ago
Dataminer ★ 2.7k

Dear Community,

I am trying to align single-cell data from the SmartSeq platform using STARsolo. The data are paired-end, with FASTQ files of 19 GB and 21 GB. The first records of Read 1, followed by the matching records of Read 2, look like this:

@A00877:307:H5H77DSXY:2:1101:1325:1000 1:N:0:TGACCAAT
GNAGGGAGACGTCTACATCTGCCAAGTGGAGCACACCAGCCTGGACAGTCCTGTCACCGTGGAGTGGAAGGCACAGTCTGATTCTGCCCGGAGTAAGACATTGACGGGAGCTGGGGGCTTCATGCTGGGGCTCATCATCTGTGGAGTGGG
+
F#FFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFF,FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFF:F:F
@A00877:307:H5H77DSXY:2:1101:1380:1000 1:N:0:TGACCAAT
TNCTAGCAGCATTGGCCTTGGCAAGTCACTGGTAACTGTTTTCTGTAAAGCAGAGGTTGCCCACTTCATTAGACTGTAAGAACTGAATGAGAAAAGAGTAGGAGAGTACTCTGTAAACACAAGTGATAGGGAAGTTACCATCACCACTCC
+
F#FF:FFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFF
@A00877:307:H5H77DSXY:2:1101:1524:1000 1:N:0:TGACCAAT
GNGATGGGTCTTGCTATGTTGCACAGGCTGGTCTTGAACTCCTGGATTTAAGTGATCTTTTTACTCTAAAATGTAATCTAAATAATAACAAAATAAATATTGAGCTGAAGAAAAGAAAAAGGAAACAGTGATTTTCATGTCTGCTATGTG


@A00877:307:H5H77DSXY:2:1101:1325:1000 2:N:0:TGACCAAT
TGGCTTCATGCAGGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTGGTGTTTTTTATTTTTTGTGGTAAAAATAAGGGAAAAAGGTTGTAGTCAAAGTGTTAGTTAAAGTGGATTGATAAAAAAAGCAAAAATTTATAAAATAAGATAATAG
+
FFFFFFFFFFFFFFF:F,F:,FF::FFF,F:FFF:FF:F:FF,F:F,FFF::F:FF::F,FFF,,,,,F,,FF,:F,F,,FF,:::,FF,F,,F,:,,,:,,,F,,,F,,,FF,,,FF:F:,F,FF,,,,,,:F,:,FF,:,,,,,,,:,
@A00877:307:H5H77DSXY:2:1101:1380:1000 2:N:0:TGACCAAT
GATAGACAGTGACGCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAGTTTTGGGGTTGGTAACTAACATAAAAATTGTTTTTAAATTGTAATAAACAAAAATTTTAAAATAAAATAATTACATTAAAATTAATGTGCCAACCAATGGTT
+
FFFFFFFFFFFFFFFFF:FFFFF,FFFF:,FFFFFFFFF:FFF,,F,,,,,,:F,,,,,,,F:,:,,,,,F,,,,:F:,FFF:,,,,:,,,,,,F,F,,,,,F,FF,,F,:,,,:,,,::,,,:,FF,F,,,,,,F:,:F,,,,,,,,,F
@A00877:307:H5H77DSXY:2:1101:1524:1000 2:N:0:TGACCAAT
GGACTATATTTACATTGTGGCTTTGCCATTTTCTAGATTTTTTTTACTTTGGACAAATTATTTAAACTCTTTGAACCTCATTGTTCTCATCTGTGCAGATGATGCTCACTTCAGAGAAGATGACGCACATACAACACATTAAACCTAGTG


I am running the alignment with STARsolo, using the following command:

STAR --runThreadN 16 --genomeDir ~/HumanGenomes/ --readFilesCommand zcat --readFilesIn Read1.fq.gz Read2.fq.gz --soloType SmartSeq  --outSAMtype BAM Unsorted --outBAMcompression -1 --soloUMIdedup Exact --outSAMattrRGline ID:sample1


The program runs without error, but I get only one cell instead of a few thousand. I am new to both SmartSeq and STARsolo, so I may have missed something; pardon my ignorance.

Thank you

scRNA-seq SmartSeq STARsolo
10 months ago

Hi Dataminer,

I've not worked with Smart-seq scRNA-seq before, but from what I can gather from the documentation, you need a separate FASTQ file (or file pair) for each cell, together with a manifest file that lists, for each cell, its FASTQ file(s) and a cell ID.
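As a rough sketch of what that could look like: as far as I read the STAR manual, the manifest is a tab-separated file with one line per cell (Read 1 file, Read 2 file, cell ID) and is passed with --readFilesManifest instead of --readFilesIn. The file naming (<cell>_R1.fq.gz / <cell>_R2.fq.gz) and the helper names build_manifest and run_starsolo are my own assumptions, not anything from your setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a 3-column manifest (Read1 <TAB> Read2 <TAB> cell ID) for every
# FASTQ pair in a directory. Assumes the hypothetical naming scheme
# <cell>_R1.fq.gz / <cell>_R2.fq.gz, one pair per cell.
build_manifest() {
    local dir="$1" out="$2" r1 r2 cell
    : > "$out"
    for r1 in "$dir"/*_R1.fq.gz; do
        [ -e "$r1" ] || continue            # no matching files at all
        r2="${r1%_R1.fq.gz}_R2.fq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            return 1
        fi
        cell="$(basename "${r1%_R1.fq.gz}")"
        printf '%s\t%s\t%s\n' "$r1" "$r2" "$cell" >> "$out"
    done
}

# Not called here; shown only to illustrate how the manifest replaces
# --readFilesIn. Options as I understand them from the STARsolo docs.
run_starsolo() {
    local genome_dir="$1" manifest="$2"
    STAR --runThreadN 16 \
         --genomeDir "$genome_dir" \
         --readFilesCommand zcat \
         --readFilesManifest "$manifest" \
         --soloType SmartSeq \
         --soloUMIdedup Exact \
         --soloStrand Unstranded \
         --outSAMtype BAM Unsorted
}

# Example (paths are placeholders):
#   build_manifest per_cell_fastq manifest.tsv
#   run_starsolo ~/HumanGenomes/ manifest.tsv
```

With a manifest, the cell ID in the third column should end up as the read group of each read, so a separate --outSAMattrRGline should not be needed.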

The headers of the paired-end reads you posted do contain an index sequence (TGACCAAT), so you may need to split the files by that barcode. Several methods are possible; see e.g. the answer from finswimmer in the thread "Split fastq according to barcodes".
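One possible way to do the split with plain awk, as a minimal sketch: it assumes standard 4-line FASTQ records and that the index is the last colon-separated field of the header line, as in the reads you posted. The function name split_by_index and the output naming are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read FASTQ from stdin and append each record to
# <prefix>_<index>_<mate>.fq, where <index> is the last ':'-separated
# field of the header line (e.g. TGACCAAT in "... 1:N:0:TGACCAAT").
# Assumes exactly 4 lines per record (no wrapped sequences).
split_by_index() {
    local prefix="$1" mate="$2"
    awk -v prefix="$prefix" -v mate="$mate" '
        NR % 4 == 1 {
            n = split($0, f, ":")
            out = prefix "_" f[n] "_" mate ".fq"
        }
        { print > out }
    '
}

# Example (run once per mate; record order is preserved, so R1 and R2
# stay in sync as long as both inputs are in the same order):
#   zcat Read1.fq.gz | split_by_index cells/sample R1
#   zcat Read2.fq.gz | split_by_index cells/sample R2
```

With thousands of distinct indices, some awk implementations may run into the limit on simultaneously open files; in that case a dedicated demultiplexing tool is probably the safer choice.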

Let me know if this worked.

Kind regards, Thomas

Yes, that is correct. Because the SmartSeq protocol is plate-based rather than cell-barcode-based, every FASTQ file pair is one cell. Check whether your FASTQ files contain different index sequences; in the example above, the only index shown is TGACCAAT.
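A quick way to run that check, sketched under the assumption of 4-line FASTQ records with the index as the last colon-separated field of the header (count_indices is just an illustrative name):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read FASTQ from stdin and print "<count><TAB><index>" for every index
# sequence seen in the header lines, most frequent first.
count_indices() {
    awk '
        NR % 4 == 1 { n = split($0, f, ":"); count[f[n]]++ }
        END { for (idx in count) print count[idx] "\t" idx }
    ' | sort -k1,1nr
}

# Example on the full file (placeholder name):
#   zcat Read1.fq.gz | count_indices
```

If only a single index comes back, the file pair most likely holds one cell (or one already demultiplexed sample), which would be consistent with STARsolo reporting a single cell.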