Dear all,
I performed a scRNAseq experiment following the FLASHseq protocol on a NextSeq 500 using a High Output flow cell (paired-end 150 bp, dual combinatorial indexes). The QC results are poor: most of the cells have no associated reads, and the small fraction that do (18/384) have only few reads each. There is also a rather large "Undetermined" fraction of approximately 32 million reads.
The library was checked on a BioAnalyzer and Qubit before loading, and it had the expected profile described in the protocol. The same holds for each cell in the library: the individual QC results were good, and the cells were then indexed and pooled into the final library.
I was wondering where the problem could originate: is it related to the library prep, and if so how could I address it (the loaded quantity, maybe?), or could it be related to demultiplexing (we used bcl2fastq), which could be adjusted?
Thank you for your answers
This question is going to be difficult to answer since we only have your description to go by and can't see the actual data. Is this the first time you are following this protocol? scRNAseq can be tricky and requires some experience.
I will assume that you have looked at the Demux* files in the Unaligned/Stats directory. There should be lane-specific files if you used a flow cell with 2 "lanes" on the NextSeq. Those will show you the top barcodes that were seen by the sequencer but do not match the SampleSheet you provided. If you would still like to see what other indexes are present in the "Undetermined" files, you can use this piece of code to get at that information: Demultiplexing reads with index present in the labels
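For a quick look at which index combinations ended up in an Undetermined file, a minimal Python sketch along these lines may help. It assumes standard bcl2fastq output, where each FASTQ header ends with the index sequence(s) after the last colon (for example ...:ACGTACGT+TTGGCCAA). The function names are illustrative and not part of bcl2fastq.

```python
import gzip
from collections import Counter
from itertools import islice


def count_indexes(fastq_lines, max_reads=None):
    """Count the index strings found in Illumina-style FASTQ headers.

    Assumes 4-line FASTQ records whose header ends with ':<index>'.
    """
    counts = Counter()
    headers = islice(fastq_lines, 0, None, 4)  # every 4th line is a header
    if max_reads is not None:
        headers = islice(headers, max_reads)  # only look at the first reads
    for header in headers:
        index = header.rstrip().rsplit(":", 1)[-1]  # text after the last colon
        counts[index] += 1
    return counts


def top_undetermined_indexes(path, n=20, max_reads=1_000_000):
    """Return the n most frequent indexes among the first max_reads reads."""
    with gzip.open(path, "rt") as handle:
        return count_indexes(handle, max_reads).most_common(n)
```

Calling, for example, top_undetermined_indexes("Undetermined_S0_L001_R1_001.fastq.gz") (the file name is only an example) returns (index, count) pairs that can be compared by eye against the index pairs in the SampleSheet. Capping the number of reads keeps the check fast, since the most abundant unexpected indexes usually show up early.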
If the sequencing stats look fine and the index combinations don't make sense, then the problem is unfortunately somewhere upstream, in the experimental part.
Thank you, we will check these options. However, we tried mapping with STAR, and the percentage of uniquely mapped reads is 22.8% in the best case, whereas for many cells it is only about 2-4%. Most of the reads are classified as "unmapped: too short", which is puzzling since the average read length for all samples is in the range of 130-150 bp.
I would highlight that all cells show perfect alignment with the index sequences, so what could be the reason for such a low read number and also such a low percentage of mapping to the reference genome?
Is it something that can be addressed in the library prep?
That just means that STAR was not able to align the read: "too short" refers to the aligned portion being too short to pass STAR's filters, not to the length of the read itself. That does not help you much, but the reads may not actually be short. Take some of the reads that are not aligning and check them with BLAST to see if they align to the right genome. You may have DNA contamination (or, at worst, some other contamination) in your samples.
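To pull out a handful of unaligned reads for BLAST, something like the following Python sketch could be used. It assumes the unmapped reads are available as a plain FASTQ file (STAR can write them with --outReadsUnmapped Fastx), and it converts a small random sample to FASTA so it can be pasted into the BLAST web form. The function name and defaults are illustrative.

```python
import random


def sample_fastq_as_fasta(fastq_lines, k=20, seed=0):
    """Reservoir-sample k reads from FASTQ lines and return them as FASTA text."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir = []
    seen = 0
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines, "").strip()
        next(lines, "")  # '+' separator line
        next(lines, "")  # quality line
        if not header.startswith("@") or not seq:
            continue  # skip blank or truncated records
        fields = header[1:].split()
        name = fields[0] if fields else f"read{seen}"
        if seen < k:
            reservoir.append((name, seq))
        else:
            j = rng.randint(0, seen)  # inclusive upper bound
            if j < k:
                reservoir[j] = (name, seq)  # replace with probability k/(seen+1)
        seen += 1
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)
```

For example, with open("Unmapped.out.mate1") as fh: print(sample_fastq_as_fasta(fh, k=20)) prints 20 reads in FASTA format. Reservoir sampling is used so that the reads come from the whole file rather than only the first tiles, without loading the file into memory.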
Like I said above, if everything checks out on the sequencing end, then the problem is likely with the experiment/libraries.