Shallow sequencing is not a random subsample of deep sequencing
4 weeks ago
jockbanan ▴ 440

Dear community,

Over the past few years, I have collected several scRNA-seq datasets that were sequenced multiple times, i.e. the same final library (the same DNA sample) was sequenced in several separate Illumina runs.

One interesting phenomenon that I observe over and over is that a shallow sequencing run is not a random subsample of a deep sequencing run. Say I have two sequencing runs of the same DNA library, one with 10 000 reads/cell and another with 40 000 reads/cell. If I take the 40 000 reads/cell dataset and randomly subsample it to 10 000 reads/cell, it will be qualitatively different from the dataset that was actually sequenced at 10 000 reads/cell. More specifically, shallow sequencing seems to underestimate library complexity: the dataset that was actually sequenced shallowly has fewer unique cell-gene-UMI combinations than the subsampled dataset. The only explanation I can think of is a sequencing bias, i.e. the sequencer preferentially processes certain DNA fragments. I could identify a slight bias towards shorter fragments (fragment length estimated from the read position relative to the expected 3' end of the gene), but I'm not certain whether this can explain the full extent of the phenomenon.
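The comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the exact pipeline behind these observations: it assumes each run has already been reduced to a table of read counts per (cell, gene, UMI) combination, and the function names `subsample_reads` and `n_unique_molecules` as well as the toy data are hypothetical.

```python
import random
from collections import Counter


def subsample_reads(read_counts, n_reads, seed=0):
    """Draw n_reads reads uniformly at random, without replacement.

    read_counts maps (cell, gene, umi) -> number of reads supporting it.
    Returns a Counter holding only the combinations that were drawn.
    """
    # Expand to one entry per read so that every read is equally likely.
    reads = [key for key, n in read_counts.items() for _ in range(n)]
    if n_reads > len(reads):
        raise ValueError("cannot draw more reads than the deep run contains")
    rng = random.Random(seed)
    return Counter(rng.sample(reads, n_reads))


def n_unique_molecules(read_counts):
    """Number of distinct (cell, gene, umi) combinations with >= 1 read."""
    return sum(1 for n in read_counts.values() if n > 0)


# Toy data, invented for illustration only (not real measurements).
deep_run = {
    ("cell1", "GeneA", "AACG"): 6,
    ("cell1", "GeneB", "TTGC"): 2,
    ("cell2", "GeneA", "GGCA"): 3,
    ("cell2", "GeneC", "CATG"): 1,
}
shallow_run = {
    ("cell1", "GeneA", "AACG"): 2,
    ("cell2", "GeneA", "GGCA"): 1,
}

# Subsample the deep run down to the read total of the real shallow run.
target = sum(shallow_run.values())
simulated = subsample_reads(deep_run, target, seed=1)

print("unique molecules, real shallow run:", n_unique_molecules(shallow_run))
print("unique molecules, subsampled deep run:", n_unique_molecules(simulated))
```

In practice the subsampling would be repeated with many seeds, so that the complexity of the real shallow run can be compared against the spread of the simulated values rather than against a single draw.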

I was unable to find any literature on this topic. All the articles I found that investigate optimal sequencing depth in single-cell datasets used random subsampling of a deeply sequenced library to simulate shallow sequencing. Are you aware of any work that discusses this phenomenon? Ideally, I would like to find a tool that subsamples a deeply sequenced dataset in a way that accounts for this phenomenon, so that the subsampled dataset resembles a true shallow-coverage dataset.

Illumina scRNA-seq • 298 views

One interesting phenomenon that I can observe over and over is that a shallow sequencing run is not a random subsample of a deep sequencing run

Can you give us some additional details? How were the libraries stored? How far apart were the sequencing runs? Were they done using the same chemistry and/or sequencer?

10x support told us a while back that shallow sequencing (e.g. a MiSeq nano run with 1M reads) was at best to be used only "qualitatively" for checking library quality.


I have multiple different examples: 10X and BD Rhapsody libraries, sequenced on MiSeq + NovaSeq, MiSeq + NextSeq 2000, NovaSeq SP + NovaSeq S4, and NovaSeq SP + NovaSeq X. Storage can be ruled out as a reason, because the deeply sequenced (= better) library was always sequenced later. So storage would have to improve the library quality :-)
