Dramatically worse R2 sequencing read quality compared with R1 quality for 10x scRNA-seq library - a sequencing issue or a library issue?
2.1 years ago
Devin ▴ 20

Hi all,

I am in desperate need of some help with a sequencing experiment and would be very grateful for any of your thoughts. I sequenced a 10x scRNA-seq 3’ v3.1 experiment recently on a MiSeq and got poor quality sequencing results. The median quality score of the R2 read (the RNA-derived read) was 19. However, the quality of R1 and the index read were high (median quality 36 and 35 respectively).

I reached out to the sequencing core who did the sequencing about issues with the MiSeq run. They told me, “According to our metrics this run did fairly well except for Read 2 which suggests a library issue. After base 36 A's come out to be 30-45% of the run until base 70. This run had 92% PF. 20 million passed filter reads on a V2 where we only guarantee 12 million reads. Runs that overcluster have really low %PF and the quality scores would be low for all reads not just Read 2. The thumbnail images also look good. Read 1: 97.95 %Q30 Index 1: 93.61 %Q30 Read 2: 30 %Q30”.

I then reached out to 10x for support, and got the following response: “Since the R2 is just the transcript, it is possible that something could be off with the transcript. However, typically when we see mRNA degradation, there will also be an increase in short inserts that in the sequencing read will read into the polydt or the insert does not get the TSO sequence cleaved off so there will be TSO sequences in the list of overrepresented sequences in the FastQ report. Your FastQC report did not indicate these issues. Since the FastQC report doesn't quite line up with that hypothesis, this could be a sequencing issue”.

I also reached out directly to Illumina and they told me: “I found that the drop in quality was around cycle 37 an error indicating a potential reverse read issue. The cycle 37 is the beginning of read 3, where paired end turnaround chemistry begins”. They suggested troubleshooting the MiSeq machine by power cycling and performing a system check.

One side is telling me this is a sequencing issue, and the other that this is a library issue. I need an impartial juror to help figure out what’s going on. Have any of you seen anything like this before? Could the poor quality in R2 be caused by the sequencing, or is there a problem with the library? Any advice or thoughts would be much appreciated.

Thank you for your time and help.

10x scRNA-seq MiSeq quality • 4.1k views

Just wondering, and you probably ruled this out so it's a stupid question, but is there any chance that R1 and R2 got mixed up? Do you see the drop in quality that is expected in R1 after the CB/UMI part of the read?


Hi ATpoint, not a stupid question! I'm looking for any and all possibilities and appreciate your response. Would you mind elaborating? What do you mean by mixing up R1 and R2? Do you mean bioinformatically in terms of the cellranger analysis? Or something with running the sequencing machine (I dropped the libraries off at a sequencing core facility). Also, I am unfamiliar with the expected drop in quality after the BC/UMI part of the read that you are alluding to. Could you elaborate on this as well? I've posted a few QC screenshots in another reply if those would be helpful. Many thanks for your time.


Can you tell us the number of cycles used for R1 and R2? Generally R1 is sequenced for fewer cycles (e.g. 28) than R2, which could be up to 100 cycles.

In the case of 10x libraries, after the CB/UMI are read (in Read 1) there is a stretch of polyT (see this doc from 10x for structure of libraries). Since every cluster has the same base at those cycles, the sequence has low nucleotide diversity (all clusters glow in the same channel, so the software has trouble calling bases, which is reflected in a drop in base quality).

“... Read 2 which suggests a library issue. After base 36 A's come out to be 30-45% of the run until base 70.”

Note: Based on your original description, it does appear that you have short inserts, so even on R2 you are reading into that polyT stretch from the other end (where it would show up as A's). As you mentioned above, this is a clinical sample, so there is not much you are going to be able to do.

What ATpoint means is: have you checked whether the names of your R1/R2 files were interchanged? Unlikely, but just in case.
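A quick way to rule out swapped file names is to compare read lengths, since the barcode/UMI read should be the shorter one. The following is only a minimal sketch under that assumption; the function names are made up for illustration and are not part of cellranger or any other tool.

```python
from collections import Counter
from itertools import islice

def fastq_records(handle):
    """Yield (sequence, quality) pairs from an iterable of FASTQ lines."""
    handle = iter(handle)  # lets plain lists be consumed progressively too
    while True:
        block = list(islice(handle, 4))
        if len(block) < 4:
            return
        yield block[1].strip(), block[3].strip()

def length_profile(handle, max_reads=100_000):
    """Count read lengths over the first max_reads records."""
    records = islice(fastq_records(handle), max_reads)
    return Counter(len(seq) for seq, _ in records)

def looks_swapped(r1_lengths, r2_lengths):
    """Heuristic (assumption): R1 (barcode/UMI) should be shorter than R2."""
    if not r1_lengths or not r2_lengths:
        raise ValueError("no reads found in one of the files")
    r1_mode = r1_lengths.most_common(1)[0][0]
    r2_mode = r2_lengths.most_common(1)[0][0]
    return r1_mode > r2_mode
```

For gzipped files, a handle from `gzip.open(path, "rt")` can be passed to `length_profile`. If R1 and R2 were sequenced to the same length this check says nothing, and the per-position quality profiles would have to be compared instead.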


Thank you, GenoMax. The cycles for this flow cell run (MiSeq Standard - V2) were 28,8,0,91 as recommended in the 10x protocol. I have requested the raw BCL data from the sequencing core so I can recreate the fastq files and make sure there are no issues with that step. As for your point about reading into the PolyT stretch, I see why that would be problematic. Here is a screenshot of the sequence content of R2 (I think position 1 is absolute cycle number 37). There is a bit of A bias in the beginning. If there are short inserts, however, wouldn't that appear on the fragment size trace (pasted below)?
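To double-check the cycle arithmetic, here is a small sketch that maps an absolute cycle number to a read and a position within that read. It assumes the 28,8,0,91 layout mentioned above, in the order R1, I1, I2, R2; the function name and the `layout` parameter are only for illustration.

```python
def locate_cycle(cycle, layout=(("R1", 28), ("I1", 8), ("I2", 0), ("R2", 91))):
    """Map a 1-based absolute cycle to (read name, 1-based position in read)."""
    if cycle < 1:
        raise ValueError("cycles are numbered from 1")
    start = 1  # absolute cycle at which the current read begins
    for name, n_cycles in layout:
        if cycle < start + n_cycles:
            return name, cycle - start + 1
        start += n_cycles
    raise ValueError("cycle is beyond the end of the run")

print(locate_cycle(36))  # last cycle of the index read
print(locate_cycle(37))  # first cycle of R2
```

With this layout, cycle 37 comes out as position 1 of R2, which would also be the third read in the run if the index read is counted as the second.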

One thing I noticed about the library prep was a very high final DNA concentration after the sample indexing/amplification PCR. So maybe there were too many cycles of PCR amplification? I am not sure whether there is any indication of that in these diagnostic plots, however. Fortunately, I have ample remaining cDNA to repeat the NGS PCR amplification with fewer PCR cycles. Any thoughts on whether this would be useful?

[Images: R2 sequence content; one image without a description]


“The cycles for this flow cell run (MiSeq Standard - V2) were 28,8,0,91 as recommended in the 10x protocol. I have requested the raw BCL data from the sequencing core so I can recreate the fastq files and make sure there are no issues with that step”

This is all standard. There is likely no need to do the conversion yourself. Your provider should be trustworthy for this part, especially if they do this for a living.

“There is a bit of A bias in the beginning. If there are short inserts, however, wouldn't that appear on the fragment size trace (pasted below)?”

Those may in fact be fragments with no inserts, though I don't know if it is possible to get those in a 10x prep. Remember that the trace is a gross look at all library fragments. Unfortunately, shorter fragments cluster more efficiently, so those may be preferentially selected when forming clusters.
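Since the trace cannot show which fragments actually formed clusters, one option is to look for short inserts in the reads themselves. The sketch below counts how many R2 sequences run into a long stretch of A's and where that stretch starts. The 15-base minimum run length is an arbitrary assumption, and the function names are illustrative only.

```python
import re
from collections import Counter

def polya_start_positions(sequences, min_run=15):
    """Count 1-based start positions of the first A-run of >= min_run bases.

    Reads without such a run are counted under the key None.
    """
    pattern = re.compile("A{%d,}" % min_run)
    starts = Counter()
    for seq in sequences:
        match = pattern.search(seq.upper())
        starts[match.start() + 1 if match else None] += 1
    return starts

def fraction_with_polya(starts):
    """Fraction of reads that contain at least one qualifying A-run."""
    total = sum(starts.values())
    return 0.0 if total == 0 else 1 - starts[None] / total
```

If a large share of R2 reads hit a poly(A) run early in the read, that would point toward short inserts in the clustered fragments, even when the fragment size trace looks fine.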

“One thing I noticed about the library prep, was a very high final DNA concentration after sample indexing/amplification PCR. So maybe there were too many cycles of PCR amplification?”

I can't comment on the experimental part, but I assume you followed the 10x recommendations. If not, you should definitely do that in case of a do-over.


Thanks again for all of your help with this. Yes, I followed the library prep protocol exactly as written, and have done the same assay previously with good results. I think my plan moving forward will be to re-quantify and re-sequence this library on another low yield platform (iSeq or MiSeq). If the poor sequencing quality issue persists, I will then try to optimize the library prep starting from the cDNA by varying the number of PCR cycles. If that doesn't improve the situation, then it will be decision time on whether to proceed at all!


That sounds like an excellent plan considering the sample type. Let us know what happens. Good luck!

2.1 years ago
GenoMax 141k

Since you have already been in touch with both vendors, there is probably not much we can add. I assume you shared the FastQC reports (which probably don't make much sense for 10x) with them? Can you perhaps add images here?

The only way to settle this conclusively would be to re-sequence your libraries on a different MiSeq (if you have access to one), or on the same MiSeq again (assuming it is performing well for other samples and there is nothing otherwise wrong with that sequencer).

If this was originally a sequencing issue, then data from the new run should in theory look good. If it looks the same as the first run, then you need to bite the bullet and accept that you have bad libraries and will need to re-do the experiment.
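To compare the two runs with the same yardstick, the fraction of bases at or above Q30 can be computed directly from the FASTQ quality strings. This is only a minimal sketch, assuming standard Phred+33 encoding; the function name is made up for illustration, and the numbers may differ slightly from the %Q30 reported by the instrument software.

```python
def q30_fraction(quality_strings, threshold=30, offset=33):
    """Fraction of base calls with Phred quality >= threshold.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    """
    total = 0
    passed = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passed += 1
    return passed / total if total else 0.0
```

Running this on R1 and R2 separately for both runs gives four numbers that can be compared side by side.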

PS: Have you tried to analyze the data in spite of the noted issues?


Hi GenoMax, thank you for your thoughts on this. I really appreciate it. I've pasted a few screenshots here: FASTQC on R1 and R2, the cellranger output, and a few representative thumbnail images from cycles 1, 2, 37, and 38 of the sequencing. The sad reality for me is that these libraries were from precious clinical samples, and I can't simply re-do the experiments ...

I did try analyzing the data in spite of the noted issues, and the mapping quality is poor (which affects all downstream analyses). I presume I can play around with mapping parameters to try and recover more reads. The reads that do map confidently appear reasonably good.

I don't have access to a sequencer, but can resubmit the same library pool to the core facility that ran the sequencing. As you suggest, that would tell us if the issue is intrinsic to the library itself. I am going to look around for another sequencer.

Also, I have never looked at "thumbnail images" before, but naively, the clusters look good for the first cycles, then at cycle 37 -- the start of the reverse read in this case -- everything looks blurry, smudgy, and out-of-focus. Does this tell us anything?

[Images: R1 quality; R2 quality; cellranger output; thumbnails for cycles 1, 2, 37, and 38]


We ran into a similar-looking problem to this recently, with a MiSeq library with R1 and I1 sequencing fine but R2 not (just with a different protocol for us-- rep-seq from cDNA with a custom barcoding strategy). Illumina wasn't much help for us either. I was also trying to puzzle out what was different for R2, and wondered about the imaging results shown in our run's thumbnails too, but also have no experience eyeballing these. But, I'd be curious to hear anyone's thoughts on this part.

For our problem, the best guess I could come up with was that something was interfering with the proper binding of the reverse read primer. (In particular I thought that because a test run with a little bit of this library plus a larger portion of different known-good library from the same protocol ruined the reverse read for that whole run too, even the samples that previously sequenced fine.) But in the end we just re-made the library and re-sequenced and it worked. I'd also be curious to hear if re-doing the 10x protocol from cDNA works for your case or if you find any other clues!


“But in the end we just re-made the library and re-sequenced and it worked.”

That is puzzling. If there was something interfering with primer binding then it should happen consistently.

There are always those rare instances when sequencing fails because something on the instrument is borderline out of spec or about to fail.


In a way, it was consistent, at least in that having just a small proportion of material from this library included in a run along with other samples made all reverse reads fail, even from unrelated samples. That was true a few times over for this library, and yet other runs before and after did just fine-- including this one once the material was re-amplified and prepped from scratch. All of that put together is what made me think something present in the one prepared library was screwing up R2 primer binding... but I'm just at a loss as to what the mysterious "something" could have been, or what would have gone differently that one time during prep.

Thanks for your thoughts.
