Hi all,
I am in desperate need of some help with a sequencing experiment and would be very grateful for any of your thoughts. I recently sequenced a 10x scRNA-seq 3’ v3.1 library on a MiSeq and got poor-quality results. The median quality score of the R2 read (the RNA-derived read) was 19. However, the quality of R1 and of the index read was high (median quality 36 and 35, respectively).
I reached out to the sequencing core who did the sequencing about issues with the MiSeq run. They told me, “According to our metrics this run did fairly well except for Read 2 which suggests a library issue. After base 36 A's come out to be 30-45% of the run until base 70. This run had 92% PF. 20 million passed filter reads on a V2 where we only guarantee 12 million reads. Runs that overcluster have really low %PF and the quality scores would be low for all reads not just Read 2. The thumbnail images also look good. Read 1: 97.95 %Q30 Index 1: 93.61 %Q30 Read 2: 30 %Q30”.
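For anyone who wants to cross-check a %Q30 figure like the ones quoted above directly from FASTQ quality strings, here is a minimal Python sketch. It assumes Phred+33 encoding (the usual encoding in current Illumina FASTQ output), and the function name percent_q30 is made up for illustration. The instrument-reported %Q30 may be computed over a different set of clusters, so the numbers may not match exactly.

```python
def percent_q30(quality_strings, threshold=30, offset=33):
    """Return the percentage of base calls with Phred quality >= threshold.

    quality_strings: iterable of FASTQ quality lines (4th line of each record).
    offset: ASCII offset of the encoding (assumed Phred+33 here).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    # Avoid dividing by zero when no quality strings were supplied.
    return 100.0 * passing / total if total else 0.0


# Toy example: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(percent_q30(["IIII", "II##"]))  # 75.0 (6 of 8 bases pass)
```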
I then reached out to 10x for support, and got the following response: “Since the R2 is just the transcript, it is possible that something could be off with the transcript. However, typically when we see mRNA degradation, there will also be an increase in short inserts that in the sequencing read will read into the polydt or the insert does not get the TSO sequence cleaved off so there will be TSO sequences in the list of overrepresented sequences in the FastQ report. Your FastQC report did not indicate these issues. Since the FastQC report doesn't quite line up with that hypothesis, this could be a sequencing issue”.
I also reached out directly to Illumina and they told me: “I found that the drop in quality was around cycle 37 an error indicating a potential reverse read issue. The cycle 37 is the beginning of read 3, where paired end turnaround chemistry begins”. They suggested troubleshooting the MiSeq machine by power cycling and performing a system check.
One side is telling me this is a sequencing issue, and the other that this is a library issue. I need an impartial juror to help figure out what’s going on. Have any of you seen anything like this before? Could the poor quality in R2 be caused by the sequencing, or is there a problem with the library? Any advice or thoughts would be much appreciated.
Thank you for your time and help.
Just wondering, and you probably ruled this out so it's a stupid question, but is there any chance that R1 and R2 got mixed up? Do you see the drop in quality that is expected in R1 after the CB/UMI part of the read?
Hi ATpoint, not a stupid question! I'm looking for any and all possibilities and appreciate your response. Would you mind elaborating? What do you mean by mixing up R1 and R2? Do you mean bioinformatically, in terms of the cellranger analysis? Or something to do with running the sequencing machine (I dropped the libraries off at a sequencing core facility)? Also, I am unfamiliar with the expected drop in quality after the CB/UMI part of the read that you are alluding to. Could you elaborate on this as well? I've posted a few QC screenshots in another reply if those would be helpful. Many thanks for your time.
Can you tell us the number of cycles used for the R1/R2 files? Generally R1 is sequenced for fewer cycles (e.g. 28), whereas R2 could be up to 100 cycles.
In 10x libraries, after the CB/UMI are read (in Read 1) there is a stretch of polyT (see this doc from 10x for structure of libraries). Since every cluster carries this same homopolymer (the polyA tail of the transcript, read as polyT in Read 1), the sequence becomes low in nucleotide diversity: all clusters light up in the same channel at once, so the software has trouble calling bases, which is reflected in a drop in base quality.
Note: Based on your original description, it does appear that you have short inserts. So even on R2 you are reading into that polyT stretch from the other end (where it shows up as A's). As you mentioned above, this is a clinical sample, so there is not much you are going to be able to do.
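One rough way to put a number on the short-insert idea is to count how many R2 reads contain a long run of A's. The sketch below is only an illustration: the function name fraction_with_poly_a and the min_run=15 cutoff are arbitrary assumptions, not a 10x or Illumina recommendation, and low-quality base calls will blur any such count.

```python
import re


def fraction_with_poly_a(reads, min_run=15):
    """Return the fraction of reads containing a run of at least min_run A's.

    reads: iterable of read sequences (strings), e.g. the R2 sequences.
    min_run: hypothetical cutoff for calling a polyA read-through.
    """
    pattern = re.compile("A{%d,}" % min_run)
    n_reads = 0
    n_hits = 0
    for seq in reads:
        n_reads += 1
        if pattern.search(seq.upper()):
            n_hits += 1
    # Avoid dividing by zero when no reads were supplied.
    return n_hits / n_reads if n_reads else 0.0


# Toy example: two of the four reads contain 15 or more consecutive A's.
toy_reads = ["ACGT" + "A" * 20, "ACGTACGTACGT", "A" * 14 + "C", "GG" + "A" * 15]
print(fraction_with_poly_a(toy_reads))  # 0.5
```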
What ATpoint means is: have you checked whether the names of your R1/R2 files were interchanged? Unlikely, but just in case.
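A quick way to check for swapped file names is to tabulate read lengths in each FASTQ: the barcode/UMI read should be the short one and the transcript read the long one. Below is a minimal sketch, assuming standard four-line FASTQ records; the function name fastq_read_lengths is made up for illustration.

```python
from collections import Counter


def fastq_read_lengths(lines, max_reads=10000):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes four lines per record, so the sequence is the second line
    of each record. Stops after max_reads records to keep it quick.
    """
    lengths = Counter()
    n_reads = 0
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:
            lengths[len(line.strip())] += 1
            n_reads += 1
            if n_reads >= max_reads:
                break
    return lengths


# Toy example with two records of length 4 and 2.
toy = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "AC\n", "+\n", "II\n"]
print(dict(fastq_read_lengths(toy)))  # {4: 1, 2: 1}
```

For a gzipped file you could pass an open text handle instead of a list, e.g. gzip.open(path, "rt"), where path is whatever your R1 or R2 file is called.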
Thank you, GenoMax. The cycles for this flow cell run (MiSeq Standard - V2) were 28,8,0,91 as recommended in the 10x protocol. I have requested the raw BCL data from the sequencing core so I can recreate the fastq files and make sure there are no issues with that step. As for your point about reading into the PolyT stretch, I see why that would be problematic. Here is a screenshot of the sequence content of R2 (I think position 1 is absolute cycle number 37). There is a bit of A bias in the beginning. If there are short inserts, however, wouldn't that appear on the fragment size trace (pasted below)?
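To double-check the cycle arithmetic, here is a small Python sketch that maps an absolute cycle number to a read and a position within that read for a 28,8,0,91 layout. The segment labels and the function name locate_cycle are my own; treating the core's "base 70" as an absolute cycle is also an assumption.

```python
# Run layout as (segment label, number of cycles), in sequencing order.
RUN_LAYOUT = [("Read 1", 28), ("Index 1", 8), ("Index 2", 0), ("Read 2", 91)]


def locate_cycle(absolute_cycle, run_layout):
    """Map a 1-based absolute cycle to (segment label, 1-based position)."""
    start = 0
    for name, n_cycles in run_layout:
        if start < absolute_cycle <= start + n_cycles:
            return name, absolute_cycle - start
        start += n_cycles
    raise ValueError("cycle %d is outside the run" % absolute_cycle)


print(locate_cycle(37, RUN_LAYOUT))  # ('Read 2', 1)
print(locate_cycle(70, RUN_LAYOUT))  # ('Read 2', 34)
```

Under this layout, absolute cycle 37 is indeed the first base of R2, since 28 + 8 + 0 = 36 cycles come before it.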
One thing I noticed about the library prep was a very high final DNA concentration after the sample indexing/amplification PCR. So maybe there were too many cycles of PCR amplification? I'm not sure whether there is any indication of that in these diagnostic plots, however. Fortunately, I have ample remaining cDNA to repeat the NGS PCR amplification with fewer PCR cycles. Any thoughts on whether this would be useful?
This is all standard. There is likely no need to do the conversion yourself. Your provider should be trustworthy for this part, especially if they do this for a living.
Those may in fact be fragments with no inserts, though I don't know if it is possible to get those in a 10x prep. Remember that the trace is a gross look at all library fragments. Unfortunately, shorter fragments cluster more efficiently, so those may be selected when forming clusters.
I can't comment on the experimental part, but I assume you followed the 10x recommendations. If not, you should definitely do that in case of a do-over.
Thanks again for all of your help with this. Yes, I followed the library prep protocol exactly as written, and have done the same assay previously with good results. I think my plan moving forward will be to re-quantify and re-sequence this library on another low yield platform (iSeq or MiSeq). If the poor sequencing quality issue persists, I will then try to optimize the library prep starting from the cDNA by varying the number of PCR cycles. If that doesn't improve the situation, then it will be decision time on whether to proceed at all!
That sounds like an excellent plan considering the sample type. Let us know what happens. Good luck!