FastQC duplicates questions
8 months ago
Beth ▴ 10

Hi all,

I'm trying to assess the quality of some public RNA-seq data from granulosa cells, and I'm running into some odd QC problems in data from 2 separate studies.

Study 1: a lot of duplicate reads, with a polyT overrepresented sequence (sample treatment: RNA-seq libraries were prepared using the KAPA Stranded RNA-Seq Library Preparation Kit from KAPA®; sequencing: Illumina HiSeq 2000, paired-end).

Study 2: a similar problem, but with a LOT of overrepresented sequences that mostly BLAST to rRNAs (sample treatment: before construction of the RNA-seq library, rRNA was removed from the total RNA samples using the RiboMinus Eukaryote Kit; the resulting library was quantified using an Agilent 2100 Bioanalyzer and run on the HiSeq PE150 platform (Illumina, CA, USA) for paired-end 150 bp sequencing).

My main questions here are:

  1. Does it look like a problem with the rRNA depletion process?
  2. Can I use this data in the analysis (for example, after filtering out rRNA reads), or should I discard it?

I've come across differing opinions on filtering rRNA reads. I still hold the view that it can bias expression measurements, but the authors of these datasets filter rRNA reads themselves as part of their data processing.

You can access the full FastQC reports here: my FastQC reports

Thank you in advance!

FastQC RNA-seq • 397 views

Have you seen this blog post from the authors of FastQC? https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/
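If you want to put a rough number on the observation before moving on, here is a minimal sketch in plain Python that counts exact-duplicate reads and T-rich reads among the first reads of a FASTQ file. The function names, the 0.9 T-fraction cutoff and the 100,000-read limit are illustrative assumptions of mine, and this is a cruder calculation than FastQC's own duplication estimate, so the numbers may not match the report exactly.

```python
import gzip
from collections import Counter


def read_sequences(path, limit=100_000):
    """Yield up to `limit` read sequences from a FASTQ file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= limit:
                break
            if i % 4 == 1:  # second line of each 4-line FASTQ record
                yield line.strip()


def is_poly_t(seq, min_fraction=0.9):
    """True if at least `min_fraction` of the bases are T (assumed cutoff)."""
    return bool(seq) and seq.upper().count("T") / len(seq) >= min_fraction


def summarize(seqs):
    """Return total reads, exact-duplicate fraction and T-rich fraction."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return {"total": 0, "duplicate_fraction": 0.0, "poly_t_fraction": 0.0}
    duplicates = total - len(counts)  # reads beyond the first copy of each sequence
    poly_t = sum(n for seq, n in counts.items() if is_poly_t(seq))
    return {
        "total": total,
        "duplicate_fraction": duplicates / total,
        "poly_t_fraction": poly_t / total,
    }
```

Usage would be something like `summarize(read_sequences("sample_R1.fastq.gz"))`, where the file name is a placeholder for one of your own files.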

Instead of getting bogged down in QC details, you may want to make a note of this observation and proceed with the rest of your analysis. If this is public data, you don't have much control over what was done or reported. If you are planning a meta-analysis, or want to compare or combine data from multiple kits, then add the relevant metadata columns so you can use them in your PCA plots etc.
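As a rough sketch of what "metadata columns" could look like in practice, the snippet below builds a small sample sheet and writes it as TSV, so the columns can later be used to colour or shape points in a PCA plot. The sample names, column names and "not reported" entries are placeholders I made up for illustration; only the kit and platform names come from the question.

```python
import csv
import io
from collections import Counter

COLUMNS = ["sample", "study", "library_kit", "rrna_depletion", "platform", "qc_note"]

# Hypothetical sample sheet: sample IDs are invented, kit/platform names are from the post.
samples = [
    {"sample": "study1_s1", "study": "study1", "library_kit": "KAPA Stranded RNA-Seq",
     "rrna_depletion": "not reported", "platform": "HiSeq 2000",
     "qc_note": "polyT overrepresented, high duplication"},
    {"sample": "study1_s2", "study": "study1", "library_kit": "KAPA Stranded RNA-Seq",
     "rrna_depletion": "not reported", "platform": "HiSeq 2000",
     "qc_note": "polyT overrepresented, high duplication"},
    {"sample": "study2_s1", "study": "study2", "library_kit": "not reported",
     "rrna_depletion": "RiboMinus Eukaryote", "platform": "HiSeq PE150",
     "qc_note": "rRNA overrepresented sequences"},
    {"sample": "study2_s2", "study": "study2", "library_kit": "not reported",
     "rrna_depletion": "RiboMinus Eukaryote", "platform": "HiSeq PE150",
     "qc_note": "rRNA overrepresented sequences"},
]


def to_tsv(rows, columns=COLUMNS):
    """Serialize the sample sheet as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def tabulate(rows, *keys):
    """Count samples per combination of the given metadata columns."""
    return Counter(tuple(row[key] for key in keys) for row in rows)
```

Calling `tabulate(samples, "study", "rrna_depletion")` shows at a glance whether a technical variable lines up completely with the study, which is worth knowing before reading too much into any separation along those lines in the PCA.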
