MultiQC report - unique reads from sequence counts
12 days ago
Rozita ▴ 40

Hello,

I know that we shouldn't read too much into a FastQC/MultiQC report, but I've got CUT&RUN libraries that were sequenced and they had some adaptor contamination, which we thought would be an acceptable level to proceed with as another round of clean up would have led to a lot of sample loss.

When I look at the FastQC/MultiQC report, I can see a few things that make me question whether it is worth proceeding with downstream analysis of this set of samples. One of them is the sequence counts: the % of unique reads is in the range of 50-70%. Is that an acceptable range to work with? The second image shows the number of reads rather than the % of reads, if that's more helpful.

Thank you.


Any time you use a technique that enriches for certain regions of the genome, you are going to end up with duplicate reads, since those regions are over-represented in the library.

they had some adaptor contamination, which we thought would be an acceptable level to proceed with as another round of clean up would have led to a lot of sample loss.

Another round of clean up of what kind? If you have adapter sequences then they are useless as far as your experiment goes, so no harm in losing them. If you are referring to a "clean up" (e.g. bead wash on experimental side) that is a separate issue (not really bioinformatics).


Thank you. I meant a clean up with beads, which I definitely understand is a wet-lab step. My concern was that another round of bead clean-up would have led to more sample loss, and I was wondering whether that would eventually affect my final data.


The main concern with adapter contamination is that it eats into your sequencing budget. It should not really affect your library quality, and computationally you will generally ignore those reads. It is possible that too much adapter contamination is a symptom of an issue in library prep, for example a very low DNA input amount that affects library complexity. But again, adapter dimers themselves should not affect your library quality; they just waste money/space/reads during sequencing.

In general, I find FastQC can overestimate duplicates. While this number is high, I would not be too worried. Also, it seems the samples with the highest duplicate counts are the IgG controls, which is somewhat expected.
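To make the "% unique" number a bit more concrete, here is a minimal sketch of a sequence-level estimate: the fraction of distinct read sequences among all reads. This is only an illustration, not FastQC's actual algorithm (as far as I know FastQC estimates duplication from a subset of the file), and the function name `percent_distinct` is made up for this example. The point is that the count is based on sequence identity alone, so reads piling up in enriched regions count as duplicates even when they come from independent fragments.

```python
from collections import Counter


def percent_distinct(sequences):
    """Return the percentage of distinct sequences among all reads.

    `sequences` is any iterable of read sequences (strings).
    Hypothetical helper for illustration; not part of FastQC or MultiQC.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    # Each distinct sequence is counted once, however many copies exist.
    return 100.0 * len(counts) / total


# Toy example: four reads, one sequence seen twice.
reads = ["ACGT", "ACGT", "TTTT", "GGGG"]
print(percent_distinct(reads))
```

A position-based duplicate estimate after alignment is usually more informative than this sequence-only view.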

The main problem with duplicates would be low library complexity. At this stage, nothing you do with the library can really change that. You can perform QC such as a fingerprint plot and fraction reads in peaks to get an idea of complexity. The ENCODE ChIP-seq pipeline also has some metrics to estimate library complexity.
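As a rough sketch of what those complexity metrics measure, the snippet below computes the non-redundant fraction (NRF), the PCR bottlenecking coefficients (PBC1, PBC2) and the fraction of reads in peaks (FRiP) from plain Python data. The function names and input formats are assumptions made for this example; in practice you would get read positions from a filtered BAM and peaks from your peak caller, and the ENCODE pipeline has its own implementation.

```python
from collections import Counter


def complexity_metrics(positions):
    """positions: iterable of (chrom, start, strand), one per uniquely mapped read."""
    counts = Counter(positions)
    total = sum(counts.values())                    # all mapped reads
    distinct = len(counts)                          # distinct genomic positions
    m1 = sum(1 for c in counts.values() if c == 1)  # positions with exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)  # positions with exactly two reads
    return {
        "NRF": distinct / total if total else float("nan"),
        "PBC1": m1 / distinct if distinct else float("nan"),
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside a peak.

    read_positions: list of (chrom, pos)
    peaks: dict mapping chrom -> list of half-open (start, end) intervals
    """
    if not read_positions:
        return float("nan")
    in_peaks = sum(
        1
        for chrom, pos in read_positions
        if any(start <= pos < end for start, end in peaks.get(chrom, []))
    )
    return in_peaks / len(read_positions)


# Toy data, made up for illustration.
toy_positions = (
    [("chr1", 100, "+")] * 3
    + [("chr1", 200, "+")] * 2
    + [("chr1", 300, "-"), ("chr2", 50, "+")]
)
print(complexity_metrics(toy_positions))
print(frip([("chr1", 150), ("chr1", 500), ("chr2", 10)], {"chr1": [(100, 200)]}))
```

Values of NRF and PBC1 closer to 1 indicate a more complex library. The linear scan over peaks in `frip` is fine for a toy example but would be too slow for real data, where an interval tree or a tool such as bedtools is a better fit.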

I think, overall, the main concern is if the samples you intend to quantitatively compare have significantly different QC metrics. For example, if between your siSCR and siBRCA2 "Pan" samples, there was a large difference in duplicate rate and GC content bias, then that may indicate significant processing defects that could affect signal or library complexity (although sometimes this can reflect large scale biological changes due to your perturbation).

Also, for determining CUT&RUN quality and success, I think it is very helpful to view signal tracks on the genome browser, in addition to enrichment at expected regions (e.g. promoters for H3K4me3) or motifs if looking at TFs.
