Question: How to interpret FastQC or MultiQC reports of easy-HiC or HiC data
19 months ago by msimmer92 wrote:

Hello, I have FASTQ files from an easy-HiC sequencing experiment. The samples are embryonic stem cells (ESCs) and neurons from mice. We used UMIs (unique molecular identifiers) to tag the reads, and the sequencing was paired-end.

I am new to the HiC field and am wondering how to interpret the FastQC and MultiQC reports, so that I can confirm the sequencing went fine before continuing with the analysis. The per-base and per-sequence quality was very good for all samples, but I would like advice on two other fields: the duplication levels in the general statistics, and the GC content.

1) Is that level of duplication expected? I discussed it with a colleague, and we thought it could be because we have not yet used the UMIs to sort the reads, so many reads may be counted as duplicates at this point. Does this make sense?

2) For the GC content plot, please ignore the three red peaks at the top (they come from the three undetermined-sequences files). My question is mostly about the curves that I am pointing at with sky-blue arrows. Some ESC and neuron files have a bell-shaped curve (left arrow), whereas others have a strange, roughly square shape (right arrow). I am wondering whether something is wrong with the files that show this square distribution, or whether this is expected in this kind of experiment.
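To make the hypothesis in question 1 concrete, here is a minimal sketch of why a duplication estimate based on read sequence alone (which is what FastQC reports, since it does not look at UMIs) can be higher than one that also takes the UMI into account. The reads, UMIs, and the `duplication_rate` helper are invented for illustration and are not taken from my data or from any tool.

```python
def duplication_rate(keys):
    """Fraction of reads that are extra copies of an already-seen key."""
    keys = list(keys)
    if not keys:
        return 0.0
    return 1.0 - len(set(keys)) / len(keys)


# Toy reads as (sequence, UMI) pairs; all values are made up.
reads = [
    ("ACGTACGT", "AAT"),
    ("ACGTACGT", "AAT"),  # same sequence, same UMI: likely a PCR duplicate
    ("ACGTACGT", "GCC"),  # same sequence, different UMI: a distinct molecule
    ("TTGACCAA", "AAT"),
]

# Sequence-only view: every repeated sequence counts as a duplicate.
by_sequence = duplication_rate(seq for seq, _ in reads)

# UMI-aware view: only reads sharing both sequence and UMI count as duplicates.
by_sequence_and_umi = duplication_rate(reads)

print(f"sequence only:  {by_sequence:.2f}")
print(f"sequence + UMI: {by_sequence_and_umi:.2f}")
```

In this toy case the sequence-only rate is higher than the UMI-aware rate, which is the effect my colleague and I suspect. Whether it explains the levels in my report is exactly what I am unsure about.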

Thank you for your time! Best regards.

enter image description here

enter image description here

fastqc hic easyhic • 757 views

How to add images to a Biostars post. I made the changes this time.

written 19 months ago by ATpoint