Question: How to interpret and trim these plots?
3.3 years ago by
eyonesi50 wrote:

Hello everyone. I am running a de novo RNA-seq experiment and am at the quality control step. I can't interpret two plots in the FastQC output, sequence duplication level and GC content, and I don't know whether or how to trim based on them. I have read some articles saying that removing duplicates is not recommended for differential expression analysis. Here are some details of the plots.

Sequence duplication level plot: the percentage of sequences remaining if deduplicated is 48.8%. The blue line shows two peaks: one between 9 and 50 on the X axis with a maximum of 15% on the Y axis, and a second between 50 and 500 on the X axis with a maximum of 8% on the Y axis.

Per sequence GC content plot: the red line has two peaks, one at X = 45, Y = 500000 and one at X = 72, Y = 720000. The blue line has a single peak at X = 72, Y = 720000.

best regards

rna-seq • 1.1k views
3.3 years ago by
Bergen, Norway
Michael Dondrup48k wrote:

For RNA-seq you can ignore all output of FastQC except for Per base quality and Adapter content.
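
As a rough illustration of looking only at those two modules, here is a minimal Python sketch that reads the tab-separated `summary.txt` FastQC writes next to its report (status, module name, file name per line) and keeps just the two results. The function name `modules_to_review` and the example file name `sample.fastq` are made up for this sketch, and the statuses in the example text are invented, not taken from the data in this thread.

```python
def modules_to_review(summary_text,
                      keep=("Per base sequence quality", "Adapter Content")):
    """Return {module name: status} for the modules listed in `keep`.

    `summary_text` is the content of a FastQC summary.txt file:
    one line per module, formatted as STATUS<TAB>MODULE<TAB>FILENAME.
    """
    wanted = {name.lower() for name in keep}
    result = {}
    for line in summary_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status, module = parts[0], parts[1]
        if module.lower() in wanted:
            result[module] = status
    return result


# Invented example content, for illustration only.
example_summary = (
    "PASS\tBasic Statistics\tsample.fastq\n"
    "PASS\tPer base sequence quality\tsample.fastq\n"
    "FAIL\tPer sequence GC content\tsample.fastq\n"
    "FAIL\tSequence Duplication Levels\tsample.fastq\n"
    "WARN\tAdapter Content\tsample.fastq\n"
)

for module, status in modules_to_review(example_summary).items():
    print(f"{module}: {status}")
```

A WARN or FAIL on these two modules is what would normally motivate quality or adapter trimming; the other modules are only filtered out of the listing here, not judged.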


Thanks for your answer. Can I ignore all the output even if it shows two peaks in GC content? With regards

written 3.3 years ago by eyonesi50

That is hard to judge without more information and the picture. I have just looked at some of our data, and most samples have a single bell-shaped GC distribution with the mean very close to the GC content of all exons in the organism. If you have two peaks, you could either have contamination from a different organism, or some reads from this organism could have a very different GC content, for example ribosomal RNA. Certainly, you need to understand what you are dealing with. For that, you can plot the distribution of GC content for all genes, including ribosomal RNA, and compare the distributions. In the end, however, the question is whether there is anything you can or need to do as a result of your findings. You should continue with your analysis and possibly check for contamination in addition, but that can only be done by taking all the data forward and making either (pseudo-)alignments or an assembly.
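
To make the comparison of GC distributions concrete, here is a minimal Python sketch (standard library only) that computes the GC percentage of each sequence in a FASTA file and bins the values into a histogram. Running it once on the gene or transcript sequences and once on the rRNA sequences would give two histograms to set against the FastQC peaks. The helper names (`read_fasta`, `gc_percent`, `gc_histogram`), the 5% bin width, and the toy sequences are all assumptions made for illustration, not part of FastQC or of any particular pipeline.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq):
    """GC percentage over A/C/G/T bases only; None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(records, bin_width=5):
    """Count sequences per GC bin; keys are the lower edge of each bin."""
    hist = Counter()
    for _, seq in records:
        gc = gc_percent(seq)
        if gc is None:
            continue
        # Put exactly 100% GC into the last bin rather than a new one.
        lower = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        hist[lower] += 1
    return dict(sorted(hist.items()))


# Toy sequences, invented for illustration only.
toy_fasta = """>gene1 low GC
ATATGC
>gene2 balanced
ATGCATGC
>rRNA_like high GC
GCGCGCAT
>all_N
NNNN
""".splitlines()

hist = gc_histogram(read_fasta(toy_fasta))
for lower, count in hist.items():
    print(f"{lower:3d}-{lower + 5:3d}% GC: {count}")
```

For a real file, passing an open file handle works the same way, e.g. `gc_histogram(read_fasta(open(path)))` with `path` pointing at your own FASTA. If one group of sequences piles up near the second peak, that points to where those reads come from.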

modified 3.3 years ago • written 3.3 years ago by Michael Dondrup48k