Question: How to interpret and trim these plots?
eyonesi wrote (28 days ago):

Hello everyone. I am running a de novo RNA-seq experiment and am at the quality control step. I can't interpret two plots in the FastQC output, the sequence duplication levels and the per sequence GC content, and I don't know whether or how I should trim based on them. I have read in some articles that removing duplicates is not recommended for differential expression analysis. Here are some details of the plots.

Plot of sequence duplication levels: Percentage of sequences remaining if deduplicated: 48.8%. The blue line shows two peaks: one between 9 and 50 on the x-axis with a maximum of 15% on the y-axis, and a second between 50 and 500 on the x-axis with a maximum of 8% on the y-axis.

Plot of per sequence GC content: The red line has two peaks: (1) x = 45, y = 500000; (2) x = 72, y = 720000. The blue line has one peak at x = 72, y = 720000.

best regards

rna-seq • 130 views
Michael Dondrup (Bergen, Norway) wrote (28 days ago):

For RNA-seq you can ignore all output of FastQC except for Per base quality and Adapter content.
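
As a rough illustration of that advice, the sketch below keeps only those two modules from a FastQC `summary.txt` file. It assumes the file has tab-separated lines of the form `STATUS<TAB>module name<TAB>file name` and that the modules are named "Per base sequence quality" and "Adapter Content"; the function name `relevant_modules` is hypothetical and not part of FastQC.

```python
from typing import Dict, Iterable

# Assumed module names as they appear in FastQC's summary.txt;
# compared case-insensitively to be tolerant of capitalization.
KEEP = ("per base sequence quality", "adapter content")


def relevant_modules(summary_lines: Iterable[str]) -> Dict[str, str]:
    """Return {module name: status} for the two modules worth checking."""
    results = {}
    for line in summary_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status, module = parts[0], parts[1]
        if module.lower() in KEEP:
            results[module] = status
    return results
```

With a real report this could be called as `relevant_modules(open("summary.txt"))`, where the path points into the unzipped FastQC output folder.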


Thanks for your answer. Can I ignore all of the output even if it shows two peaks in the GC content? With regards

(reply written 27 days ago by eyonesi)

That is hard to judge without more information and the picture. I have just looked at some of our data, and most samples have a single bell-shaped GC distribution with a mean very close to the GC content of all exons in the organism. If you have two peaks, you could have contamination from a different organism, or some reads from this organism could have a very different GC content, for example ribosomal RNA. You certainly need to understand what you are dealing with. To do that, you can plot the distribution of GC content for all genes, including ribosomal RNA, and compare the distributions. In the end, however, the question is whether there is anything you can or need to do as a result of your findings. You should continue with your analysis and possibly check for contamination in addition, but that can only be done by taking all the data forward and making either (pseudo-)alignments or an assembly.
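
One way to get a GC distribution for a set of reference sequences (genes, rRNA, or assembled transcripts) is sketched below. It is only a minimal example under stated assumptions: the sequences are available in FASTA format, GC is computed over A/C/G/T bases only, and the function names `read_fasta`, `gc_percent`, and `gc_histogram` are hypothetical helpers, not part of any existing tool.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq: str) -> float:
    """GC percentage of a sequence, ignoring N and other ambiguity codes."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(records: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Count sequences per rounded GC percentage (integer bins 0..100)."""
    counts = Counter(round(gc_percent(seq)) for _, seq in records)
    return dict(sorted(counts.items()))
```

Running `gc_histogram(read_fasta(open("genes.fa")))` on each sequence set (the file name here is a placeholder) gives counts per GC percentage that can be plotted next to the FastQC curve to see which set matches each peak.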

(reply written and modified 27 days ago by Michael Dondrup)