Question: Interpretation of FastQC results: do I need to trim my sequence?
sallyanndunn18 wrote (3.6 years ago):

I am quality checking a gene sequence using FASTQC.

It has given me warnings about sequence duplication (44%), per-base sequence content and k-mer content.
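FastQC's per-base sequence content module plots the proportion of A, C, G and T at each read position. FastQC's documentation describes a warning when A and T, or G and C, differ by more than 10% at any position and a failure above 20%. Treat those thresholds as an assumption to check against your FastQC version. The sketch below shows the idea on an in-memory list of reads. It uses hypothetical function names and does not bin positions the way FastQC does for long reads.

```python
from collections import Counter

BASES = "ACGT"

def per_base_content(reads):
    """Percentage of A, C, G and T at each position (N and other symbols are ignored)."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    table = []
    for pos in range(max_len):
        # only reads long enough to reach this position contribute to it
        counts = Counter(r[pos] for r in reads if len(r) > pos and r[pos] in BASES)
        total = sum(counts.values())
        table.append({b: 100.0 * counts[b] / total if total else 0.0 for b in BASES})
    return table

def flag_position(pct, warn=10.0, fail=20.0):
    """PASS/WARN/FAIL from the larger of |A-T| and |G-C| (thresholds are assumptions)."""
    diff = max(abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
    if diff > fail:
        return "FAIL"
    if diff > warn:
        return "WARN"
    return "PASS"
```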

Should I trim this sequence using Trim Galore or Cutadapt to rectify these problems?

I tried this with a quality cutoff of 30 (q-30) and removed the adapter sequence AGATCGGAAGAGC. However, this returned my paired sequences with even more warnings, including fails on per-base sequence content and GC content.
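For reference, adapter removal amounts to cutting each read at the first occurrence of the adapter, or at a partial adapter hanging off the 3' end. The sketch below is a simplified illustration, not what Trim Galore or Cutadapt actually run. It accepts only exact matches, whereas Cutadapt also tolerates mismatches. `trim_adapter` and `min_overlap` are hypothetical names.

```python
ADAPTER = "AGATCGGAAGAGC"  # adapter sequence from the question

def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Cut the read at the adapter (exact matches only; a simplification)."""
    # full adapter anywhere in the read: drop it and everything after it
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # otherwise look for the longest adapter prefix sitting at the very 3' end
    for k in range(min(len(adapter) - 1, len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

Trimming leaves reads of different lengths. After trimming, FastQC's per-position and GC statistics are therefore computed over a different, shorter set of bases than before.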

So I am not sure how to proceed: while fixing the warnings on the original sequence, I have created new problems by trimming it.

Any help/advice would be appreciated :)

genomax (United States) wrote (3.6 years ago):

Take a look at this set of blog posts by Dr. Simon Andrews (author of FastQC) and see if they answer some of your questions. This post may be directly relevant in this case.
Please don't trim data based on Q-scores unless you have some really bad data (Q10 or less). Otherwise you would be throwing away good data.
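A stricter cutoff discards more data. Cutadapt's documentation describes its 3' quality-trimming algorithm, borrowed from BWA, as follows: subtract the cutoff from each quality, sum from the 3' end, and cut where the partial sum of (quality - cutoff) is smallest. The minimal sketch below follows that description and adds the early stop BWA uses once the running sum turns negative. It works on plain integer Phred scores rather than an encoded FASTQ quality string. The function name and the quality profile are made up for illustration.

```python
def quality_trim_index(quals, cutoff):
    """Index at which to cut the 3' end (BWA-style; quals are integer Phred scores)."""
    s = 0
    best = 0
    cut = len(quals)  # default: keep every base
    for i in reversed(range(len(quals))):
        s += cutoff - quals[i]  # grows while bases are below the cutoff
        if s < 0:
            break  # the tail seen so far is, on balance, above the cutoff
        if s > best:
            best = s
            cut = i
    return cut

profile = [40, 40, 40, 40, 35, 30, 25, 20, 15, 10]  # made-up qualities
for q in (10, 20, 30):
    print(f"Q{q}: keep {quality_trim_index(profile, q)} of {len(profile)} bases")
```

With this made-up profile, a cutoff of 10 keeps all 10 bases, 20 keeps 8 and 30 keeps only 6.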

chen (OpenGene) wrote (3.6 years ago):

A duplication level of 44% is not very high if you are doing some kind of deep sequencing (>200x coverage).
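At high coverage, reads often start at the same place by chance. Consider a rough model with N reads starting uniformly at random on either strand of a genome of G bases. Each of the 2G start slots then receives on average λ = N/(2G) = C/(2L) reads, where C is coverage and L is read length. The expected fraction of reads that repeat an earlier read's start is then 1 - (1 - e^(-λ))/λ. This model ignores sequencing errors, repeats and PCR duplicates, and FastQC counts duplicates by exact sequence rather than position. It is only an illustration of the point above, not FastQC's calculation. The function name is hypothetical.

```python
import math

def expected_duplicate_fraction(coverage, read_length):
    """Expected fraction of reads that repeat an earlier read's start position and strand.

    Assumes reads start uniformly at random on either strand (2G possible slots),
    with no sequencing errors, repeats or PCR duplicates.
    """
    lam = coverage / (2.0 * read_length)  # mean reads per slot: N/(2G) = C/(2L)
    distinct_per_read = (1.0 - math.exp(-lam)) / lam  # expected occupied slots / N
    return 1.0 - distinct_per_read

for cov in (30, 100, 200):
    print(cov, round(expected_duplicate_fraction(cov, 100), 3))
```

Under this model, 200x coverage with 100 bp reads gives about 37% duplicates with no library problem at all.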

Trimming the head/tail of reads is needed when you want to reduce false-positive mutation calls (especially low-frequency somatic mutations). Try https://github.com/OpenGene/after, which can do automatic filtering, trimming and error removal.

sallyanndunn18 wrote (3.6 years ago):

Thank you very much! Is it a problem that my sequence duplication is very high (44%)?
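Depending on the FastQC version, the figure at the top of the duplication plot is either the overall duplicate percentage or the percentage of reads that would remain after deduplication, so it is worth checking which one the 44% is. The sketch below computes both from exact sequence matches over an in-memory list of reads, plus a FastQC-style table of duplication levels. FastQC only tracks sequences that first appear early in the file, so its estimate will not match an exact count. Function names are hypothetical.

```python
from collections import Counter

def duplication_summary(reads):
    """Return (duplicate %, % remaining if deduplicated) using exact sequence matches."""
    if not reads:
        return 0.0, 100.0
    remaining = 100.0 * len(set(reads)) / len(reads)
    return 100.0 - remaining, remaining

def duplication_levels(reads):
    """Map copy number -> how many distinct sequences occur that many times."""
    return Counter(Counter(reads).values())
```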


Does the following from your original post mean that you are looking at just one gene (amplicon sequencing)?

"I am quality checking a gene sequence using FASTQC"

If that is the case then you would expect a lot of duplication. Have you scanned your data with a trimming program to ensure that there is no adapter contamination?
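A quick way to check for adapter contamination without trimming is to count how many reads contain the adapter and where it starts. This is roughly what FastQC's Adapter Content module plots: the cumulative percentage of reads with the adapter seen at or before each position. The sketch uses the first 12 bases of the adapter from the question and counts only exact, full-length matches. The function name is hypothetical.

```python
def adapter_content(reads, adapter="AGATCGGAAGAG"):
    """Cumulative % of reads whose adapter match starts at or before each position."""
    if not reads:
        return []
    starts = [0] * max(len(r) for r in reads)
    for r in reads:
        i = r.find(adapter)  # exact, full-length matches only
        if i != -1:
            starts[i] += 1
    cumulative, running = [], 0
    for count in starts:
        running += count
        cumulative.append(100.0 * running / len(reads))
    return cumulative
```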

(reply by genomax, 3.6 years ago)

Sorry, I meant to say a genome sequence.

TGCTG is a sequence identified in the 'Kmer Content' module, with a count of 752,200.
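A raw k-mer count is hard to judge on its own. It needs to be compared with the total number of k-mers in the file, and FastQC's Kmer Content module is concerned with k-mers whose occurrence is biased toward particular positions in the reads. The sketch below counts k-mers (5-mers by default, to match TGCTG) and shows where one k-mer starts within reads. It does not reproduce FastQC's enrichment statistics, and the function names are hypothetical.

```python
from collections import Counter

VALID = set("ACGT")

def kmer_counts(reads, k=5):
    """Count every k-mer made only of A/C/G/T across all reads."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            kmer = r[i:i + k]
            if set(kmer) <= VALID:  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts

def kmer_positions(reads, kmer):
    """Histogram of start positions of one k-mer within the reads."""
    positions = Counter()
    k = len(kmer)
    for r in reads:
        for i in range(len(r) - k + 1):
            if r[i:i + k] == kmer:
                positions[i] += 1
    return positions
```

For example, compare `counts["TGCTG"]` with `sum(counts.values())`, and check whether `kmer_positions(reads, "TGCTG")` piles up at particular read positions.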

(reply by sallyanndunn18, 3.6 years ago)