Question: Very confused about GC bias
9521ljh10 wrote:

I have FASTQ files, and the "Per sequence GC content" plot does not follow the expected shape. Therefore I think the sample is contaminated.

[Image: Per sequence GC content plot]

But does this failure of the GC content check mean there is GC bias?

I ask because I think GC bias is related to coverage and read depth (a problem that appears after mapping),

but the plot above is not from mapped reads, just the FASTQ file.

Am I right to think that GC content is different from GC bias?

modified 10 months ago by chen1.9k • written 11 months ago by 9521ljh10

Hi, your result seems fine; your distribution is close to the theoretical one :)

written 11 months ago by Titus910

Is this from raw sample data, or did you process it already?

written 11 months ago by lieven.sterck7.2k

It is a raw FASTQ file.

written 11 months ago by 9521ljh10
Friederike5.4k wrote:

"But does this failure of the GC content check mean there is GC bias?"

What you see is that you have more reads with a GC content above 50% than FastQC would expect given a normal distribution based on the mode of your reads' GC content. This may be indicative of GC bias, but it doesn't have to be, and it matters less if you're not too interested in quantitative measures down the road. Keep calm and carry on, and just keep this in the back of your mind before drawing strong conclusions, e.g. about interesting enrichments seen for regions with 50-60% GC content.
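To make the "normal distribution based on the mode" idea concrete, here is a minimal Python sketch. It is not FastQC's actual code; the function name `gc_deviation_from_normal` and the way the spread is estimated are assumptions made for illustration. It fits a normal curve centred on the modal GC percentage and reports how much of the read set deviates from that curve.

```python
import math
from collections import Counter

def gc_deviation_from_normal(gc_percents):
    """Summed |observed - expected| read counts, as a fraction of all reads.

    'expected' is a normal curve centred on the modal GC percentage.
    This imitates the general idea only; it is not FastQC's implementation.
    """
    n = len(gc_percents)
    if n == 0:
        raise ValueError("need at least one read")
    # Number of reads at each integer GC percentage
    observed = Counter(round(g) for g in gc_percents)
    mode = max(observed, key=observed.get)
    # Spread of the reads around the mode (assumed estimator)
    sd = math.sqrt(sum((g - mode) ** 2 for g in gc_percents) / n)
    if sd == 0:
        return 0.0  # all reads share one GC value, nothing to compare
    # Normal density at each integer GC%, rescaled so it sums to n reads
    density = [math.exp(-((x - mode) ** 2) / (2 * sd ** 2)) for x in range(101)]
    scale = n / sum(density)
    deviation = sum(abs(observed.get(x, 0) - density[x] * scale)
                    for x in range(101))
    return deviation / n
```

A read set with a second hump at high GC gives a larger value than a single symmetric peak, which is the kind of deviation the plot is flagging.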

"I ask because I think GC bias is related to coverage and read depth (a problem that appears after mapping)"

The GC content of each read can be determined irrespective of its location in the genome; after all, you only need to tally the types of bases you've sequenced, which is exactly the type of information that's stored in a fastq file.
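As a small illustration of that tally, the following Python sketch computes the GC percentage of every read straight from a FASTQ file, with no mapping involved. The function name `read_gc_percents` is hypothetical, and the sketch assumes the common layout of exactly four lines per record; any N bases are counted in the read length.

```python
import gzip

def read_gc_percents(fastq_path):
    """Yield the GC percentage of each read in a FASTQ file.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 1:  # the sequence is the 2nd line of a record
                continue
            seq = line.strip().upper()
            if seq:
                yield 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
```

A histogram of these values is essentially what the "Per sequence GC content" plot shows.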

But you are right insofar as FastQC's assumption about what a uniform sampling of your organism's genome should look like might be incorrect.

"Am I right to think that GC content is different from GC bias?"

GC content simply describes the number of G's and C's that you sequence relative to the number of A's and T's. GC bias typically describes the fact that the enzymes and conditions used for PCR amplification tend to amplify fragments with modest to medium-high GC content more efficiently. There will always be some sort of GC bias in Illumina-based sequencing (the paper by Yuval Benjamini and Terry Speed that Ranan pointed to is an enlightening read in that regard); it mostly becomes an issue if you are trying to compare read numbers between samples where one sample (type) had only mild GC bias while the other one shows dramatic GC bias.
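One common way to look at GC bias after mapping is to check whether read counts depend on the GC content of the region they come from. The Python sketch below assumes you already have, for each genomic window, its GC percentage and the number of reads mapped to it; how those pairs are produced is outside this sketch, and the name `mean_count_by_gc` is hypothetical.

```python
from collections import defaultdict

def mean_count_by_gc(windows, bin_width=10):
    """Average read count per GC bin.

    windows: iterable of (gc_percent, read_count) pairs, one per genomic window.
    Returns {bin_start: mean read count}, ordered by bin_start.
    """
    totals = defaultdict(lambda: [0, 0])  # bin_start -> [sum of counts, windows]
    for gc_percent, read_count in windows:
        # Put 100% GC into the last bin instead of opening a new one
        bin_start = min(int(gc_percent // bin_width) * bin_width,
                        100 - bin_width)
        totals[bin_start][0] += read_count
        totals[bin_start][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}
```

With no GC bias the means would be roughly flat across bins; comparing the per-bin profile of two samples gives a first impression of whether one is more strongly biased than the other.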

modified 10 months ago • written 10 months ago by Friederike5.4k
lieven.sterck7.2k wrote:

This is nothing to worry about. It simply shows the GC content of your read data. I would not say it deviates severely from the expected curve. It could perhaps be due to the organism you work on. Moreover, FastQC is very strict in its evaluation.

Here is an interesting link about all this: QCfail

What surprises me a little is that you have all green checks in the overview; I've seen this only very rarely :/

modified 11 months ago • written 11 months ago by lieven.sterck7.2k
Ranan Jyoti Sarma50 wrote:

This may help you. https://academic.oup.com/nar/article/40/10/e72/2411059

written 11 months ago by Ranan Jyoti Sarma50
chen1.9k wrote:

You should take a look at the GC content curves.

The fastp tool may help, see: https://github.com/OpenGene/fastp

written 10 months ago by chen1.9k

Some sequencers, such as the Illumina NovaSeq, may produce polyG tails at the ends of reads, which may affect the GC curve. Use fastp to trim the polyG tails and check the post-filtering data.
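Before trimming, it can help to see how common polyG tails actually are in the data. This is a minimal Python sketch for that check only (it does not trim and is not how fastp works internally); the function names and the `min_len=10` threshold are illustrative assumptions.

```python
def polyg_tail_length(seq):
    """Length of the uninterrupted run of G at the 3' end of a read."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("G"))

def fraction_with_polyg(seqs, min_len=10):
    """Fraction of reads whose 3' G run is at least min_len bases long."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if polyg_tail_length(s) >= min_len)
    return hits / len(seqs)
```

If a noticeable fraction of reads end in long G runs, those tails inflate the GC content of the affected reads and can distort the curve.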

written 10 months ago by chen1.9k