Regarding GC correction in sequencing data
7.0 years ago
DL ▴ 50

Hello, I have gone through some papers, but I can't understand why GC content estimation and correction are important for CNV analysis. Can anyone explain?

Thanks & Regards

next-gen genome R sequencing • 2.1k views
7.0 years ago
Rob 6.9k

Imagine that, due to GC bias, you are twice as likely to sample a sequencing read from a mid-GC region of the genome as from a high-GC region. That is, all other things being equal (e.g. copy number), you will obtain twice as many reads from mid-GC regions as from high-GC regions. This can happen because of, e.g., sequence-dependent differences in PCR amplification, but there are a number of steps at which extreme GC content can lower the _a priori_ probability of generating a read. If it is not accounted for, this effect will confound your copy number analysis. Read-depth-based CNV methods infer copy number from how many reads fall in a region, and the sequencing data reflect a combination of the true "biological signal" (the copy numbers) and technical biases (in this case, the GC content of the sequence). In the scenario above, a high-GC region at normal copy number would show half the expected depth and could be mistaken for a deletion. While it is generally not possible to remove all technical effects perfectly, correcting for effects that are known to be significant in some cases, like GC content, can help tease apart the true biological signal from the technical artifacts.
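
To make the scenario above concrete, here is a minimal sketch of one simple way to correct binned read counts for GC content: group windows by GC fraction, then rescale each window by the ratio of the overall median count to the median count of its GC stratum. This is only an illustration under stated assumptions, not the method of any particular CNV caller. The function names (`gc_fraction`, `gc_correct`), the number of strata, and the toy counts are all made up for the example.

```python
from collections import defaultdict
from statistics import median


def gc_fraction(seq):
    """Fraction of G/C among the A/C/G/T bases of a window (None if no such bases)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def gc_correct(counts, gc, n_strata=20):
    """Rescale per-window read counts so every GC stratum has the same median.

    counts: read count per genomic window
    gc: GC fraction (0..1) of each window, in the same order
    n_strata: number of equal-width GC strata (an arbitrary choice here)
    """
    def stratum(g):
        # Clamp so that a GC fraction of exactly 1.0 falls in the last stratum.
        return min(int(g * n_strata), n_strata - 1)

    by_stratum = defaultdict(list)
    for c, g in zip(counts, gc):
        by_stratum[stratum(g)].append(c)

    overall = median(counts)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}

    corrected = []
    for c, g in zip(counts, gc):
        m = stratum_median[stratum(g)]
        # A stratum with zero median cannot be rescaled; mark it as missing.
        corrected.append(c * overall / m if m > 0 else float("nan"))
    return corrected


# Toy data: five mid-GC windows and five high-GC windows.
# High-GC windows yield half as many reads at the same copy number,
# and the last high-GC window carries a duplication (twice the copies).
gc = [0.45] * 5 + [0.75] * 5
counts = [100, 100, 100, 100, 100, 50, 50, 50, 50, 100]

corrected = gc_correct(counts, gc)
for g, raw, fixed in zip(gc, counts, corrected):
    print(f"GC={g:.2f} raw={raw} corrected={fixed:.1f}")
```

In the raw counts the duplicated high-GC window looks just like a normal mid-GC window (100 reads), and the normal high-GC windows look like deletions. After rescaling, the normal windows line up and the duplicated window stands out at twice their depth. Real data are noisier, and a smooth fit of depth against GC (for example LOESS) is a common alternative to hard strata.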
