Question: Copy number becomes inaccurate when downsampling different numbers of reads from a whole FASTQ file
9 days ago, lffu_0032 wrote:

Hi, everyone:

I have recently been doing CNV analysis and found something strange:

1. When using the whole data set (after duplicate removal, for the tumor and its control), the copy number values are accurate;

2. But when I downsample different numbers of reads from the whole FASTQ for a given sample (I downsample only the tumor's FASTQ, not its control's), the copy number differs from the whole-data result, and the fewer reads I keep, the larger the difference between the whole-data and downsampled CNV results.

Has anyone encountered this problem? I am quite confused by the downsampled CNV analysis. Thanks for your help.

modified 7 days ago • written 9 days ago by lffu_0032

Why are you downsampling? Decreasing read depth is going to increase noise and make things less accurate. What did you expect to see?

written 9 days ago by jared.andrews07

We are doing plasma cfDNA sequencing at very high depth (about 10,000X to 30,000X), and we want to determine which sequencing depth is sufficient to call CNVs accurately. We sequenced some cfDNA samples at ~30,000X and then downsampled them to 10,000X, 15,000X, and 20,000X to see which depth is adequate for CNV calling.

written 9 days ago by lffu_0032

Which copy number program are you using? There are probably > 100 programs, the majority of which do not address all types of scenarios / biases. Edit: I see that German had already asked this. Can you please answer?

modified 8 days ago • written 8 days ago by Kevin Blighe

I do not use any third-party tools. I use a very simple method: mean depth normalization to adjust for the library size effect, calculating the normalized value (depth/mean_depth) for the tumor and normal samples separately. I do not downsample the normal sample, so its normalized value is fixed. The problem is that the tumor's normalized value changes dramatically when I downsample to different numbers of reads, especially when the number of downsampled reads is very small relative to the full FASTQ data.

modified 8 days ago • written 8 days ago by lffu_0032

That explains your finding, in that case.

written 8 days ago by Kevin Blighe

Hmm, I don't quite understand. Could you explain in more detail? Thanks a lot.

written 8 days ago by lffu_0032

Sorry, just to help understand better, can you perhaps show some example calculations?

written 8 days ago by Kevin Blighe

Which tool do you use?

written 9 days ago by German.M.Demidov

I just use the mean depth normalization method to adjust for sequencing library size: normalized_depth = target_depth/mean(target_depth). I find that for some genes the normalized_depth becomes smaller as fewer reads are sampled, while for other genes it becomes bigger.

written 8 days ago by lffu_0032
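For readers following along, here is a minimal sketch of the normalization described above, normalized_depth = target_depth/mean(target_depth), applied to a tumor and a normal sample separately. The gene names and depth values are hypothetical, and the final log2 tumor/normal ratio is an assumption about how the normalized values would be compared; the thread does not state that step.

```python
import math
import statistics


def normalize(depths):
    """Return normalized_depth = target_depth / mean(target_depth) per target."""
    mean_depth = statistics.mean(depths.values())
    return {target: depth / mean_depth for target, depth in depths.items()}


# Hypothetical per-target mean depths (illustrative values, not real data).
tumor = {"GENE_A": 12000, "GENE_B": 9000, "GENE_C": 18000}
normal = {"GENE_A": 300, "GENE_B": 310, "GENE_C": 290}

tumor_norm = normalize(tumor)
normal_norm = normalize(normal)

# Assumed comparison step: log2 ratio of tumor to normal normalized depth.
log2_ratio = {t: math.log2(tumor_norm[t] / normal_norm[t]) for t in tumor}

for target in tumor:
    print(target, round(tumor_norm[target], 3), round(log2_ratio[target], 3))
```

Because the normalized values of a sample average to 1 by construction, any target whose normalized depth rises after downsampling has to be balanced by others that fall, which is consistent with some genes getting smaller and others bigger.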

I am not sure I understand, but maybe the problem you describe is called "random sampling".

written 8 days ago by German.M.Demidov

Yes, I randomly sample different numbers of reads from the full FASTQ data to see which depth is sufficient to call CNVs.

modified 8 days ago • written 8 days ago by lffu_0032

Which program are you using for the random sampling?

written 8 days ago by Kevin Blighe

I use the seqtk sample command.

written 7 days ago by lffu_0032
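To make the random sampling step concrete, the sketch below keeps each FASTQ record with a fixed probability, using the same seed for both mate files so that pairs stay in sync. It is not seqtk's implementation, only an illustration of fraction-based sampling; the read names, sequences, and the 30,000X to 10,000X target are hypothetical inputs.

```python
import random


def read_fastq(lines):
    """Yield 4-line FASTQ records (name, sequence, '+', quality) as tuples."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def subsample(records, fraction, seed):
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]


# Fraction needed to go from an assumed full depth to a target depth.
full_depth, target_depth = 30000, 10000
fraction = target_depth / full_depth

# Hypothetical paired-end input: 1000 read pairs with matching names.
r1_lines, r2_lines = [], []
for i in range(1000):
    r1_lines += [f"@read{i}/1", "ACGT", "+", "IIII"]
    r2_lines += [f"@read{i}/2", "TGCA", "+", "IIII"]

# The same seed gives the same keep/drop decision for both mates.
sub1 = subsample(read_fastq(r1_lines), fraction, seed=100)
sub2 = subsample(read_fastq(r2_lines), fraction, seed=100)
print(len(sub1), len(sub2))
```

Repeating the sampling with several different seeds at each depth gives a direct view of how much of the change in copy number comes from the sampling itself.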

I want to know whether anyone has done a downsampling CNV analysis and encountered this problem.

written 7 days ago by lffu_0032

I did this and found the same, and I was not surprised or confused: that is how statistics works, and most CNV-detection tools for NGS are based on statistical methods.

written 7 days ago by German.M.Demidov
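The statistical point can be illustrated with a small simulation. It is a sketch under stated assumptions: every read is kept independently with the same probability (binomial thinning), all targets start with the same hypothetical read count, and the counts are deliberately small so the effect is visible. It then measures how much one target's normalized depth varies across repeated downsampling runs.

```python
import random
import statistics


def downsample_counts(full_counts, fraction, rng):
    """Keep each read independently with probability `fraction`."""
    return [sum(1 for _ in range(n) if rng.random() < fraction)
            for n in full_counts]


def normalized_depth(counts):
    """normalized_depth = target_depth / mean(target_depth)."""
    mean_depth = statistics.mean(counts)
    return [c / mean_depth for c in counts]


def spread_of_first_target(full_counts, fraction, replicates=100, seed=0):
    """Standard deviation of the first target's normalized depth over replicates."""
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        sub = downsample_counts(full_counts, fraction, rng)
        values.append(normalized_depth(sub)[0])
    return statistics.stdev(values)


# Hypothetical full-depth read counts for 20 targets (illustrative only).
full_counts = [200] * 20

spreads = {f: spread_of_first_target(full_counts, f) for f in (0.9, 0.5, 0.1)}
for f, s in spreads.items():
    print(f"fraction={f:.1f}  sd of normalized depth={s:.3f}")
```

Under these assumptions the spread grows as the retained fraction shrinks, so the normalized depth of an individual target drifts further from its full-data value, in either direction, the fewer reads are kept.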