Control-FREEC breakPointThreshold value
7.9 years ago
2nelly ▴ 310

Hi all,

Does anybody have any recommendation regarding breakPointThreshold argument in control-FREEC?

I know that values should be between 0.1 and 0.8, and that the closer to 0.1, the more CNVs will be detected. But how can I decide which value is appropriate for my analysis? I tried 0.8 and 0.6 and noticed a big difference in my results.

I am running a WES analysis with a tumor/normal comparison.

Thank you in advance.

sequencing next-gen • 4.5k views
ADD COMMENT

Dear Chris,

So, as I told you, I have some samples that were sequenced at higher coverage than others because they were sequenced twice. I split one of them into two files, so instead of one file with 400M reads I also have two files with 200M reads each. I ran the same analysis with the same arguments on all three files (400M, 200M, 200M), and in the end the results for the 400M file are completely different: I get far too many CNVs, and all of them are gains. That suggests the algorithm did not normalize, and the result is biased by the coverage. How, then, did you get an unbiased analysis from samples with different coverage? Maybe I missed an argument. Thank you in advance. Here is my new configuration file:

[general]

BedGraphOutput = TRUE
bedtools= ~/bin/bedtools2/bin/bedtools
breakPointThreshold = 0.4
breakPointType = 4
chrLenFile = ~/Desktop/control-freec/chr.len
degree = 1
intercept = 0
minCNAlength = 4
maxThreads = 2
noisyData = TRUE
outputDir = ~/Desktop/control-freec/output_sample
ploidy = 2
printNA = FALSE
readCountThreshold = 50
samtools = ~/bin/samtools
sex = XY
step = 250
window = 500

[sample]

mateFile = ~/Desktop/processed_reads/tumor_sample.bam
inputFormat = BAM
mateOrientation = FR

[control]

mateFile = ~/Desktop/processed_reads/normal_sample.bam
inputFormat = BAM
mateOrientation = FR

[target]

captureRegions = ~/Desktop/control-freec/mm10_newtargets.bed

Actually, I realized that if you use the window and step arguments in an exome analysis, the capture regions file is effectively useless; for instance, in my case the read depth is calculated every 500 bases.

ADD REPLY

Yes, you are correct; this tool is probably not well suited to your sample, so I would suggest you also check VarScan2, ADTEx, ExomeCNV and similar tools. Yes, the read-count normalization is done on the 500-base window bins; in that case you can remove the step and window size. I have never worked with such high-depth data. Is it 40X or 400X? If it is 400X, you should probably make a coverage plot to understand the median coverage of your sample, either per base or per bait, and select readCountThreshold based on that. But definitely do not stick to one tool. I was not able to replicate my results with the CBS method, so I went with Control-FREEC and ADTEx, but since you have very high depth, on the order of 400X, I would actually suggest using other tools that perform well at high depth; VarScan2 should be better here, I believe.
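
A rough sketch of that coverage check, using the BAM and capture BED from your config (the output file name and the quick awk median are illustrative only):

# per-base depth restricted to the capture regions
samtools depth -b ~/Desktop/control-freec/mm10_newtargets.bed ~/Desktop/processed_reads/tumor_sample.bam > tumor_depth.txt
# approximate median depth over the covered bases in the baits
sort -n -k3,3 tumor_depth.txt | awk '{d[NR]=$3} END{print "median depth:", d[int(NR/2)]}'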

ADD REPLY

Actually, my coverage is more like 100x and 200x. I tried ADTEx as well, but something went really wrong. When I try to obtain the depth of coverage in the corresponding ADTEx step, the temporary coverage files for normal and tumor are endless; ADTEx keeps writing to them seemingly forever, each grew to more than 500GB, and of course my server crashed because it ran out of free space. Any idea what is happening? The same thing happened when I tried to get coverage with bedtools.

ADD REPLY

A late comment, but I have been using Control-FREEC recently and encountered a situation where my normal had 3x the coverage of the tumour. I was worried that the program would not adjust for library size and would just call deletions everywhere; however, given how it was developed (i.e. by modelling coverage), it appeared to tolerate the difference in overall library size, and the calls appeared as expected.

My data was whole genome and the configuration was:

[general]
BedGraphOutput=TRUE
bedtools=bedtools2/bin/bedtools
breakPointThreshold=0.8
breakPointType=2
chrFiles=refs/hg19/chromosomes/
chrLenFile=refs/hg19/hg19.len
coefficientOfVariation=0.06
contamination=0
contaminationAdjustment=FALSE
degree=3&4
forceGCcontentNormalization=0
gemMappabilityFile=refs/hg19/out100m1_hg19.gem
intercept=0
minCNAlength=1
minMappabilityPerWindow=0.85
minExpectedGC=0.35
maxExpectedGC=0.55
minimalSubclonePresence=100
maxThreads=6
numberOfProcesses=6
noisyData=FALSE
outputDir=results/test
ploidy=2,3,4
printNA=FALSE
samtools=samtools-1.8/samtools
sex=XX
step=10000
telocentromeric=50000
uniqueMatch=TRUE
window=50000

[sample]
inputFormat=BAM
mateFile=C1.bam
mateOrientation=0

[control]
inputFormat=BAM
mateFile=WBC.bam
mateOrientation=0
ADD REPLY
7.9 years ago
ivivek_ngs ★ 5.2k

It is basically used for segmentation, so when you lower the breakPointThreshold parameter you will get more segments (which generally means more CNVs). You can play with the different values and plot the distribution, or make a histogram of the CNVs under both settings. More CNVs does not necessarily translate into better biological reasoning. You have to look at what these CNV regions convey: whether a tumour suppressor or an oncogene region is affected, and how that affects the phenotype. By how much do the numbers change? If the difference is large you can go with the lower threshold, but everything rests on annotating the regions. Your CNV plots are usually (normalized) segmentation profiles; how do the two plots compare? If they do not change drastically, the choice will not really matter that much. What will matter is the annotated regions that come from those CNVs, and to what extent they support the conclusion that CNVs are driving the phenotype.
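
As a quick illustration of how you might compare the two settings (the config and output directory names below are hypothetical; FREEC names its CNV output after the sample mateFile):

# run FREEC twice with configs that differ only in breakPointThreshold
freec -conf config_bpt0.8.txt
freec -conf config_bpt0.6.txt
# count the called segments in each run
wc -l output_bpt0.8/tumor_sample.bam_CNVs output_bpt0.6/tumor_sample.bam_CNVs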

ADD COMMENT

Thanks for your answer Chris.

I think you have mixed things up in my mind, which could be bad and good at the same time.

As far as I understood, you suggest that the extra CNVs I get by lowering the breakPointThreshold might not be that significant in terms of impact. But I am wondering whether they are real or not. For instance, if you are interested in getting a general profile of CNVs in your tumor samples and do not care about their impact, would it be wiser to use the default threshold? Also, by lowering the threshold too much, don't you think you increase the chance of false positive calls?

Imagine that by lowering the threshold from 0.8 to 0.6 I get twice as many CNVs.

I realize that things are complicated; it's not black and white.

Thanks again

ADD REPLY

One thing I would like to mention: did you run the significance script on the regions reported by Control-FREEC? There are some ad hoc scripts shipped with the tool; try running them on both results and see how many significant regions remain afterwards.

Look at the plots for both results; if they are not that different, go with the one that gives more CNV regions. Annotate both outputs and see what kind of biological inference the regions receive, and whether there are contributors or even drivers of your tumor phenotype among them. All of this gives you much better grounds for choosing the threshold that fits best. In any case, more CNVs here usually means more resolution and deeper information, and it is entirely up to you how far you want to use that information. Another way to check is to intersect the two CNV BED files produced as output, which will give you an idea of the overlap between the two resolutions (see the sketch below). Either threshold is fine as long as it answers your biological question cleanly.
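
For reference, this is roughly how the helper scripts shipped with Control-FREEC and the intersection are run (a sketch; the run_0.8/ and run_0.6/ directories and file names are assumed from the earlier example):

# add p-values to the CNV calls
cat assess_significance.R | R --slave --args run_0.8/tumor_sample.bam_CNVs run_0.8/tumor_sample.bam_ratio.txt
# plot the normalized copy-number profile (ploidy first, then the ratio file)
cat makeGraph.R | R --slave --args 2 run_0.8/tumor_sample.bam_ratio.txt
# overlap between the call sets from the two thresholds
bedtools intersect -a run_0.8/tumor_sample.bam_CNVs -b run_0.6/tumor_sample.bam_CNVs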

ADD REPLY

Thanks again! I'll try the things you suggested. I would also like to ask your opinion on a depth-of-coverage issue. I have some other samples of the same origin that were sequenced at deeper coverage, i.e. a higher number of reads covering my target baits. How can I normalize these in order to use them in my analysis? I don't think GC normalization will work in that case.

ADD REPLY

For both exome and whole-genome data, the deeper the coverage, the better you can fish out regions carrying variants, deletions or duplications. If you intend to do the same CNV analysis, you can use Control-FREEC; deeper coverage will not be a problem. Do you have matched normals (or any normal samples), or are you running the tool on the tumor alignments/pileups alone? If you have both normal and tumor, you can do without GC-content normalization: normalization is done either with GC content or with read counts, where the tumor is normalized against the normal. GC-content normalization is used for inferring BAF scores and allelic status, or when you do not have a normal sample; otherwise the normal sample's read counts are used to normalize the tumor reads.
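
A minimal sketch of how this choice appears in the config (values are illustrative; with forceGCcontentNormalization = 0 and a [control] section, the sample read counts are normalized by the control, while 1 forces GC-content normalization):

[general]
forceGCcontentNormalization = 0

[sample]
mateFile = ~/Desktop/processed_reads/tumor_sample.bam
inputFormat = BAM

[control]
mateFile = ~/Desktop/processed_reads/normal_sample.bam
inputFormat = BAM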

ADD REPLY

Sorry Chris, maybe I missed something, but reading the FREEC documentation I couldn't find a read-count normalization option, only GC. In fact I do have a matched normal sample, and when I ran an analysis on one of my samples with 2x deeper coverage, without BAF, I got only gains, which is what you would expect if the higher coverage is not corrected for. So the analysis is not correct; it is biased by the depth of coverage. Maybe I missed an option here. If you have some time, please check my configuration file below; I would really appreciate it.

[general]

BedGraphOutput = TRUE
bedtools= ~/bin/bedtools2/bin/bedtools
breakPointThreshold = 0.6
breakPointType = 4
chrLenFile = ~/Desktop/control-freec/chr.len
degree = 1
intercept = 0
minCNAlength = 3
maxThreads = 1
noisyData = TRUE
outputDir = ~/Desktop/control-freec/output_sample
ploidy = 2
printNA = FALSE
readCountThreshold = 50
samtools = ~/bin/samtools
sex = XY
window = 0

[sample]

mateFile = ~/Desktop/processed_reads/tumor_sample.bam
inputFormat = BAM
mateOrientation = FR

[control]

mateFile = ~/Desktop/processed_reads/normal_sample.bam
inputFormat = BAM
mateOrientation = FR

[target]

captureRegions = ~/Desktop/control-freec/mm10_newtargets.bed
ADD REPLY

Take a look at this thread; the normalization method is described clearly there. I do not see a step size in your config file. Try running with step=250 and window=500 and look at the output. I have run with varied depth, where my normal and tumor never had similar depth; that is not the issue. Once you have the output, you can run the significance script, generate the plots, and look at the profiles and the significant regions.

ADD REPLY

Dear Chris,

According to the FREEC manual, for whole-exome sequencing you should set window=0, and step is used only when a window is specified, so it should not be used for exome sequencing (instead set window=0).
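
In other words, the manual's exome recommendation amounts to something like this (a sketch using the capture file from my config; step is simply left out):

[general]
window = 0

[target]
captureRegions = ~/Desktop/control-freec/mm10_newtargets.bed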

So, are you sure about those two arguments?

ADD REPLY

Yes, it actually depends on your data; that is the recommended setting, but it is not wrong if you deviate from it. What I would suggest first is to compare the breakPointThreshold values as you were planning: run the significance script, generate the plots and profiles, compare them, and see how many CNV regions you get. If you supply a normal file, FREEC uses read-count normalization rather than GC normalization, unless you force GC-content normalization.

ADD REPLY

Ok, thanks a lot Chris!

One last question, I promise: from your experience, is it better to force GC-content normalization in WES or not?

ADD REPLY

Not necessarily. However, I cannot answer this with full confidence, because I have not dug deeply into that parameter; I was more concerned with the others. You can run the tool with the different settings, compare the outputs, and see how the results vary. For CNV calling it is not required when you have a normal/tumor pair; for LOH you need BAF and allelic status, and there you might include it, but it is not mandatory.

ADD REPLY
