Question: sciClone:input data wrong
0
gravatar for lyan.liu
2.8 years ago by
lyan.liu0
lyan.liu0 wrote:

Hi, I want to use sciclone on my exome sequencing data.I use varscan to call copy number:

samtools mpileup -B -q 30 -l $exon_bed -f $fa $normal $tumor >$name.tumor-nor.pileup
java -jar VarScan.v2.3.9.jar copynumber $name.tumor-nor.pileup $name --mpileup 1 java -jar VarScan.v2.3.9.jar copyCaller $name.copynumber --output-file filter.$name.copynumber

In order to obtain the segment_mean data,I use DNAcopy (http://varscan.sourceforge.net/copy-number-calling.html#copy-number-segmentation)

library(DNAcopy) cn <- read.table("filter.$name.copynumber",header=F) CNA.object <-CNA(genomdat = as.numeric(cn[,7]), chrom = as.numeric(cn[,1]), maploc = as.numeric(cn[,2]), data.type = 'logratio')
CNA.smoothed <- smooth.CNA(CNA.object) segs <- segment(CNA.smoothed, verbose=1, min.width=2) segs2 = segs$output write.table(segs2[,2:6], file="out.file", row.names=F, col.names=F, quote=F, sep="\t")

Above all without errors.

I got a out.file like these:

1 89 34735 1614 874.4554
1 34736 34750 4 3295.75
1 34751 37517 661 898.7126
1 37518 37536 8 2306.375
1 37537 38024 207 923.657
1 38027 38038 4 3223.75
1 38039 55963 2290 898.076

The segment_mean value is so large but I saw the value in example data is probably less than 3. I used the results as sciclone's input data,but the plot is not complete. Report these:

source("run.R") [1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
604 sites (of 13103 original sites) are copy number neutral and have adequate depth in all samples 120 sites (of 13103 original sites) were removed because of copy-number alterations
12497 sites (of 13103 original sites) were removed because of inadequate depth
12499 sites (of 13103 original sites) were removed because of copy-number alterations or inadequate depth
[1] "clustering..."
kmeans initialization:
V1
0.4092
0.356976470588235
0.256398076923077
0.279226666666667
0.234887804878049
0.311459375
0.480088888888889
0.204376436781609
0.218423943661972
0.7059
Using threshold: 0.7
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 2 with too few variants ( 0 ) center: 0.499965
Dropped cluster 3 with too few variants ( 0 ) center: 0.499965
Dropped cluster 3 with too few variants ( 0 ) center: 0.499965
Dropped cluster 4 with too few variants ( 1 ) center: 0.7047737
Condition ( 1 D): Removing 1 because of overlap ( 0.1035335 ) with i = 3
Condition ( 1 D): Removing 1 because of overlap ( 0.18061 ) with i = 2
Cluster 1 pi = 0.998 center = 0.241 SEM = (0.240, 0.243) sd = (0.204, 0.270)
Outliers:
[,1]
[1,] 0.3
Converged on the following parameters:
mu:
81510.7681795071
alpha:
867.472949068126
nu:
49909.3536335395
beta:
168.832704565698
pi:
0.998344374600373
[1] "finished clustering full-dimensional data..."
[1] "found 1 clusters using bmm in full dimensional data"

The copy number of these variants all was assumed to be 2,even if I divide all data in fifth column by 1000,The results did not change.

ADD COMMENTlink modified 2.8 years ago by Chris Miller21k • written 2.8 years ago by lyan.liu0

Hi, lyan:

Did you fix your problem? I had the similar problem with you. After running the DNA copy, my segment_mean value is very big. I have around 4000 somatic mutations. Thanks.

HY

ADD REPLYlink written 2.7 years ago by lhaiyan330

I was following this :http://wp.zxzyl.com/?p=156
hope this helps. Lyan

ADD REPLYlink written 2.7 years ago by lyan.liu0
0
gravatar for Chris Miller
2.8 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

1) First of all, yes, something is clearly not right in your copy number results. Make sure that you're grabbing the correct input columns when you're feeding it to DNAcopy

2) SciClone tells you exactly what it's doing:

"Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."

For some reason, your entire genome isn't tiled by the DNAcopy segments (which is unusual) Again, fix your cn calls and you'll be in better shape

3) Finally, 13000 somatic variant calls is a lot for an exome. Is this a hypermutated sample or something like a heavily UV damaged melanoma? If not, then you may want to examine your variant inputs.

ADD COMMENTlink written 2.8 years ago by Chris Miller21k

Thanks very much for your reply!
1) I used varscan copynumber(the commands in the top page) to generate my copynumber result ,like these :
chrom chr_start chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content
1 65873 65972 100 66.3 91.5 0.464 45.0
1 69482 69513 32 150.8 15.8 -3.259 53.1
1 69514 69600 87 229.4 41.7 -2.461 50.6
the log2_ratio is the column 7,so I use DNAcopy like this : library(DNAcopy)
cn <- read.table("$name.copynumber",header=F)
CNA.object <-CNA(genomdat = as.numeric(cn[,7]), chrom = as.numeric(cn[,1]), maploc = as.numeric(cn[,2]), data.type = 'logratio')
CNA.smoothed <- smooth.CNA(CNA.object)
segs <- segment(CNA.smoothed, verbose=1, min.width=2)
segs2 = segs$output
write.table(segs2[,2:6], file="out.file_test", row.names=F, col.names=F, quote=F, sep="\t")

I was so confused .

3) I test another sample which had only 4000 somatic variants,the report also like these:

source("run.R")
[1] "checking input data..." [1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
66 sites (of 4127 original sites) are copy number neutral and have adequate depth in all samples
65 sites (of 4127 original sites) were removed because of copy-number alterations
4061 sites (of 4127 original sites) were removed because of inadequate depth
4061 sites (of 4127 original sites) were removed because of copy-number alterations or inadequate depth
[1] "clustering..."
......

the script run.R like: library(sciClone)
v0 = read.table("Sample_1.vafs", sep="\t")
cn0 = read.table("out.file_test")
cn0 = cn0[,c(1,2,3,5)]
clusterParams="empty"
sc = sciClone(vafs=list(v0), sampleNames=c("MMY4"), copyNumberCalls=list(cn0), minimumDepth=100, doClustering=TRUE, clusterParams=clusterParams,maximumClusters=10, copyNumberMargins=0.25, useSexChrs=FALSE)
writeClusterTable(sc, "clusters")
writeClusterSummaryTable(sc, "cluster.summary")
save.image("out.Rdata")
source("plot.R")

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by lyan.liu0

Other than CN strangeness, the biggest problem appears to be that the minimumDepth parameter is set too high for the data you have. If you have 30x sequencing, then using a minimum depth of 100x does not make any sense. You should also be aware of the tradeoffs between depth and ability to distinguish clusters - see Fig 5 here: http://www.cell.com/cell-systems/abstract/S2405-4712(15)00113-1

ADD REPLYlink written 2.8 years ago by Chris Miller21k

I tried to use minimumDepth=10,only to be filtered less sites, the other did not changed.The problem should be CN data abnormal.Are there any other ways to call copy number or generate segment_mean data ?

ADD REPLYlink written 2.8 years ago by lyan.liu0

Sure, there are many programs that are able to call copy number from paired tumor normal bams. cn.mops is one that comes to mind - there are many others that a quick search on this or other sites will turn up.

ADD REPLYlink written 2.8 years ago by Chris Miller21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1208 users visited in the last hour