Question: SciClone not numeric error
ioantika wrote, 2.7 years ago:

I tried to run sciClone on test data before applying the script to my actual patient database, but the program keeps returning the following error message:

[1] "checking input data..."
[1] "No copy number files specified. Assuming all variants have a CN of 2."
[1] "ERROR: column vaf in sample 5 is not numeric"
Error in cleanAndAddCN(vafs[[i]], copyNumberCalls[[i]], i,cnCallsAreLog2, :

I've only tried entering SNP data, as my database contains mainly SNVs from targeted sequencing of specific gene panels. The test files can be downloaded from the link below, and this is the script I tried to run:

library(sciClone)
v1 = read.table("folder/nrm.dat");
v2 = read.table("folder/tum1.dat");
v3 = read.table("folder/tum2.dat");
names = c("Normal","Tumor1","Tumor2")
sc = sciClone(vafs=list(v1,v2,v3), sampleNames=names[1:3])

Files: nrm.dat, tum1.dat and tum2.dat

https://www.sendspace.com/filegroup/ram8xDRCKi9mxE7vrYQeE2rjkmLSGffY

cbst (Oslo) wrote, 2.7 years ago:

You can try the following:

  1. Make sure your variant allele frequency variable is a value between 0 and 100.
  2. Convert the vaf variable into a numeric variable, and/or convert your file into a data frame.

For example, for sample 1:

v1 <- data.frame(v1)

v1$vaf <- as.numeric(v1$vaf)

(but make sure that vaf is not a factor, otherwise as.numeric will return the internal integer codes of the factor levels, not the actual values)
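
The factor caveat can be illustrated with a minimal sketch in base R. The values and variable names below are made up for illustration and are not taken from the poster's files.

```r
# Illustrative VAF column that was read in as a factor instead of numbers
vaf <- factor(c("13.0841", "0.0000", "15.2672"))

# as.numeric() on a factor returns the integer codes of its levels
codes <- as.numeric(vaf)

# Converting through character first recovers the actual values
values <- as.numeric(as.character(vaf))

print(codes)
print(values)
```

Checking `class(v1$vaf)` before converting shows which of the two cases applies to a given data frame.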


I tried the as.numeric solution, and all values in V3, V4, and V5 were converted into unrelated, seemingly random numbers. My VAFs are expressed as percentages (%) with four digits after a decimal comma, e.g. 18,4587%.

(reply written 2.7 years ago by ioantika)

Chris Miller (Washington University in St. Louis, MO) wrote, 2.7 years ago:

R does not treat a comma as a decimal point by default.

Your file inputs look like this:

1 4479383 186 28  13,0841
1 6575255  48  0  0,0000
1 7083445 111 20  15,2672
1 8476489 106  8  7,0175

When they need to look like this:

1 4479383 186 28  13.0841
1 6575255  48  0  0.0000
1 7083445 111 20  15.2672
1 8476489 106  8  7.0175

You can also tell R to treat commas as decimal points when reading the file, with something like this:

a = read.table("tum2.dat",dec=",",sep="\t")
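
As a small self-contained check of the `dec` argument, the sketch below reads two of the rows quoted above through `text=` instead of a file. The column names are an assumption added for illustration (the posted files have no header), and the rows are assumed to be whitespace-separated rather than tab-separated.

```r
# Two rows in the comma-decimal layout shown above
raw <- "1 4479383 186 28 13,0841
1 6575255 48 0 0,0000"

# Illustrative column names; the original files have no header line
cols <- c("chr", "pos", "ref_reads", "var_reads", "vaf")

# Default settings: "13,0841" cannot be parsed as a number
default_read <- read.table(text = raw, col.names = cols)

# dec = "," tells read.table to treat the comma as the decimal mark
comma_read <- read.table(text = raw, col.names = cols, dec = ",")

vaf_is_numeric <- c(default = is.numeric(default_read$vaf),
                    comma   = is.numeric(comma_read$vaf))
print(vaf_is_numeric)
```

If the values also carry a trailing percent sign, that would need to be stripped separately before conversion, since `dec` only changes the decimal mark.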

Although the test files now work fine, my actual patient data give me "Error in kmeans(X, N.c, nstart = 1000) : more cluster centers than distinct data points." I've tried header=FALSE and still get the same error.

(reply written 2.7 years ago by ioantika)

If you set maxClusters to 10, and you only have, say, 7 points that make it through filtering, the algorithm will choke. Setting it to a lower number may be a short-term fix, but you're unlikely to get reasonable clustering results with only a handful of points anyway. Check your CN calls to make sure you haven't over-called, and/or reduce your minimumDepth (which may give you more points, at the expense of increasing the uncertainty of their true position).

(reply written 2.7 years ago by Chris Miller)
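
The failure mode described in the reply above can be reproduced with base R's `kmeans` alone. This is only a sketch: the seven VAF values are invented, and `max_clusters` is a hypothetical stand-in for the maxClusters setting, not a call into sciClone itself.

```r
set.seed(1)

# Invented VAFs for the 7 variants assumed to survive filtering
vafs <- c(12.5, 14.1, 30.2, 31.0, 45.9, 47.3, 48.8)
max_clusters <- 10  # hypothetical stand-in for maxClusters

# Compare the number of distinct points with the requested cluster count
n_distinct <- length(unique(vafs))
too_few_points <- n_distinct < max_clusters

# With nstart > 1, kmeans stops when asked for more centers than
# there are distinct points; capture the message instead of aborting
fit <- tryCatch(
  kmeans(vafs, centers = max_clusters, nstart = 5),
  error = function(e) conditionMessage(e)
)

print(n_distinct)
print(fit)
```

Counting the distinct points that pass filtering before clustering makes it clear whether the problem is the cluster limit or simply too little data.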

I don't have copy numbers, just SNPs. I tried lowering minimumDepth, but I get the same result. My files contain 7-300 rows of SNP data for all sites per patient (primary, metastatic, normal), and I really want to get it working before the ESMO submission deadline.

(reply written 2.7 years ago by ioantika)

1) Normal files should not be used for clustering.
2) What stdout is sciClone producing? It should say something about the number of copy-number neutral sites with adequate depth.

(reply written 2.7 years ago by Chris Miller)

Hi Miller

I am trying to add CNV calls (from VarScan) as input. The format is as follows:

1 861322 2453157 -0.0003

The 4th column is segment_mean. Is this format right as input? The reason I ask is that the output figure looks a little odd: it does not show any peak around 50% VAF, although there are actually many such variants in the VCF file.

(reply written 7 months ago by CY270)

You really should ask this as a top-level question, not buried in the comments on someone else's question. Are your CN values log2? If so, then you need to set the appropriate "cnCallsAreLog2" parameter.

(reply written 7 months ago by Chris Miller)
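
To see why the log2 question matters, the sketch below converts the segment_mean quoted above into an absolute copy number. The helper `log2_to_cn` is hypothetical, and the conversion assumes the log2 ratio is relative to a diploid (2-copy) baseline; how sciClone handles the column internally is not shown in this thread.

```r
# Hypothetical helper: log2 ratio -> absolute copy number,
# assuming a diploid (2-copy) baseline
log2_to_cn <- function(log2_ratio, baseline = 2) {
  baseline * 2^log2_ratio
}

segment_mean <- -0.0003  # value from the CNV row quoted above

# Interpreted as a log2 ratio, this is very close to 2 copies
cn_if_log2 <- log2_to_cn(segment_mean)

# Interpreted as an absolute copy number, it is close to 0 copies
cn_if_absolute <- segment_mean

print(cn_if_log2)
print(cn_if_absolute)
```

The same number therefore describes either a copy-neutral segment or an essentially absent one, depending on which interpretation is applied.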