ABSOLUTE to estimate tumour purity from WES data
5.8 years ago

Dear all,

I have WES BAM files from several tumour samples and one normal sample of a single patient, and I would like to estimate the purity of the tumour samples with ABSOLUTE. The documentation says that "you can supply a tab delimited segmentation file (e.g. from array CGH or massively parallel sequencing experiments) - this file must contain the columns "Chromosome", "Start", "End", "Num_Probes" and "Segment_Mean"."

I have read similar questions here and here, but I still haven't understood how to create the segmentation file.

I have already analysed copy number alterations (CNAs) using CopywriteR. It gives me a read_counts file with "Chromosomes", "Start", "End" and the read counts for the samples, but I don't see where to get the "Num_Probes" and "Segment_Mean" information.

I would highly appreciate your help.

5.8 years ago
Eric T. ★ 2.7k

If you've run all 3 phases of the CopywriteR pipeline (the functions "preCopywriteR", "CopywriteR", and "plotCNA") according to the tutorial/vignette/manual, then in the output folder, under the "CNAprofiles" subdirectory where you find "read_counts.txt" and all the other output files, there will be a file called "segment.Rdata" -- this is generated by plotCNA.

If you load "segment.Rdata" in R, you get a list with an "$output" field that contains the dataframe you need, essentially in SEG format. Here's my script for loading that file and converting it to SEG:

#!/usr/bin/env Rscript
# Usage: Rscript copywriter2seg.R <in_fname.Rdata> <out_fname.seg>

(args = commandArgs(TRUE))

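in_fname = args[1]  # e.g. segment.Rdata
out_fname = args[2]  # e.g. sample.seg
cat("Loading", in_fname, "\n")
load(in_fname)  # restores the object "segment.CNA.object" saved by plotCNA
seg = segment.CNA.object$output  # data frame of segments, essentially SEG format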
seg$loc.start = floor(seg$loc.start)  # force coordinates to be integers
seg$loc.end = floor(seg$loc.end)
write.table(seg, out_fname, quote=FALSE, sep="\t", row.names=FALSE, col.names=TRUE)
cat("Wrote", out_fname, "\n")
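
ABSOLUTE asks for the column headers "Chromosome", "Start", "End", "Num_Probes" and "Segment_Mean", so the table may still need renaming before it is accepted. Below is a minimal sketch of that step. It assumes the "$output" data frame uses the usual DNAcopy column names (ID, chrom, loc.start, loc.end, num.mark, seg.mean); the helper name to_absolute_seg and the toy values are hypothetical.

```r
# Hypothetical helper: rename DNAcopy-style segment columns to the
# headers ABSOLUTE asks for. Assumes the input has the usual DNAcopy
# columns: ID, chrom, loc.start, loc.end, num.mark, seg.mean.
to_absolute_seg <- function(seg) {
  data.frame(
    Chromosome   = seg$chrom,
    Start        = floor(seg$loc.start),  # force integer coordinates
    End          = floor(seg$loc.end),
    Num_Probes   = seg$num.mark,
    Segment_Mean = seg$seg.mean,
    stringsAsFactors = FALSE
  )
}

# Toy input standing in for segment.CNA.object$output (made-up values).
toy <- data.frame(
  ID        = "sample1",
  chrom     = c("1", "1", "2"),
  loc.start = c(10000.5, 5000000, 20000),
  loc.end   = c(4999999, 9000000.5, 8000000),
  num.mark  = c(120, 85, 200),
  seg.mean  = c(0.02, -0.45, 0.31),
  stringsAsFactors = FALSE
)

abs_seg <- to_absolute_seg(toy)
print(abs_seg)
```

The result can be written with the same write.table call as in the script above (sep="\t", quote=FALSE, row.names=FALSE). If the file holds several samples, subsetting on the ID column first gives one table per sample.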

Thank you very much for your answer and the script. I now have ABSOLUTE running, but I still have a number of questions:

  1. My CopywriteR results seem to be affected by the differing tumour purity of the samples, i.e. samples with more normal tissue show fewer CNAs. Does that bias the ABSOLUTE results, since we are using segment.Rdata as input?
  2. I ran the basic RunAbsolute command without point-mutation information. However, the paper and the manual say that somatic point mutations in MAF files can be used if available. I have MAF files (from MuTect), but I noticed that the variant allele frequencies (VAFs) of the mutations are also influenced by tumour purity, i.e. samples with more normal tissue have lower VAFs. So how can I use the MAF files if the VAFs depend on the unknown tumour purity? I was actually planning to use the ABSOLUTE results to re-call mutations with MuTect while explicitly specifying the tumour purity, if that's possible...
  3. I used the parameters:

    sigma.p <- 0
    max.sigma.h <- 0.02
    min.ploidy <- 0.95
    max.ploidy <- 10
    max.as.seg.count <- 1500
    max.neg.genome <- 0
    max.non.clonal <- 0

    for the RunAbsolute command; these are the values used in the example on the manual page. Can they be treated as defaults, or should they be chosen separately for each sample or data type?

  4. My last question relates to the answer you wrote here some time ago. I also tried THetA2. It worked well with the example data, but with my own files I got an error. I posted a question in their THetA users group, but the forum seems to be fairly inactive. Maybe I will post the question in this forum.

Thank you very much!


1) This is because the CNV signal is weaker, and it's expected to happen with any caller. However, ABSOLUTE and its competitors don't need a perfect segmentation to work effectively, as long as most of the larger CNVs are detected.
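
As a rough illustration of why the signal weakens: under a simple two-component mixture model (an assumption made here, ignoring ploidy normalisation and subclonal populations), a segment with tumour copy number q in a sample of purity p has an expected copy ratio of (p*q + 2*(1 - p)) / 2. The function name expected_log2_ratio is hypothetical.

```r
# Sketch under a simple mixture assumption: the observed copy number is
# a purity-weighted average of tumour and normal (diploid) cells.
expected_log2_ratio <- function(purity, tumour_cn, normal_cn = 2) {
  mixed_cn <- purity * tumour_cn + (1 - purity) * normal_cn
  log2(mixed_cn / normal_cn)
}

purities <- c(1.0, 0.6, 0.3)
loss <- expected_log2_ratio(purities, tumour_cn = 1)  # single-copy loss
gain <- expected_log2_ratio(purities, tumour_cn = 3)  # single-copy gain

print(data.frame(purity    = purities,
                 log2_loss = round(loss, 3),
                 log2_gain = round(gain, 3)))
```

The log2 ratios shrink towards zero as purity drops, so small or single-copy events are the first to fall below a caller's detection threshold, while large events usually remain visible.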

2,3) ABSOLUTE is not particularly easy to use, so many users just run it via GenePattern or GenomeSpace instead. Other more recent software including THetA2, PyClone, and BubbleTree appear to perform better now, so you might want to just use one of those instead -- they all take a CNV segmentation and SNP calls as input.

4) I'd give them a few more days to triage your bug report and post a reply. They do have a staff scientific programmer (last I checked) and are fairly dutiful in maintaining their software (for an academic lab). You could also try posting the issue on their GitHub page.

5.8 years ago
poisonAlien ★ 3.1k


You need to segment your data using the DNAcopy Bioconductor package, which takes the log2 ratio (here, your tumour-to-normal depth ratio) along with the locus ("Chromosomes", "Start", "End"). There are other packages that generate segmented data too. This segmented data is what ABSOLUTE uses for purity estimation.

You can also use VarScan copynumber.
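
A minimal sketch of that workflow, under stated assumptions: the per-bin depths below are simulated, library-size normalisation and the pseudocount are illustrative choices rather than anything CopywriteR or ABSOLUTE prescribes, and the DNAcopy step only runs if the package is installed.

```r
# Simulated per-bin depths (made-up data): the second half of the
# chromosome carries a single-copy loss in the tumour.
set.seed(1)
n_bins <- 200
bins <- data.frame(
  Chromosome   = rep("1", n_bins),
  Start        = seq(1, by = 1000, length.out = n_bins),
  End          = seq(1000, by = 1000, length.out = n_bins),
  tumour_depth = rpois(n_bins, lambda = rep(c(100, 50), each = n_bins / 2)),
  normal_depth = rpois(n_bins, lambda = 100)
)

# Normalise each sample by its total depth; the pseudocount avoids log2(0).
pseudo <- 0.5
t_norm <- (bins$tumour_depth + pseudo) / sum(bins$tumour_depth + pseudo)
n_norm <- (bins$normal_depth + pseudo) / sum(bins$normal_depth + pseudo)
bins$log2_ratio <- log2(t_norm / n_norm)

# Segment with DNAcopy if it is available (Bioconductor package).
if (requireNamespace("DNAcopy", quietly = TRUE)) {
  cna <- DNAcopy::CNA(genomdat = bins$log2_ratio,
                      chrom = bins$Chromosome,
                      maploc = bins$Start,
                      data.type = "logratio",
                      sampleid = "tumour1")
  seg <- DNAcopy::segment(DNAcopy::smooth.CNA(cna), verbose = 0)
  # seg$output has columns ID, chrom, loc.start, loc.end, num.mark, seg.mean
  print(seg$output)
}
```

The num.mark and seg.mean columns of the DNAcopy output correspond to the "Num_Probes" and "Segment_Mean" columns that ABSOLUTE asks for.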


Thanks for the reply. I am currently trying VarScan copynumber and DNAcopy.


Along with ABSOLUTE, Sequenza also works fine (plus it's easy to run).

A suggestion from my experience: I have tried many copy number callers built for WXS, but a lot of them suffer from noise, and most of the time the data looks hypersegmented (including from VarScan). Recently I found out that the Broad is working on a copy number walker for their new GATK. I cloned their working repo and it works very well even with noisy data. Installing it is a bit of a pain, but maybe you can give it a try.

