Hello all
I have called somatic CNVs with both CNVkit (short-read) and SAVANA (long-read) for paired tumor samples, so I have segment data from both short-read and long-read sequencing for each sample.
But I'm having trouble finding a good way to merge those segment files so that I keep only the sCNVs present in both the short-read and long-read calls for the same sample.
My idea was to follow this simple procedure: Merge short-read with long-read sCNVs -> Select sCNVs present in both techniques -> Use segment file in GISTIC2 to find enriched sCNVs in the whole cohort
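To make the "select sCNVs present in both techniques" step concrete, here is a minimal sketch that works directly on segments rather than VCFs. The `Segment` type, the ±0.2 log2 thresholds, and averaging the two log2 values are all assumptions for illustration, not anything CNVkit, SAVANA, or GISTIC2 prescribes.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chrom: str
    start: int    # 1-based, inclusive
    end: int      # inclusive
    log2: float   # segment mean (log2 copy ratio)

def call_type(log2, gain=0.2, loss=-0.2):
    """Classify a segment; the thresholds are illustrative assumptions."""
    if log2 >= gain:
        return "gain"
    if log2 <= loss:
        return "loss"
    return "neutral"

def consensus(short_read, long_read):
    """Return the intersected regions where both callers report the same
    non-neutral type. Simple quadratic loop, fine for per-sample segment counts."""
    out = []
    for a in short_read:
        kind = call_type(a.log2)
        if kind == "neutral":
            continue
        for b in long_read:
            if a.chrom != b.chrom or call_type(b.log2) != kind:
                continue
            start, end = max(a.start, b.start), min(a.end, b.end)
            if start <= end:  # the two segments share at least one base
                # Averaging the two log2 values is an assumption.
                out.append(Segment(a.chrom, start, end, (a.log2 + b.log2) / 2))
    return sorted(out)
```

The resulting regions could then be written back out as one SEG row per segment (with the sample ID added) for GISTIC2.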
The main problem is that some programs (for example SAVANA and GISTIC2) don't output VCF files. In fact, most CNV calling programs either don't use VCF at all or use an outdated VCF version (earlier than v4.2), which is poorly suited to CNVs specifically.
Another problem is deciding how close two CNVs of the same type need to be before I merge them (total overlap, 1 bp overlap, within 1000 bp of each other).
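The three options above can be expressed as parameters of a single matching test, which makes it easy to try each one. This is only a sketch: the function names are hypothetical, and reciprocal overlap is swapped in here as one common way of expressing "how much overlap", not something either caller requires.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Shared bases between two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)

def gap_bp(start_a, end_a, start_b, end_b):
    """Bases separating two intervals (0 if they touch or overlap)."""
    return max(0, max(start_a, start_b) - min(end_a, end_b) - 1)

def same_event(a, b, min_reciprocal=0.5, max_gap=None):
    """a and b are (start, end) tuples on the same chromosome and of the same type.

    If max_gap is given, match when the intervals lie within max_gap bases of
    each other. Otherwise require that the shared region covers at least
    min_reciprocal of BOTH intervals (1.0 means identical coordinates).
    """
    if max_gap is not None:
        return gap_bp(*a, *b) <= max_gap
    shared = overlap_bp(*a, *b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return shared / len_a >= min_reciprocal and shared / len_b >= min_reciprocal
```

For example, `same_event(a, b, min_reciprocal=1.0)` corresponds to total overlap, `overlap_bp(...) >= 1` to the 1 bp case, and `same_event(a, b, max_gap=1000)` to the 1000 bp distance case.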
I know the program SURVIVOR can merge VCFs for structural variants, but it does not accept other file formats.
Any ideas would be appreciated.
Example segment files:
IGV description of segment file
From CNVkit manual:
The SEG format is the tabular output of DNAcopy, the reference implementation of Circular Binary Segmentation (CBS). It is a tab-separated table with the following 5 or 6 columns:
ID – sample name
chrom – chromosome name or ID
loc.start – segment’s genomic start position, 1-indexed
loc.end – segment end position
num.mark – (optional) number of probes or bins covered by the segment
seg.mean – segment mean value, usually in log2 scale
The column names in the first line are not enforced, and can vary across implementations.
SEG files can be used with a number of other programs that operate on segmented log2 copy ratios – including GISTIC 2.0, IGV, the GenePattern server, and many R packages.
To convert CNVkit’s .cns files to SEG, use the command export seg, and to convert SEG files produced outside of CNVkit into CNVkit’s own segmented format (.cns), use import-seg.
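Since both callers' output can be brought to the SEG layout described above, a small reader is enough to work at the segment level. A minimal sketch, assuming a tab-separated file with a single header line; `read_seg` is a hypothetical helper, not part of CNVkit. Because column names are not enforced, it relies on column position only.

```python
import csv
import io

def read_seg(handle):
    """Yield one dict per segment from a tab-separated SEG file.

    Assumes the first line is a header. Handles both the 5-column layout
    (no num.mark) and the 6-column layout.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield {
            "sample": row[0],
            "chrom": row[1],
            "start": int(row[2]),
            "end": int(row[3]),
            "num_mark": int(row[4]) if len(row) == 6 else None,
            "log2": float(row[-1]),  # seg.mean is always the last column
        }

# Usage with an in-memory example (the values are made up):
example = "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\nT1\tchr1\t100\t500\t12\t0.8\n"
segments = list(read_seg(io.StringIO(example)))
```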
Why do you need VCFs? Are you not operating at seg level?