Hi there,
I am using ShatterSeek to infer potential chromothripsis. The original ShatterSeek article appears to have used DELLY output as the structural variation input. However, in the ShatterSeek vignette the demo data have two breakpoints per SV, each with its own chromosome and position (the input format for structural variations is listed below), while the DELLY output seems to provide only one.
chrom1 (character): chromosome for the first breakpoint
pos1 (character): position for the first breakpoint
chrom2 (character): chromosome for the second breakpoint
pos2 (character): position for the second breakpoint
SVtype (character): type of SV, encoded as: DEL (deletion-like; +/-), DUP (duplication-like; -/+), h2hINV (head-to-head inversion; +/+), and t2tINV (tail-to-tail inversion; -/-).
strand1 (e.g. + for DEL)
strand2 (e.g. - for DEL)
I am not sure whether the DELLY output needs further annotation to get it into this format for ShatterSeek. I do have TCHR and START columns in my DELLY output, but most DUP and DEL SVs have no values in these columns. I am totally lost now; does anyone have any suggestions?
Many thanks in advance.
That is not right: the second breakpoint is in the DELLY output, see the INFO/CHR2 annotation, e.g. https://github.com/VCCRI/SVPV/blob/master/example/delly.vcf#L509
chr12 71315481 INV00010872 A <INV> . PASS PRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.7.3;CHR2=chr12;END=71316542;INSLEN=0;PE=68;MAPQ=60;CT=5to5;CIPOS=-42,42;CIEND=-42,42;SR=33;SRQ=1;CONSENSUS=GAGGAGGCCAGAGGTTGGGTAAACAGGGCCTGGCTGAGGTGTGTTGGCTCTACTGAGTGGATTTCTGCCTGCCACCTCATTGCTCTATTTGCAGCCTCATCCCAACCCCAGGCAGCAGTTAAAGAGAGAACAGGAGTAAAAATTAACAGG;CE=1.99026;RDRATIO=1.2289;AC=2;AN=6 (...)
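For what it's worth, here is a minimal Python sketch of how one such record could be turned into the ShatterSeek columns listed in the question. It rests on assumptions that are not confirmed in this thread: that the record is tab-separated standard VCF, that INFO/END is the second breakpoint position for intrachromosomal SVs, and that the INFO/CT connection type maps onto the ShatterSeek SV types as written in `CT_TO_SHATTERSEEK`. The function names are hypothetical, and the example record is the one above with the INFO field abbreviated. Please check the mapping against your DELLY version before relying on it.

```python
# Assumed mapping from DELLY's INFO/CT to (SVtype, strand1, strand2).
# This is an assumption, not something taken from the ShatterSeek docs.
CT_TO_SHATTERSEEK = {
    "3to5": ("DEL", "+", "-"),
    "5to3": ("DUP", "-", "+"),
    "3to3": ("h2hINV", "+", "+"),
    "5to5": ("t2tINV", "-", "-"),
}


def parse_info(info):
    """Split a VCF INFO string into a dict; flags (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def delly_record_to_shatterseek(vcf_line):
    """Convert one tab-separated DELLY VCF record into a dict of ShatterSeek columns.

    Returns None when the record has no CT value covered by the mapping above.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    mapped = CT_TO_SHATTERSEEK.get(info.get("CT"))
    if mapped is None:
        return None
    svtype, strand1, strand2 = mapped
    return {
        "chrom1": cols[0],
        "pos1": int(cols[1]),
        "chrom2": info["CHR2"],      # chromosome of the second breakpoint
        "pos2": int(info["END"]),    # assumed position of the second breakpoint
        "SVtype": svtype,
        "strand1": strand1,
        "strand2": strand2,
    }


# The record quoted above, with INFO shortened to the fields used here.
record = "\t".join([
    "chr12", "71315481", "INV00010872", "A", "<INV>", ".", "PASS",
    "PRECISE;SVTYPE=INV;CHR2=chr12;END=71316542;CT=5to5",
])
row = delly_record_to_shatterseek(record)
print(row)
```

Rows built this way could be written to a table and then passed to ShatterSeek column by column. Interchromosomal calls are left out of this sketch on purpose, because they are not covered by the four types listed in the question.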
Thank you for pointing this out, but my file is in a different format:
As you can see, I can only find TCHR and TSTART columns, and their values are both NA. There is also no TEND column.