Posted 19 months ago by a.beggs
    Hi all
I have a VCF file with the following lines:
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chr17   23197000        Spectre.DEL.7ROFFYQK    N       LOSS    .       .       END=25683000;SVLEN=2486000;SVTYPE=LOSS;CN=0     GT:HO:GQ        1/1:0.0:60
chr18   19357000        Spectre.DEL.8B1N5YFJ    N       LOSS    .       .       END=20560000;SVLEN=1203000;SVTYPE=LOSS;CN=0     GT:HO:GQ        1/1:0.0:60
chr1_KI270709v1_random  2000    Spectre.DUP.Y9R4QQKP    N       GAIN    .       .       END=18000;SVLEN=16000;SVTYPE=GAIN;CN=42 GT:HO:GQ        ./.:0.0:60
chr2_KI270715v1_random  143000  Spectre.DUP.7IRZ6XDF    N       GAIN    .       .       END=160000;SVLEN=17000;SVTYPE=GAIN;CN=5 GT:HO:GQ        ./.:0.0:60
chr9_KI270719v1_random  137000  Spectre.DUP.YC1FK3L0    N       GAIN    .       .       END=173000;SVLEN=36000;SVTYPE=GAIN;CN=4 GT:HO:GQ        ./.:0.0:60
chr11_KI270721v1_random 5000    Spectre.DUP.YB0LB1EU    N       GAIN    .       .       END=18000;SVLEN=13000;SVTYPE=GAIN;CN=4  GT:HO:GQ        ./.:0.0:60
For various reasons the tertiary analysis pipeline I am feeding the VCF into is extremely fussy about its input. It wants:
- SVTYPE has to be CNV
- ALT allele needs to be <CNV>
- the ID field is used to determine if it is LOSS or GAIN so needs to include this text
- FORMAT/CN field is required for copy number
 
I have tried pyVCF, BCFtools and awk to convert it to look like this but can't seem to make it work. Does anyone have the VCF wizardry to give me some pointers, please? The main issues are getting the CN from INFO into FORMAT, and adding LOSS/GAIN to the ID field.
Do you mean like the following example diff:
You also need to modify the header.
What kind of pipeline is this?
Yeah, that's what I'm looking for; awk can take care of the header for me. Are you saying diff can do this?!
That should be easy to script in Perl. The diff was just my way of defining the changes to be made to your file.
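For what it's worth, here is a rough sketch of the same idea in Python rather than Perl, using plain string handling with no VCF library. It assumes tab-separated records like the ones posted. The pipeline's exact expectations are not spelled out in the thread, so several choices are guesses: LOSS/GAIN is appended to the ID as a `.LOSS`/`.GAIN` suffix, CN is kept in INFO as well as copied to FORMAT, and the two header lines (including their Description text) are only added if missing. The function names and the script name in the usage comment are made up for illustration.

```python
import sys

# Header lines the new ALT and FORMAT/CN values need (Description text is an assumption).
NEW_HEADER = [
    '##ALT=<ID=CNV,Description="Copy number variable region">',
    '##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number">',
]


def convert_record(line):
    """Rewrite one tab-separated record: ALT=<CNV>, SVTYPE=CNV, LOSS/GAIN in ID, CN in FORMAT."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]
    samples = fields[9:]

    # Parse INFO into an ordered dict; flag entries without '=' get the value None.
    info_d = dict(kv.split("=", 1) if "=" in kv else [kv, None] for kv in info.split(";"))
    call = info_d.get("SVTYPE", alt)   # LOSS or GAIN in the posted records
    cn = info_d.get("CN", ".")         # '.' if INFO has no CN
    info_d["SVTYPE"] = "CNV"
    new_info = ";".join(k if v is None else f"{k}={v}" for k, v in info_d.items())

    # Assumption: a '.LOSS' / '.GAIN' suffix on the ID is enough for the pipeline.
    new_id = vid if call in vid else f"{vid}.{call}"

    # Append CN to FORMAT and the same value to every sample column.
    new_samples = [f"{s}:{cn}" for s in samples]
    return "\t".join([chrom, pos, new_id, ref, "<CNV>", qual, flt,
                      new_info, fmt + ":CN", *new_samples])


def convert(lines):
    """Yield converted lines, adding missing header definitions before #CHROM."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            seen.add(line.split(",", 1)[0])   # e.g. '##FORMAT=<ID=CN'
            yield line
        elif line.startswith("#CHROM"):
            for header in NEW_HEADER:
                if header.split(",", 1)[0] not in seen:
                    yield header
            yield line
        elif line:
            yield convert_record(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python fix_cnv_vcf.py in.vcf out.vcf
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out in convert(src):
            dst.write(out + "\n")
```

If the pipeline wants the ID in a different shape (say, GAIN/LOSS replacing DUP/DEL), only the `new_id` line should need changing.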
So this is your real question. Show us the code.