Well, I hope I can explain myself clearly.
The VCF files I have store all the data together in the INFO column (AF, MLLD, SVTYPE, GENE...), separated by ";".
When I open them in Excel and try to split that column, the resulting columns don't line up as they should, because not every record has the same fields.
Is there an easy way to do this, so that I don't need to fix it manually?
I didn't create this file, but I'm wondering: are most VCF files like this, instead of having the required info (PRECISE, SVTYPE, FUNC...) as separate columns in the header?
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 68928 . T <CNV> 100.0 PASS PRECISE=FALSE;SVTYPE=CNV;END=69412;LEN=484;NUMTILES=4;CONFIDENCE=0;PRECISION=2.04702;FUNC=[{'gene':'OR4F5'}] GT:GQ:CN ./.:0:2
chr1 871334 . G T 431.84 PASS AF=0.488263;AO=104;DP=213;FAO=104;FDP=213;FR=.;FRO=109;FSAF=68;FSAR=36;FSRF=67;FSRR=42;FWDB=0.00178582;FXX=0;HRUN=2;LEN=1;MLLD=52.4929;QD=8.10962;RBI=0.00909038;REFB=-0.0299934;REVB=-0.00891325;RO=109;SAF=68;SAR=36;SRF=67;SRR=42;SSEN=0;SSEP=0;SSSB=0.0396347;STB=0.521838;STBP=0.536;TYPE=snp;VARB=0.0325404;OID=.;OPOS=871334;OREF=G;OALT=T;OMAPALT=T;FUNC=[{'transcript':'NM_152486.2','gene':'SAMD11','location':'intronic'}] GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:99:213:213:109:109:104:104:0.488263:36:68:67:42:36:68:67:42
Thanks
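For reference, the INFO column in the example lines follows the usual VCF convention: entries are separated by ";", and each entry is either KEY=VALUE or a bare flag with no value. A minimal Python sketch of splitting one INFO string, assuming no value contains a literal ";" (the function name parse_info and storing flags as True are illustrative choices, not part of any library):

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of key -> value, keeping entry order.

    Entries are separated by ';'. Each entry is either KEY=VALUE or a bare
    flag (KEY with no '='); flags are stored as True in this sketch.
    """
    fields = {}
    if info in ("", "."):  # '.' means the INFO column is empty
        return fields
    for entry in info.split(";"):
        # Split on the first '=' only, so values may contain '=' characters
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


cnv = ("PRECISE=FALSE;SVTYPE=CNV;END=69412;LEN=484;NUMTILES=4;"
       "CONFIDENCE=0;PRECISION=2.04702;FUNC=[{'gene':'OR4F5'}]")
print(parse_info(cnv)["SVTYPE"])  # CNV
print(parse_info(cnv)["FUNC"])    # [{'gene':'OR4F5'}]
```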
While the split can be done using sed etc., you would have the same problem (an unequal number of columns) if you try to import the data into Excel :)

Thanks genomax....
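One way around the unequal-columns problem is to use the union of all INFO keys seen in the file as the column set and write a placeholder wherever a record lacks a key, so every row has the same width. A minimal Python sketch, assuming "." as the placeholder and "TRUE" for flag entries (both choices, and the name info_table, are illustrative):

```python
def info_table(info_strings, missing="."):
    """Turn INFO strings with different keys into a rectangular table.

    Returns (columns, rows): columns is the union of all keys in first-seen
    order, and each row has exactly one value per column, using `missing`
    where that record lacks the key.
    """
    parsed = []
    columns = []
    seen = set()
    for info in info_strings:
        record = {}
        for entry in info.split(";"):
            if not entry or entry == ".":
                continue
            key, sep, value = entry.partition("=")
            record[key] = value if sep else "TRUE"  # flags carry no value
            if key not in seen:
                seen.add(key)
                columns.append(key)
        parsed.append(record)
    rows = [[rec.get(col, missing) for col in columns] for rec in parsed]
    return columns, rows


cols, rows = info_table(["SVTYPE=CNV;END=69412;LEN=484",
                         "AF=0.488263;LEN=1;TYPE=snp"])
print(cols)     # ['SVTYPE', 'END', 'LEN', 'AF', 'TYPE']
print(rows[1])  # ['.', '.', '1', '0.488263', 'snp']
```

Because every row is padded to the same set of columns, the result can be saved as a tab-separated file and opened in Excel without columns shifting.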
Well, the thing is I don't want the data in Excel at all. I thought I could use Excel to get back my VCF file with a modified header and all the INFO fields split into columns.
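Since the goal is a file rather than a spreadsheet, the same idea can be applied to the whole VCF without going through Excel. The sketch below is a hypothetical script, not a standard tool. It assumes a tab-separated VCF, keeps only the first seven fixed columns (FORMAT and sample columns are dropped), orders the INFO columns by the ##INFO header lines where present (appending undeclared keys as they appear), and writes a TSV. Note that the output is no longer a valid VCF, because the VCF format keeps all INFO data in a single column.

```python
import csv
import re

# Captures the ID from meta-lines such as ##INFO=<ID=AF,Number=A,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def vcf_to_tsv(lines, out):
    """Write a VCF as a TSV in which every INFO key gets its own column.

    Column order: keys declared in ##INFO header lines first (header order),
    then undeclared keys in first-seen order. Missing values are written
    as '.', and flag entries as 'TRUE'.
    """
    info_keys = []
    header = None
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            m = INFO_ID.match(line)
            if m and m.group(1) not in info_keys:
                info_keys.append(m.group(1))
            continue
        if line.startswith("#CHROM"):
            header = line.lstrip("#").split("\t")
            continue
        if not line:
            continue
        cols = line.split("\t")
        info = {}
        for entry in cols[7].split(";"):  # column 8 is INFO
            if not entry or entry == ".":
                continue
            key, sep, value = entry.partition("=")
            info[key] = value if sep else "TRUE"
            if key not in info_keys:
                info_keys.append(key)
        records.append((cols, info))

    fixed = header[:7] if header else ["CHROM", "POS", "ID", "REF",
                                       "ALT", "QUAL", "FILTER"]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fixed + info_keys)
    for cols, info in records:
        writer.writerow(cols[:7] + [info.get(k, ".") for k in info_keys])
```

For example, with placeholder file names: open "input.vcf" for reading and "output.tsv" for writing, then call vcf_to_tsv(input_handle, output_handle). Collecting all records before writing is needed because the full set of INFO keys is only known after the last line; for very large files, a two-pass approach (first pass to collect keys, second to write) would use less memory.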