                    3.9 years ago
        curiousbiologist
        
    
    Hello,
I would like to extract info from my VCF file. No problem with info or annotation fields:
java -jar SnpSift.jar extractFields -s "," -e "EMPTY" test.vcf "CHROM" "POS" "REF" "ALT" "DP" "ANN[*].HGVS_P" > test-fields.xls
However, I haven't found how to extract the VAF value from the last field, "GT:DP:AD:RO:QR:AO:QA:GL:VAF".
I couldn't find a name for this field. How can I do that?
Thank you for your advice.
Here is a line of my VCF file:
(I want to extract "1" from VAF)
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  unknown                                                                                                                                                                                                             
NC_045512.2 210 .   G   T   16381.3 PASS    AB=0    ABP=0   AC=2    AF=1    AN=2    AO=487  CIGAR=1X    DP=487  DPB=487 DPRA=0  EPP=4.29892 EPPR=0  GTI=0   LEN=1   MEANALT=1   MQM=60  MQMR=0  NS=1    NUMALT=1    ODDS=679.731    PAIRED=0    PAIREDR=0   PAO=0   PQA=0   PQR=0   PRO=0   QA=18260    QR=0    RO=0    RPL=242 RPP=3.05043 RPPR=0  RPR=245 RUN=1   SAF=250 SAP=3.76385 SAR=237 SRF=0   SRP=0   SRR=0   TYPE=snp    ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-56G>T|||||56|   T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725297.1|protein_coding||c.-56G>T|||||56|WARNING_TRANSCRIPT_NO_STOP_CODON   T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742608.1|protein_coding||c.-56G>T|||||56|WARNING_TRANSCRIPT_NO_STOP_CODON   T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01.2|protein_coding||c.-56G>T|||||56| T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725298.1|protein_coding||c.-596G>T|||||596|WARNING_TRANSCRIPT_NO_START_CODON    T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742609.1|protein_coding||c.-596G>T|||||596|WARNING_TRANSCRIPT_NO_START_CODON    T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725299.1|protein_coding||c.-2510G>T|||||2510|WARNING_TRANSCRIPT_NO_START_CODON  T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742610.1|protein_coding||c.-2510G>T|||||2510|WARNING_TRANSCRIPT_NO_START_CODON  T|intergenic_region|MODIFIER|CHR_START-ORF1ab|CHR_START-GU280_gp01|intergenic_region|CHR_START-GU280_gp01|||n.210G>T||||||  GT:DP:AD:RO:QR:AO:QA:GL:VAF 1/1:487:0   487:0:0:487:18260:-1642.66  -146.602    0:1
                    
                
                
Whilst this may work, it's generally not recommended to use vcftools, as it hasn't been under active development for a long time and may contain bugs. Better to use bcftools.

That's very nice. I guess I can now put together "test-fields.xls" and the output from vcftools to get a nice final file, but I don't know how to do that: maybe with sed?

You can use awk for that task.
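
For what it's worth, here is a rough sketch of how the awk approach could look. It assumes both files are tab-separated with a single header line, that the first two columns of each are CHROM and POS, and that the VAF value sits in the third column of the per-site table. The function name `join_vaf` and the file names `vaf.tsv` and `final.tsv` are made up for illustration, so adjust them to whatever your tools actually wrote.

```shell
# Sketch only: append a VAF column to the SnpSift table by matching on CHROM + POS.
# Assumed (hypothetical) inputs, both tab-separated with one header line:
#   $1  per-site VAF table, columns: CHROM  POS  <VAF value>
#   $2  SnpSift extractFields output, columns: CHROM  POS  REF  ALT  ...
join_vaf() {
  awk -F'\t' -v OFS='\t' '
    # First file: remember the VAF value for each CHROM + POS key.
    NR == FNR { vaf[$1 FS $2] = $3; next }
    # Second file, header line: add a name for the new column.
    FNR == 1  { print $0, "VAF"; next }
    # Second file, data lines: append the VAF, or EMPTY if the site is missing.
    { key = $1 FS $2; print $0, ((key in vaf) ? vaf[key] : "EMPTY") }
  ' "$1" "$2"
}
```

It would be called as `join_vaf vaf.tsv test-fields.xls > final.tsv`. Matching on CHROM and POS rather than pasting the files side by side means the result should not depend on both files listing the sites in the same order, although two variants at the same position would share one key.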