I have been using the following commands to annotate my VCF file with RefSeq annotations in SnpEff 4.3t, followed by SnpSift to extract the columns of interest:

    java -Xmx4g -jar snpEff.jar hg19 $panel.sorted.vcf > $panel.snpeff.hg19.vcf
    java -jar SnpSift.jar extractFields -s "," -e "." $panel.snpeff.hg19.vcf CHROM POS ID REF ALT "ANN[*].FEATUREID" "ANN[*].HGVS_C" "ANN[*].HGVS_P" > $panel.snpeff.hg19.HGVS.txt

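One possible alternative to the SnpSift step is to read the ANN field straight from the annotated VCF and keep only annotations whose Feature_ID looks like a RefSeq accession, emitting one row per transcript. The minimal Python sketch below assumes the standard pipe-delimited ANN layout written by SnpEff (Feature_ID, HGVS.c and HGVS.p are the 7th, 10th and 11th sub-fields). The prefix list, the function `refseq_rows` and the script name `refseq_ann.py` are hypothetical choices for illustration, not part of SnpEff or SnpSift.

```python
import sys

# Zero-based positions inside one ANN entry, assuming the standard layout:
# Allele|Annotation|Impact|Gene_Name|Gene_ID|Feature_Type|Feature_ID|
# Transcript_BioType|Rank|HGVS.c|HGVS.p|...
FEATURE_ID, HGVS_C, HGVS_P = 6, 9, 10

# Assumption: RefSeq transcripts are recognised purely by accession prefix.
REFSEQ_PREFIXES = ("NM_", "NR_", "XM_", "XR_")


def refseq_rows(vcf_lines):
    """Yield (CHROM, POS, ID, REF, ALT, FEATUREID, HGVS_C, HGVS_P) per RefSeq annotation."""
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt, info = cols[0], cols[1], cols[2], cols[3], cols[4], cols[7]
        for entry in info.split(";"):
            if not entry.startswith("ANN="):
                continue
            # ANN entries are comma-separated; sub-fields are pipe-separated.
            for ann in entry[len("ANN="):].split(","):
                fields = ann.split("|")
                if len(fields) <= HGVS_P:
                    continue  # malformed or truncated entry
                fid = fields[FEATURE_ID]
                if fid.startswith(REFSEQ_PREFIXES):
                    # Use "." for empty HGVS values, mirroring SnpSift's -e "."
                    yield (chrom, pos, vid, ref, alt, fid,
                           fields[HGVS_C] or ".", fields[HGVS_P] or ".")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        print("CHROM\tPOS\tID\tREF\tALT\tFEATUREID\tHGVS_C\tHGVS_P")
        for row in refseq_rows(vcf):
            print("\t".join(row))
```

Run as `python refseq_ann.py $panel.snpeff.hg19.vcf > $panel.refseq.HGVS.txt`. Because each transcript gets its own row, the HGVS.c and HGVS.p values stay attached to the feature ID they belong to.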
I can see that some feature IDs are not plain RefSeq transcript IDs (e.g. `4PED:...`, `4IDN:...`), and the extracted HGVS_P values are not formatted.
Is there a better way to extract and format the RefSeq and HGVS annotations?
Sample output (`$panel.snpeff.hg19.HGVS.txt`):

| CHROM | POS | ID | REF | ALT | `ANN[*].FEATUREID` | `ANN[*].HGVS_C` | `ANN[*].HGVS_P` |
|---|---|---|---|---|---|---|---|
| 1 | 227172290 | rs12593 | C | T | `4PED:A_480-A_522:NM_020247.4,4PED:A_480-A_523:NM_020247.4,4PED:A_480-A_636:NM_020247.4,4PED:A_480-A_640:NM_020247.4,4PED:A_480-A_643:NM_020247.4,NM_020247.4` | `c.1440C>T,c.1440C>T,c.1440C>T,c.1440C>T,c.1440C>T,c.1440C>T` | `.,.,.,.,.,p.Phe480Phe` |
| 1 | 227174210 | rs3738725 | T | C | `4PED:A_572-A_630:NM_020247.4,NM_020247.4,NM_003607.3,NM_014826.4` | `c.1716T>C,c.1716T>C,c.*7759A>G,c.*7759A>G` | `.,p.Ser572Ser,.,.` |
| 14 | 51057727 | rs1060197 | G | A | `4IDN:B_77-B_117:NM_015915.4,4IDN:B_77-B_117:NM_001127713.1,4IDN:B_77-B_117:NM_181598.3,NM_015915.4,NM_001127713.1,NM_181598.3` | `c.351G>A,c.351G>A,c.351G>A,c.351G>A,c.351G>A,c.351G>A` | `.,.,.,p.Glu117Glu,p.Glu117Glu,p.Glu117Glu` |
| 1 | 112329551 | rs3738298 | G | T | `NM_004980.4,NM_172198.2` | `c.1269+15C>A,c.1269+15C>A` | `.,.` |
| 1 | 156785617 | rs1800601 | G | A | `NM_001007792.1,NM_001161441.1,NM_001161443.1,NM_001161442.1,NM_003975.3,NM_001161444.1` | `c.-5G>A,c.123+181C>T,c.39+181C>T,c.69+181C>T,c.123+181C>T,c.123+181C>T` | `.,.,.,.,.,.` |
| X | 70442845 | rs6525485 | G | A | `NM_000166.5,NR_001568.1,NM_001097642.2` | `c.-713G>A,n.173-12792C>T,c.-16-697G>A` | `.,.,.` |
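If the SnpSift output is kept as-is, the three `ANN[*]` columns in the sample are parallel comma-separated lists of equal length, so they can be split and zipped back together, dropping entries whose ID does not start with a RefSeq prefix. This is a minimal Python sketch under that assumption; the names `explode_refseq`, `main`, `refseq_hgvs.py` and the prefix list are hypothetical, and it raises an error rather than guessing if the lists ever differ in length.

```python
import sys

# Assumption: these accession prefixes identify RefSeq transcripts; trim to
# ("NM_",) if only curated coding transcripts are wanted.
REFSEQ_PREFIXES = ("NM_", "NR_", "XM_", "XR_")
HEADER = ["CHROM", "POS", "ID", "REF", "ALT", "FEATUREID", "HGVS_C", "HGVS_P"]


def explode_refseq(rows):
    """Turn one row per variant into one row per RefSeq annotation.

    Each input row is [CHROM, POS, ID, REF, ALT, FEATUREIDs, HGVS_Cs, HGVS_Ps],
    where the last three are the comma-joined ANN[*] lists from SnpSift.
    """
    for chrom, pos, vid, ref, alt, fids, hgvs_c, hgvs_p in rows:
        fid_list = fids.split(",")
        c_list = hgvs_c.split(",")
        p_list = hgvs_p.split(",")
        # The lists are only usable if they line up entry for entry.
        if not len(fid_list) == len(c_list) == len(p_list):
            raise ValueError(f"ANN lists have different lengths at {chrom}:{pos}")
        for fid, c, p in zip(fid_list, c_list, p_list):
            # Drops entries such as "4PED:A_480-A_522:NM_020247.4".
            if fid.startswith(REFSEQ_PREFIXES):
                yield [chrom, pos, vid, ref, alt, fid, c, p]


def main(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        next(fin)  # skip the header line written by SnpSift
        fout.write("\t".join(HEADER) + "\n")
        rows = (line.rstrip("\n").split("\t") for line in fin if line.strip())
        for row in explode_refseq(rows):
            fout.write("\t".join(row) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage (file names hypothetical): `python refseq_hgvs.py $panel.snpeff.hg19.HGVS.txt $panel.refseq.HGVS.txt`. On the first sample row this leaves a single `NM_020247.4` line carrying `c.1440C>T` and `p.Phe480Phe`, so each protein change sits next to its own transcript instead of behind a run of `.` placeholders.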