Hello! I am new to VEP annotation, and I have seen a lot of questions about how to include AD and DP (per-sample FORMAT fields) in the --tab output, but I did not find an answer :(
I have used VEP to annotate my VCF files with the --tab and --vcf arguments separately, so I have 2 files: one is tab-delimited and lacks the original VCF columns, and the other is the VCF, which keeps the INFO, FORMAT, and sample columns.
I was wondering how I can add these fields from the VCF file to the tab-delimited one.
The VCF output, with the FORMAT fields (GT:GQ:AD:DP:VF:NL:SB:NC) that I need:
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT L30-1_S1.bam
chr2 208245343 . T C 0 q30 DP=894;phyloP=0.159;CSQ=C|missense_variant|MODERATE|IDH1|ENSG00000138413|Transcript|ENST00000345146|protein_coding|5/10||ENST00000345146.7:c.496A>G|ENSP00000260985.2:p.Thr166Ala|719|496|166|T/A|Aca/Gca|||-1||SNV|HGNC|HGNC:5382|YES|NM_005896.4||1|P1|CCDS2381.1|ENSP00000260985|O75874.215|A0A024R3Y6.7|UPI000012D1B4||1|tolerated_low_confidence(0.13)|benign(0.001)|PDB-ENSP_mappings:1t09.A&PDB-ENSP_mappings:1t09.B&PDB-ENSP_mappings:1t0l.A&PDB-ENSP_mappings:1t0l.B&PDB-ENSP_mappings:6vg0.C&PANTHER:PTHR11822&PANTHER:PTHR11822:SF28&TIGRFAM:TIGR00127&Gene3D:3.40.718.10&PIRSF:PIRSF000108&Pfam:PF00180&SMART:SM01329&Superfamily:SSF53659||||||||||||||||||||||||||||||||MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL GT:GQ:AD:DP:VF:NL:SB:NC 0/1:0:891,3:894:0.003356:18:-11.0619:0.0366
chr2 208245401 . A G 0 q30 DP=683;phyloP=0.159;CSQ=G|synonymous_variant|LOW|IDH1|ENSG00000138413|Transcript|ENST00000345146|protein_coding|5/10||ENST00000345146.7:c.438T>C|ENSP00000260985.2:p.Val146%3D|661|438|146|V|gtT/gtC|||-1||SNV|HGNC|HGNC:5382|YES|NM_005896.4||1|P1|CCDS2381.1|ENSP00000260985|O75874.215|A0A024R3Y6.7|UPI000012D1B4||1|||PDB-ENSP_mappings:1t09.A&PDB-ENSP_mappings:1t09.B&PDB-ENSP_mappings:1t0l.A&PDB-ENSP_mappings:1t0l.B&PDB-ENSP_mappings:1t0l.C&PDB-ENSP_mappings:1t0l.D&PDB-ENSP_mappings:6vei.A&PDB-ENSP_mappings:6vei.B&PDB-ENSP_mappings:6vg0.A&PDB-ENSP_mappings:6vg0.B&PDB-ENSP_mappings:6vg0.C&PANTHER:PTHR11822&PANTHER:PTHR11822:SF28&TIGRFAM:TIGR00127&Gene3D:3.40.718.10&PIRSF:PIRSF000108&Pfam:PF00180&SMART:SM01329&Superfamily:SSF53659||||||||||||||||||||||||||||||||MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL GT:GQ:AD:DP:VF:NL:SB:NC 0/1:0:681,2:683:0.002928:18:-12.0263:0.0243
chr2 208245402 . A G 0 q30 DP=686;phyloP=0.178;CSQ=G|missense_variant|MODERATE|IDH1|ENSG00000138413|Transcript|ENST00000345146|protein_coding|5/10||ENST00000345146.7:c.437T>C|ENSP00000260985.2:p.Val146Ala|660|437|146|V/A|gTt/gCt|||-1||SNV|HGNC|HGNC:5382|YES|NM_005896.4||1|P1|CCDS2381.1|ENSP00000260985|O75874.215|A0A024R3Y6.7|UPI000012D1B4||1|tolerated_low_confidence(0.16)|possibly_damaging(0.823)|PDB-ENSP_mappings:1t09.A&PDB-ENSP_mappings:1t09.B&PDB-ENSP_mappings:1t0l.A&PDB-ENSP_mappings:1t0l.B&PDB-ENSP_mappings:6vg0.A&PDB-ENSP_mappings:6vg0.B&PDB-ENSP_mappings:6vg0.C&PANTHER:PTHR11822&PANTHER:PTHR11822:SF28&TIGRFAM:TIGR00127&Gene3D:3.40.718.10&PIRSF:PIRSF000108&Pfam:PF00180&SMART:SM01329&Superfamily:SSF53659||||||||||||||||||||||||||||||||MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL GT:GQ:AD:DP:VF:NL:SB:NC 0/1:0:684,2:686:0.002915:18:-12.0416:0.0200
chr2 208245451 . GA G 100 R3x6 DP=666;CSQ=-|intron_variant|MODIFIER|IDH1|ENSG00000138413|Transcript|ENST00000345146|protein_coding||4/9|ENST00000345146.7:c.415-28del|||||||rs569424950||-1||deletion|HGNC|HGNC:5382|YES|NM_005896.4||1|P1|CCDS2381.1|ENSP00000260985|O75874.215|A0A024R3Y6.7|UPI000012D1B4||1||||||||||||0.01454|0.01345|0.002208|0.002132|0.001866|0.002006|0.002225|0.00063|0.002518|0.001787|0.002821|0.01454|AA|||||||||||MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL GT:GQ:AD:DP:VF:NL:SB:NC 0/1:100:619,47:666:0.070571:37:-100.0000:0.0000
The tab-delimited output:
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation IMPACT DISTANCE STRAND FLAGS VARIANT_CLASS SYMBOL SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL MANE_SELECT MANE_PLUS_CLINICAL TSL APPRIS CCDS ENSP SWISSPROT TREMBL UNIPARC UNIPROT_ISOFORM GENE_PHENO SIFT PolyPhen EXON INTRON DOMAINS miRNA HGVSc HGVSp HGVS_OFFSET AF AFR_AF AMR_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF gnomAD_AF gnomAD_AFR_AF gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF MAX_AF MAX_AF_POPS CLIN_SIG SOMATIC PHENO PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE TRANSCRIPTION_FACTORS FrameshiftSequence WildtypeProtein
chr2_208245343_T/C chr2:208245343 C ENSG00000138413 ENST00000345146 Transcript missense_variant 719 496 166 T/A Aca/Gca - MODERATE - -1 - SNV IDH1 HGNC HGNC:5382 protein_coding YES NM_005896.4 - 1 P1 CCDS2381.1 ENSP00000260985 O75874.215 A0A024R3Y6.7 UPI000012D1B4 - 1 tolerated_low_confidence(0.13) benign(0.001) 5/10 - PDB-ENSP_mappings:1t09.A,PDB-ENSP_mappings:1t09.B,PDB-ENSP_mappings:1t0l.A,PDB-ENSP_mappings:1t0l.B,PDB-ENSP_mappings:1t0l.C,PDB-ENSP_mappings:1t0l.D,PDB-ENSP_mappings:3inm.A,PDB-ENSP_mappings:6vei.B,PDB-ENSP_mappings:6vg0.A,PDB-ENSP_mappings:6vg0.B,PDB-ENSP_mappings:6vg0.C,PANTHER:PTHR11822,PANTHER:PTHR11822:SF28,TIGRFAM:TIGR00127,Gene3D:3.40.718.10,PIRSF:PIRSF000108,Pfam:PF00180,SMART:SM01329,Superfamily:SSF53659 - ENST00000345146.7:c.496A>G ENSP00000260985.2:p.Thr166Ala - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL
chr2_208245401_A/G chr2:208245401 G ENSG00000138413 ENST00000345146 Transcript synonymous_variant 661 438 146 V gtT/gtC - LOW - -1 - SNV IDH1 HGNC HGNC:5382 protein_coding YES NM_005896.4 - 1 P1 CCDS2381.1 ENSP00000260985 O75874.215 A0A024R3Y6.7 UPI000012D1B4 - 1 - - 5/10 - PDB-ENSP_mappings:1t09.A,PDB-ENSP_mappings:1t09.B,PDB-ENSP_mappings:1t0l.A,PDB-ENSP_mappings:1t0l.B,PDB-ENSP_mappings:6vg0.A,PDB-ENSP_mappings:6vg0.B,PDB-ENSP_mappings:6vg0.C,PANTHER:PTHR11822,PANTHER:PTHR11822:SF28,TIGRFAM:TIGR00127,Gene3D:3.40.718.10,PIRSF:PIRSF000108,Pfam:PF00180,SMART:SM01329,Superfamily:SSF53659 - ENST00000345146.7:c.438T>C ENSP00000260985.2:p.Val146%3D - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL
chr2_208245402_A/G chr2:208245402 G ENSG00000138413 ENST00000345146 Transcript missense_variant 660 437 146 V/A gTt/gCt - MODERATE - -1 - SNV IDH1 HGNC HGNC:5382 protein_coding YES NM_005896.4 - 1 P1 CCDS2381.1 ENSP00000260985 O75874.215 A0A024R3Y6.7 UPI000012D1B4 - 1 tolerated_low_confidence(0.16) possibly_damaging(0.823) 5/10 - PDB-ENSP_mappings:1t09.A,PDB-ENSP_mappings:1t09.B,PDB-ENSP_mappings:1t0l.A,PDB-ENSP_mappings:1t0l.B,PDB-ENSP_mappings:1t0l.C,PDB-ENSP_mappings:1t0l.D,PDB-ENSP_mappings:3inm.A,PDB-ENSP_mappings:6vei.B,PDB-ENSP_mappings:6vg0.A,PDB-ENSP_mappings:6vg0.B,PDB-ENSP_mappings:6vg0.C,PANTHER:PTHR11822,PANTHER:PTHR11822:SF28,TIGRFAM:TIGR00127,Gene3D:3.40.718.10,PIRSF:PIRSF000108,Pfam:PF00180,SMART:SM01329,Superfamily:SSF53659 - ENST00000345146.7:c.437T>C ENSP00000260985.2:p.Val146Ala - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL
chr2_208245452_A/- chr2:208245452 - ENSG00000138413 ENST00000345146 Transcript intron_variant - - - - - rs569424950 MODIFIER - -1 - deletion IDH1 HGNC HGNC:5382 protein_coding YES NM_005896.4 - 1 P1 CCDS2381.1 ENSP00000260985 O75874.215 A0A024R3Y6.7 UPI000012D1B4 - 1 - - - 4/9 - - ENST00000345146.7:c.415-28del - - - - - - - - 0.01454 0.01345 0.002208 0.002132 0.001866 0.002006 0.002225 0.00063 0.002518 0.001787 0.002821 0.01454 AA - - - - - - - - - - MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGRHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL
It would be very helpful if someone could shed some light on this. I have omitted all the comment lines from the VCF files. Is there any VEP plugin that can retain the original columns of the VCF file?
Thanks!!
If you have all the needed information in the VCF output, why do you need to join it with the tab file?
Yes, but I want to avoid rewriting all the headers. Also (and this is the main reason), the tab format makes it easier for my co-workers to view the data in Excel.
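If no plugin turns up, one option is to post-process the two files you already have. Below is a minimal Python sketch, not a VEP feature: it reads AD and DP from the first sample column of the VCF and appends them as extra columns to the tab output, matching rows on #Uploaded_variation. Note from the data above that the deletion is GA>G at POS 208245451 in the VCF but chr2_208245452_A/- in the tab file, so a plain CHROM/POS join would miss it. The function names (vep_key, sample_fields, add_columns) are my own, and the key-building rule (drop the shared first base of an indel and shift the position by one) is an assumption inferred from that one row. It also assumes biallelic records and a single sample.

```python
def vep_key(chrom, pos, ref, alt):
    """Rebuild the #Uploaded_variation value seen in the tab output.

    Assumption (inferred from the thread's deletion row): for indels the
    shared first base is dropped and the position moves forward by one.
    """
    pos = int(pos)
    if len(ref) != len(alt) and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return f"{chrom}_{pos}_{ref or '-'}/{alt or '-'}"


def sample_fields(vcf_lines, wanted=("AD", "DP")):
    """Map each variant key to the wanted FORMAT values of the first sample."""
    per_variant = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, vcf_id, ref, alt = cols[:5]
        # Pair FORMAT keys (column 9) with the sample values (column 10).
        fmt = dict(zip(cols[8].split(":"), cols[9].split(":")))
        # If the VCF has a real ID, assume the tab file uses it as the key.
        key = vcf_id if vcf_id != "." else vep_key(chrom, pos, ref, alt)
        per_variant[key] = {name: fmt.get(name, "-") for name in wanted}
    return per_variant


def add_columns(tab_lines, per_variant, wanted=("AD", "DP")):
    """Yield tab lines with the wanted fields appended as new columns."""
    for line in tab_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta lines pass through unchanged
        elif line.startswith("#"):
            yield "\t".join([line, *wanted])  # extend the column header
        else:
            key = line.split("\t", 1)[0]
            values = per_variant.get(key, {})
            yield "\t".join([line, *(values.get(name, "-") for name in wanted)])


# Demo on the deletion from the thread (INFO and tab columns truncated).
demo_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tL30-1_S1.bam\n",
    "chr2\t208245451\t.\tGA\tG\t100\tR3x6\tDP=666\tGT:GQ:AD:DP:VF:NL:SB:NC\t"
    "0/1:100:619,47:666:0.070571:37:-100.0000:0.0000\n",
]
demo_tab = [
    "#Uploaded_variation\tLocation\tAllele\n",
    "chr2_208245452_A/-\tchr2:208245452\t-\n",
]
for row in add_columns(demo_tab, sample_fields(demo_vcf)):
    print(row)
```

For real files, pass open file handles instead of the demo lists and write the yielded rows to a new .tsv. Because the lookup is keyed by variant, a tab file with several transcript rows per variant gets the same AD and DP on each of those rows. Rows that find no match get "-", which is worth checking for before sharing the sheet.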
Hey, I really need your help with a VCF file. Can you please help me?