Splitting the INFO field in ANNOVAR's multianno.txt file
8 months ago

I'm having trouble splitting the gnomAD annotation out of the VCF INFO field in my ANNOVAR multianno.txt file. I used bcftools to merge the gnomAD annotation into the ANNOVAR input VCF, because otherwise ANNOVAR only outputs the frequency data.

Here are two example entries from the column I'm having trouble with. The file is tab-separated, so essentially I want to insert a tab at a specific point in each entry.

CONTQ=93;DP=555;ECNT=4;MBQ=30,20,30;MFRL=181,172,212;MMQ=60,60,60;MPOS=6,21;OCM=0;POPAF=2.4,2.4;SEQQ=93;STRANDQ=93;TLOD=19.94,1805.91;qual=-10;filters=artifact_prone_site;*(etc etc etc etc)*

CONTQ=93;DP=801;ECNT=5;MBQ=30,10;MFRL=190,230;MMQ=60,60;MPOS=18;OCM=0;POPAF=2.4;SEQQ=2;STRANDQ=1;TLOD=3.34;qual=-10;filters=npg;*(etc etc etc)*


Everything after the TLOD= entry is gnomAD data. Sometimes there is no gnomAD entry at all, and sometimes TLOD= has multiple comma-separated values, so I'm struggling to write a reliable regex in sed/awk.

Is there a simple programmatic way to do this? Or, better yet, is there a way to get bcftools to put the gnomAD data in its own column before the file goes through ANNOVAR?

This is my bcftools command:

bcftools annotate --force -a ./db.vcf.gz -c INFO ./input.vcf.gz > ./output.vcf
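A minimal sketch of the split in plain awk (wrapped in a shell function) instead of a single regex: it finds the TLOD= entry, cuts at the first ; after it, so the number of TLOD values does not matter, and writes the remainder to a new column, or . when there is no gnomAD data. It assumes that TLOD is the last Mutect2 tag, that the first line of the file is the header row, and that you pass the index of the merged column yourself. The function name split_info and the new column name gnomAD_INFO are made up for illustration.

```shell
# Sketch: split the merged INFO column of an ANNOVAR multianno.txt file into
# two tab-separated columns at the end of the TLOD=... entry.
# Assumptions (adjust for your file):
#   - $1 is the 1-based index of the column holding the merged INFO string
#     (check it against your header row);
#   - the first line of each file is the header row;
#   - TLOD is the last Mutect2 tag, so everything after "TLOD=...;" is gnomAD data;
#   - split_info and "gnomAD_INFO" are hypothetical names.
split_info() {
  col=$1; shift
  awk -v col="$col" '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 { $col = $col OFS "gnomAD_INFO"; print; next }   # name the new column
    {
      s = $col
      # Search ";" s so a leading TLOD still matches and tags such as "XTLOD="
      # cannot; because of the extra ";", i is where "TLOD=" starts in s.
      i = index(";" s, ";TLOD=")
      j = (i > 0) ? index(substr(s, i), ";") : 0    # ";" that ends the TLOD entry
      if (j > 0) {
        left  = substr(s, 1, i + j - 2)              # tags up to and including TLOD
        right = substr(s, i + j)                     # everything after it (gnomAD)
      } else {
        left = s; right = ""                         # no TLOD, or TLOD is the last tag
      }
      if (right == "") right = "."                   # ANNOVAR-style missing value
      $col = left OFS right                          # the embedded tab adds a column
      print
    }' "$@"
}
```

For example, split_info 12 sample_multianno.txt > sample_split.txt, where 12 and both file names are placeholders for your own column index and files. Because the cut is made at the first semicolon after TLOD= rather than by matching the TLOD value itself, records with one or several TLOD values are handled the same way.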
Tags: Annovar, sed, awk, bcftools

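You could try standardizing the INFO fields in your VCF before annotating it with ANNOVAR. bcftools query prints the listed tags in a fixed order (and "." for any tag missing from a record), so every record ends up with the same layout. Maybe something like: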

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\tCONTQ=%CONTQ;DP=%DP;ECNT=%ECNT;MBQ=%MBQ;MFRL=%MFRL;MMQ=%MMQ;MPOS=%MPOS;OCM=%OCM;POPAF=%POPAF;SEQQ=%SEQQ;STRANDQ=%STRANDQ;TLOD=%TLOD;qual=%qual;filters=%filters;\n' input.vcf >> output.vcf

