Rename INFO tags in a VCF file
6.1 years ago
Sean ▴ 270

Question:

What is the simplest/fastest/safest way to rename a few INFO tags in a VCF file (especially if the file is already bgzipped)?

Background:

I'm trying to merge multiple VCF files, but some of them have conflicting INFO tag names (e.g. AC, AN, AF, etc.). I want to rename some of the INFO tags in the original (unmerged) files so they don't conflict with each other once they're merged.

What I've tried:

BCFtools is nice in that it's safe, flexible and works very quickly with bgzipped VCF files. I figured this simple task would be built in, but I couldn't find it anywhere. I tried using bcftools reheader to rename some tags, but this only modifies the VCF header; the tags in the data records remain the same. I'm open to using other tools (VCFtools, awk, sed, etc.), but I would prefer a method that is compatible with bgzipped files (if possible).

vcf bcftools vcftools awk sed • 3.5k views
6.1 years ago
Sean ▴ 270

Here's the best I could come up with for now:

bcftools view -O v variants.vcf.gz \
| sed -e 's/\([;=[:space:]]\)EA_AC\([,;=[:space:]]\)/\1EVS_EA_AC\2/' \
-e 's/\([;=[:space:]]\)AA_AC\([,;=[:space:]]\)/\1EVS_AA_AC\2/' \
| bcftools view -O z -o variants.re-tagged.vcf.gz

Explanation:

1. I'm using bcftools view -O v to open my compressed VCF file (in this case variants.vcf.gz). This command is nice because it also supports opening uncompressed VCF files (e.g. variants.vcf) as well as compressed/uncompressed BCF files (e.g. variants.bcf.gz or variants.bcf, respectively) without having to specify any extra flags.
2. Then I'm using sed to rename my INFO tags (both in the header and the data). In this case I'm renaming EA_AC and AA_AC to EVS_EA_AC and EVS_AA_AC, respectively.
3. Finally I'm using bcftools view -O z to write my output to a compressed VCF file (i.e. variants.re-tagged.vcf.gz). This command also allows you to write to an uncompressed VCF or compressed/uncompressed BCF. If you leave this step off then the uncompressed output will be written to standard out.
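
To sanity-check the sed step on its own before running it on a real file, here is a minimal sketch that applies the same two expressions to a toy VCF fragment. The function names rename_tags and demo_vcf and the record values are made up for illustration; no bcftools is needed for this part.

```shell
#!/bin/sh
# Hypothetical helper: the same two sed expressions as in the pipeline above.
# Each tag name must be preceded by ';', '=' or whitespace and followed by
# ',', ';', '=' or whitespace, so it matches both "ID=EA_AC," in the header
# and ";EA_AC=" (or "<TAB>EA_AC=") in the INFO column.
rename_tags() {
  sed -e 's/\([;=[:space:]]\)EA_AC\([,;=[:space:]]\)/\1EVS_EA_AC\2/' \
      -e 's/\([;=[:space:]]\)AA_AC\([,;=[:space:]]\)/\1EVS_AA_AC\2/'
}

# Toy input (made-up values): one INFO header line and one data record.
demo_vcf() {
  printf '##INFO=<ID=EA_AC,Number=.,Type=String,Description="demo">\n'
  printf 'chr1\t100\t.\tA\tG\t.\t.\tDP=10;EA_AC=3,7;AA_AC=1,9\n'
}

# Print the re-tagged header line and record to standard out.
demo_vcf | rename_tags
```

Two caveats with this approach: there is no g flag, so only the first match per line is replaced (fine as long as each tag appears once per line), and the pattern is purely textual, so a value that happens to equal a tag name (e.g. OTHER=EA_AC;) would also be rewritten. A tag sitting at the very end of a line with nothing after it would not match either.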