Question: Rename INFO tags in a VCF file
Sean180 (United States) wrote, 4.2 years ago:

Question:

What is the simplest/fastest/safest way to rename a few INFO tags in a VCF file (especially if the file is already bgzipped)?

Background:

I'm trying to merge multiple VCF files, but some of them have conflicting INFO tag names (e.g. AC, AN, and AF). I want to rename some of the INFO tags in the original (unmerged) files so they don't conflict with each other once they're merged.
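Before renaming anything, it can help to list which INFO IDs the files actually share. The following is a minimal sketch, assuming plain-text VCF headers. For bgzipped input you could pipe bcftools view -h variants.vcf.gz into the same helper instead. The helper name info_ids, the file names a.vcf and b.vcf, and their contents are hypothetical.

```shell
#!/bin/sh
set -eu

# Print the sorted, unique INFO IDs declared in a VCF header read from stdin.
info_ids() {
  sed -n 's/^##INFO=<ID=\([^,>]*\).*/\1/p' | sort -u
}

# Two tiny hypothetical headers, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">' \
  '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">' \
  '##INFO=<ID=EA_AC,Number=.,Type=String,Description="EA allele count">' \
  > "$tmp/a.vcf"
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">' \
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">' \
  > "$tmp/b.vcf"

info_ids < "$tmp/a.vcf" > "$tmp/a.ids"
info_ids < "$tmp/b.vcf" > "$tmp/b.ids"

# comm -12 keeps only lines present in both sorted lists: the IDs that would collide.
shared=$(comm -12 "$tmp/a.ids" "$tmp/b.ids")
rm -r "$tmp"
printf '%s\n' "$shared"
```

Here only AC is declared in both headers, so it is the one tag that would need renaming before a merge.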

What I've tried:

BCFtools is nice in that it's safe, flexible and works very quickly with bgzipped VCF files. I figured this simple task would be built in, but I couldn't find it anywhere. I tried using bcftools reheader to rename some tags, but this only modifies the VCF header; the tags in the data lines remain the same. I'm open to using other tools (VCFtools, awk, sed, etc.), but I would prefer a method that is compatible with bgzipped files (if possible).

Tags: awk, vcftools, sed, bcftools, vcf
Sean180 (United States) answered, 4.1 years ago:

Here's the best I could come up with for now:

# Decompress to plain VCF, rename the tags with sed, then recompress with bgzip.
bcftools view -O v variants.vcf.gz \
  | sed -e 's/\([;=[:space:]]\)EA_AC\([,;=[:space:]]\)/\1EVS_EA_AC\2/' \
        -e 's/\([;=[:space:]]\)AA_AC\([,;=[:space:]]\)/\1EVS_AA_AC\2/' \
  | bcftools view -O z -o variants.re-tagged.vcf.gz -

Explanation:

  1. I'm using bcftools view -O v to open my compressed VCF file (in this case variants.vcf.gz) and write it out as plain, uncompressed VCF text. This command is nice because it detects the input format automatically, so it also opens uncompressed VCF files (e.g. variants.vcf) and BCF files (e.g. variants.bcf) without having to specify any extra flags.
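  2. Then I'm using sed to rename my INFO tags (both in the header and the data). In this case I'm renaming EA_AC and AA_AC to EVS_EA_AC and EVS_AA_AC, respectively. Each pattern requires a delimiter on both sides of the tag name (a semicolon, equals sign, or whitespace before it; a comma, semicolon, equals sign, or whitespace after it), so it matches "ID=EA_AC," in the ##INFO header line and ";EA_AC=" in the INFO column, but not a longer tag that merely contains EA_AC.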
  3. Finally I'm using bcftools view -O z to write my output to a bgzip-compressed VCF file (i.e. variants.re-tagged.vcf.gz). The same command can also write uncompressed VCF (-O v), compressed BCF (-O b), or uncompressed BCF (-O u). If you leave this step off, the uncompressed output from sed is written to standard output.
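The sed expressions above match the tag names anywhere on a line that has the right delimiters around them. If you want the rename confined to ##INFO header lines and the INFO column (column 8 of a tab-separated data line, per the VCF spec), an awk filter could stand in for sed in the middle of the same pipeline. This is a swapped-in technique and a minimal sketch, not a tested tool: the function name rename_info, its comma-separated old=new argument format, and the tiny demo VCF are all hypothetical.

```shell
#!/bin/sh
set -eu

# Rename INFO tags only in ##INFO header lines and in column 8 (INFO) of data lines.
# $1 is a hypothetical comma-separated list of old=new pairs; VCF text is read from stdin.
rename_info() {
  awk -v map="$1" '
    BEGIN {
      FS = OFS = "\t"
      n = split(map, pairs, ",")
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        newname[kv[1]] = kv[2]
      }
    }
    /^##INFO=<ID=/ {
      rest = substr($0, 12)          # text after "##INFO=<ID="
      id = rest
      sub(/[,>].*/, "", id)          # the ID ends at the first "," or ">"
      if (id in newname)
        $0 = "##INFO=<ID=" newname[id] substr(rest, length(id) + 1)
      print
      next
    }
    /^#/ { print; next }             # other header lines pass through untouched
    {
      m = split($8, entries, ";")
      out = ""
      for (i = 1; i <= m; i++) {
        key = entries[i]; val = ""
        eq = index(key, "=")
        if (eq > 0) { val = substr(key, eq); key = substr(key, 1, eq - 1) }
        if (key in newname) key = newname[key]
        out = out (i > 1 ? ";" : "") key val
      }
      $8 = out                       # rebuilds the line with tabs (OFS)
      print
    }'
}

# A tiny hypothetical VCF to exercise the function.
result=$(
  {
    printf '##fileformat=VCFv4.2\n'
    printf '##INFO=<ID=EA_AC,Number=.,Type=String,Description="EA allele count">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\t.\tA\tG\t.\tPASS\tDP=10;EA_AC=3,5;AA_AC=1\n'
  } | rename_info 'EA_AC=EVS_EA_AC,AA_AC=EVS_AA_AC'
)
printf '%s\n' "$result"
```

In the original pipeline, a call such as rename_info 'EA_AC=EVS_EA_AC,AA_AC=EVS_AA_AC' could take the place of the sed step between the two bcftools view commands. Splitting the INFO column on ";" and "=" instead of pattern-matching the whole line means the ID, FILTER, FORMAT and sample columns are never touched.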