Changing the header of a VCF file and the field correspondingly
4
2
Entering edit mode
8.6 years ago
charlesberkn ▴ 40

Dear All,

I have a VCF file and I want to change part of the header:

##contig=<ID=1,length=195471971>
##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=MT,length=16299>
##contig=<ID=X,length=171031299>
##contig=<ID=Y,length=91744698>

to

##contig=<ID=chr1,length=195471971>
##contig=<ID=chr10,length=130694993>
##contig=<ID=chr11,length=122082543>
##contig=<ID=chr12,length=120129022>
##contig=<ID=chr13,length=120421639>
##contig=<ID=chr14,length=124902244>
##contig=<ID=chr15,length=104043685>
##contig=<ID=chr16,length=98207768>
##contig=<ID=chr17,length=94987271>
##contig=<ID=chr18,length=90702639>
##contig=<ID=chr19,length=61431566>
##contig=<ID=chr2,length=182113224>
##contig=<ID=chr3,length=160039680>
##contig=<ID=chr4,length=156508116>
##contig=<ID=chr5,length=151834684>
##contig=<ID=chr6,length=149736546>
##contig=<ID=chr7,length=145441459>
##contig=<ID=chr8,length=129401213>
##contig=<ID=chr9,length=124595110>
##contig=<ID=chrM,length=16299>
##contig=<ID=chrX,length=171031299>
##contig=<ID=chrY,length=91744698>

and also change the field of #CHROM correspondingly.

I am wondering if there are any tools or easy ways to achieve this. Thank you in advance.

SNP • 11k views
ADD COMMENT
8
Entering edit mode
8.6 years ago

I have not tested/validated the commands, but think you need two steps: 1) change the header(s)

bcftools view --header-only $INPUT_FILE | sed 's/##contig=<ID=/##contig=<ID=chr/' | sed 's/##contig=<ID=chrMT/##contig=<ID=chrM/' > $OUTPUT_FILE

and 2) change the data field (as questioned by @charlesberkn).

bcftools view --no-header $INPUT_FILE | sed 's/^/chr/' | sed 's/^chrMT/chrM/' >> $OUTPUT_FILE

PS: Thanks to Pierre for pointing toward the special handling of chrM(T)!

ADD COMMENT
0
Entering edit mode

Thank you so much. Your code works great.

ADD REPLY
0
Entering edit mode

If the code works, please up-vote and set the answer as "accepted" to let other people with the same problem know this is working solution - thanks

ADD REPLY
0
Entering edit mode
8.6 years ago
davecap • 0

You can use a command line tool like `sed` to find/replace, or you can use something like PyVCF if you know python.

Example sed command:

sed -i 'backup' 's/ID\=/ID\=chr/g' file.vcf

Do you also want to change the first column in all the rows too?

ADD COMMENT
0
Entering edit mode

Yes, how can I do it? Thank you.

ADD REPLY
0
Entering edit mode

I tried your code, but I got this:

sed: can't find label for jump to `ackup'

Could you please help me out? Thank you.

ADD REPLY
0
Entering edit mode

beware with the chromosome MT, I would say

sed -ibackup -e 's/##contig=<ID=chrM,/##contig=<ID=MT/' -e 's/##contig=<ID=chr/##contig=<ID=/'

ah, yes , you have to change the column CHROM too...

ADD REPLY
1
Entering edit mode

I think this is the wrong direction of transformation, is it?

ADD REPLY
0
Entering edit mode
8.5 years ago
moorem ▴ 240

Little python script would do it:

import os
import re

for file_name in os.listdir("."): #Open directory (working)
    if file_name.endswith(".vcf"): #Read in all fasta file names
        file_list = re.split(r"[.]", file_name)
        file_name_without_vcf =(file_list[0])
        vcf_content =open(file_name)
        output = open(file_name_without_vcf + "_edit.vcf", "w")
        for line in vcf_content:
            if line.startswith("##contig"):
                output.write(line.replace("ID=", "ID=chr"))
            else:
                output.write(line)
ADD COMMENT
0
Entering edit mode
2.2 years ago
Ram 43k

Here's an updated answer for people with the same query: Create a tab-delimited file mapping old to new contig names and then use bcftools annotate --rename-chrs.

bcftools view -H file.vcf | grep "^##contig" | sed -Ee 's/##contig=<ID=([^,]+),.*/\1/' | sed 's/^MT$/M/' > old_contigs.txt
paste old_contigs.txt <(sed -Ee 's/^/chr/' old_contigs.txt) > contig_map.tsv
bcftools annotate --rename-chrs contig_map.tsv file.vcf > output.vcf
rm old_contigs.txt contig_map.tsv
ADD COMMENT

Login before adding your answer.

Traffic: 2541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6