Question: How to reorder the ##contig=<ID= lines in the header of a VCF file?
5.6 years ago, Tonyzeng (300) wrote:

Hi, I have a VCF file, and I need to change the order of the contig IDs in its header from

##contig=<ID=1,length=195471971>
##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=X,length=171031299>

How can I change it to

##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=1,length=195471971>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=X,length=171031299>
vcf • 2.8k views
modified 5.6 years ago by Chris Miller (20k) • written 5.6 years ago by Tonyzeng (300)

That's not a BAM header. Do you mean VCF?

written 5.6 years ago by Devon Ryan (90k)

Thank you for the reminder, Dpryan. I have corrected it.

written 5.6 years ago by Tonyzeng (300)

Do you need to reorder the whole file, or just the header lines? It's unclear from your question.

written 5.6 years ago by Chris Miller (20k)

I only need to reorder the header lines, because the order of the read lines has already been corrected. Thank you!

written 5.6 years ago by Tonyzeng (300)

Huh! I just wrote some code for you to reorder the read lines. Anyway, it's high time you learned vi commands (http://www.cs.colostate.edu/helpdocs/vi.html). Use Unix tools to edit the file if it is too big for a Windows application such as Notepad++.

written 5.6 years ago by Ashutosh Pandey (11k)

Thanks, Ashutoshmits. I am sorry I did not make it clear that I already generate a VCF file with the correct chromosome order for the READ LINES, but not for the header lines. In the header of the VCF file, I still need to reorder the ##contig=<ID=number lines. I assumed that the following code you posted works for ordering the read lines but not for the header lines.

modified 5.6 years ago • written 5.6 years ago by Tonyzeng (300)

Ashutoshmits, I have now run GATK base recalibration without modifying the order of the ##contig= lines, and it completed without any problem. So I do not need to sort the header anymore.

modified 5.6 years ago • written 5.6 years ago by Tonyzeng (300)

Cool. It means GATK doesn't care about the contig order in the header of a VCF file.
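
To see how the header order and the record order differ in a given file, a small sketch along these lines could list both. It assumes contig lines start with `##contig=<ID=` as in the question; the function name `contig_orders` and the toy input are made up for illustration, and the sketch says nothing about what any particular tool requires.

```python
import re

# Assumed header form, taken from the question: ##contig=<ID=10,length=...>
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_orders(vcf_lines):
    """Return (header_order, record_order) for an iterable of VCF lines.

    header_order: contig IDs in the order of the ##contig lines.
    record_order: chromosome names in order of first appearance in the records.
    """
    header_order, record_order = [], []
    seen = set()
    for line in vcf_lines:
        match = CONTIG_ID.match(line)
        if match:
            header_order.append(match.group(1))
        elif line.strip() and not line.startswith("#"):
            # First tab-separated column of a record line is the chromosome.
            chrom = line.rstrip("\n").split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                record_order.append(chrom)
    return header_order, record_order


if __name__ == "__main__":
    # Toy input for illustration only, not taken from a real file.
    lines = [
        "##contig=<ID=1,length=195471971>\n",
        "##contig=<ID=10,length=130694993>\n",
        "#CHROM\tPOS\n",
        "10\t100\n",
        "10\t200\n",
        "1\t50\n",
    ]
    header_order, record_order = contig_orders(lines)
    print("header :", header_order)
    print("records:", record_order)
```

Passing an open file handle instead of the list works the same way, since the function only iterates over lines.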

written 5.6 years ago by Ashutosh Pandey (11k)

Oh yeah! Thank you so much for your help anyway, Ashutoshmits.

written 5.6 years ago by Tonyzeng (300)
5.6 years ago, Ashutosh Pandey (11k, Philadelphia) wrote:

Here is code that should work. You will have to change the order of the header lines manually, but the script takes care of the rest (the record lines). Make sure your computer has enough RAM if your VCF file is large, because all records are held in memory.


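# Written for Python 3.
import sys
from collections import defaultdict

# Expect exactly two arguments: the input VCF and the output file.
if len(sys.argv) != 3:
    print("Usage: python %s input_vcf output_file" % sys.argv[0])
    sys.exit(1)

# Desired chromosome order for the record lines; edit to match your reference.
# Records on chromosomes not listed here are not written to the output.
chromosomes = ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
               "X", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

# records[chrom][pos] is a list of lines, so several records at the same
# position are all kept instead of overwriting one another.
records = defaultdict(lambda: defaultdict(list))

with open(sys.argv[1]) as vcf_in, open(sys.argv[2], "w") as vcf_out:
    for line in vcf_in:
        # Header lines are copied through unchanged, in their original order.
        if line.startswith("#"):
            vcf_out.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Store the position as an integer so sorting is numeric.
        records[fields[0]][int(fields[1])].append(line)

    for chrom in chromosomes:
        # Skip chromosomes that do not occur in this file.
        if chrom not in records:
            continue
        for pos in sorted(records[chrom]):
            vcf_out.writelines(records[chrom][pos])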
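
The script above leaves the header untouched. As a rough sketch of how the ##contig lines themselves could be reordered instead of editing them by hand, the function below rearranges only those lines and leaves every other header line in place. The name `reorder_contig_lines`, the `desired_order` argument, and the sample header (including its `##fileformat` line) are illustrative assumptions, not part of the original answer.

```python
import re

# Matches the ID field of a ##contig header line, e.g. ##contig=<ID=10,length=...>
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def reorder_contig_lines(header_lines, desired_order):
    """Return header_lines with the ##contig lines rearranged to follow desired_order.

    Contigs missing from desired_order keep their relative order and are placed
    after the listed ones. All other header lines stay where they were.
    """
    rank = {contig: i for i, contig in enumerate(desired_order)}
    slots = []    # indexes in header_lines that hold ##contig lines
    contigs = []  # (rank, original index, line) for each ##contig line
    for idx, line in enumerate(header_lines):
        match = CONTIG_ID.match(line)
        if match:
            slots.append(idx)
            contigs.append((rank.get(match.group(1), len(rank)), idx, line))
    contigs.sort()
    result = list(header_lines)
    # Put the sorted contig lines back into the positions contig lines occupied.
    for slot, (_, _, line) in zip(slots, contigs):
        result[slot] = line
    return result


if __name__ == "__main__":
    # Shortened, made-up header for illustration.
    header = [
        "##fileformat=VCFv4.1\n",
        "##contig=<ID=1,length=195471971>\n",
        "##contig=<ID=10,length=130694993>\n",
        "##contig=<ID=2,length=182113224>\n",
        "##contig=<ID=X,length=171031299>\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ]
    for line in reorder_contig_lines(header, ["10", "1", "2", "X"]):
        print(line, end="")
```

Because only the header is held in memory, this approach does not need the whole file loaded; the record lines could be streamed through unchanged after the reordered header is written.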
modified 5.6 years ago • written 5.6 years ago by Ashutosh Pandey (11k)