How to reorder the ##contig=<ID lines in a VCF file header?
8.6 years ago
Tonyzeng ▴ 310

Hi, I have a VCF file and need to change the order of the contig IDs in its header from

##contig=<ID=1,length=195471971>
##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=X,length=171031299>


How can I change it to

##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=1,length=195471971>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=X,length=171031299>


That's not a BAM header. Do you mean VCF?


Thanks for the reminder, Dpryan; I corrected it.


Do you need to reorder the whole file, or just the header lines? It's unclear from your question.


I only need to reorder the header lines, because the data lines are already in the correct order. Thank you!


Huh! I just wrote code for you to order the data lines. Anyway, it's high time you learned vi commands (http://www.cs.colostate.edu/helpdocs/vi.html). Use Unix tools to edit the file if it is too big for a Windows application like Notepad++.


Thanks, Ashutoshmits, and sorry for not making it clear: I have already generated a VCF file whose data lines are in the correct chromosome order, but the header is not. I still need to reorder the ##contig=<ID=number lines in the header. I assume the code you posted works for ordering the data lines but not for the header lines.


Ashutoshmits, I ran GATK base recalibration without changing the order of the ##contig= lines, and it finished without any problem. So I no longer need to sort the header.


Cool. So it seems GATK doesn't care about the contig order in the header of a VCF file, at least for this step.
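If you want to check whether the data lines actually follow the ##contig order (for example before running a tool that may be stricter than this GATK step), here is a small sketch. It assumes an uncompressed VCF read as text lines, and that "in order" means each contig forms one contiguous block of data lines in header order. The names `contig_order_problems` and `CONTIG_ID` are hypothetical and are not part of GATK or any library. Positions within a contig are not checked.

```python
import re

# Matches the ID value of a ##contig header line, wherever ID sits in the <...> list.
CONTIG_ID = re.compile(r"[<,]ID=([^,>]+)")


def contig_order_problems(lines):
    """Report data lines whose CHROM breaks the order of the ##contig header lines.

    Returns (line_number, chrom, reason) tuples. An empty list means every data
    line uses a declared contig and each contig forms one block, in header order.
    """
    rank = {}          # contig ID -> index among the ##contig lines
    finished = set()   # contigs whose block of data lines has already ended
    current = None     # CHROM of the previous data line
    problems = []
    for n, line in enumerate(lines, start=1):
        if line.startswith("##contig="):
            match = CONTIG_ID.search(line)
            if match:
                rank[match.group(1)] = len(rank)
            continue
        if line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines are ignored
        chrom = line.split("\t", 1)[0]
        if chrom == current:
            continue  # still inside the same contig block
        if chrom not in rank:
            problems.append((n, chrom, "not declared in a ##contig line"))
        elif chrom in finished:
            problems.append((n, chrom, "appears again after another contig"))
        elif current in rank and rank[chrom] < rank[current]:
            problems.append((n, chrom, "comes before the previous contig in the header"))
        if current is not None:
            finished.add(current)
        current = chrom
    return problems
```

An open file object can be passed directly, since iterating it yields lines; an empty result means the data lines already agree with the header.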


Oh yeah! Thank you so much for your help anyway, Ashutoshmits.

8.6 years ago

Here is code that should work. You will have to change the order in the header manually, but it takes care of the rest (the data lines). Make sure your computer has enough RAM if you have a big VCF file, because all data lines are held in memory before being written out.


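import sys

# Output order for the data lines. This follows the contig order requested in
# the question (10-19, then 1-9, then X); edit it to match your reference.
CHROMOSOMES = ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
               "1", "2", "3", "4", "5", "6", "7", "8", "9", "X"]

if len(sys.argv) < 3:
    sys.exit("Usage: Input_vcf Outputfile")

records = {}  # chromosome -> list of (position, line)
with open(sys.argv[1]) as vcf_in, open(sys.argv[2], "w") as vcf_out:
    for line in vcf_in:
        if line.startswith("#"):
            # Header lines are copied as-is; the ##contig order is not changed.
            vcf_out.write(line)
            continue
        chrom, pos = line.split("\t", 2)[:2]
        # Keep every line, so several records at the same position are not lost.
        records.setdefault(chrom, []).append((int(pos), line))

    for chrom in CHROMOSOMES:
        # Numeric sort on POS; sorted() is stable, so ties keep their input order.
        for _, line in sorted(records.get(chrom, []), key=lambda rec: rec[0]):
            vcf_out.write(line)

    # Records on contigs that are not listed in CHROMOSOMES are not written.
    skipped = sorted(set(records) - set(CHROMOSOMES))
    if skipped:
        sys.stderr.write("Skipped contigs: %s\n" % ", ".join(skipped))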
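The script above still leaves the ##contig lines in their original order. As a sketch that is not part of the original answer, a small generator can reorder just those header lines and pass the data lines through unchanged, so only the header is held in memory. It assumes an uncompressed VCF, that `ID` is the first key inside each ##contig line, and the target order from the question; the names `reorder_header_stream`, `contig_key`, `DESIRED_ORDER` and `RANK` are hypothetical.

```python
# Target contig order, taken from the order requested in the question.
DESIRED_ORDER = ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
                 "1", "2", "3", "4", "5", "6", "7", "8", "9", "X"]
RANK = {cid: n for n, cid in enumerate(DESIRED_ORDER)}
PREFIX = "##contig=<ID="  # assumes ID is the first key in every ##contig line


def contig_key(line):
    """Sort key for a ##contig line; contigs not in DESIRED_ORDER sort last."""
    cid = line.rstrip("\r\n>")[len(PREFIX):].split(",")[0]
    return RANK.get(cid, len(RANK))


def reorder_header_stream(lines):
    """Yield the VCF lines with only the ##contig header lines reordered.

    The ##contig lines are put back into the slots they occupied, so all other
    header lines keep their positions. Data lines are passed through one at a time.
    """
    lines = iter(lines)
    header = []
    for line in lines:
        header.append(line)
        if line.startswith("#CHROM"):
            break  # the column header line ends the header
    slots = [i for i, l in enumerate(header) if l.startswith("##contig=<")]
    ordered = sorted((header[i] for i in slots), key=contig_key)
    for i, line in zip(slots, ordered):
        header[i] = line
    yield from header
    yield from lines  # remaining data lines, unchanged
```

For example, with `src` opened on the input VCF and `dst` opened for writing, `dst.writelines(reorder_header_stream(src))` writes the reordered file. Because `sorted()` is stable, contigs missing from `DESIRED_ORDER` end up after the listed ones in their original relative order.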