Edit a VCF file
7.9 years ago
natasha ▴ 110

Hi

I have a VCF file that doesn't distinguish between chromosomes: the #CHROM column just holds the name of my reference strain. The genomes in my MSA have only 2 chromosomes, so is there a way to set #CHROM to chr1 for records before position 3,000,000 and to chr2 for records after it?

Thanks!!

vcf chromosomes • 2.1k views

Thank you, this works great!

However, because my VCF wasn't created per chromosome, all the positions for chr2 are 3,000,000 too high. Is there a way to subtract 3,000,000 from every chr2 position?


See my update.

7.9 years ago

It's not pretty, but something along these lines should work:

awk 'BEGIN{FS="\t"; OFS="\t"}{if(substr($0,1,1) == "#") {print $0} else {if($2<3000000){$1="chr1"}else{$1="chr2"}print $0}}' foo.vcf > foo.modified.vcf

Update: To chop 3 million off of the chr2 positions:

awk 'BEGIN{FS="\t"; OFS="\t"}{if(substr($0,1,1) == "#") {print $0} else {if($2<3000000){$1="chr1"}else{$1="chr2"; $2 -= 3000000}print $0}}' foo.vcf > foo.modified.vcf

shorter:

awk 'BEGIN{FS="\t"; OFS="\t"} /^#/ {print; next} {$1=($2<3000000?"chr1":"chr2"); print}' foo.vcf > foo.modified.vcf
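
For reference, here is a rough sketch that folds the position shift from the update into the shorter pattern/action style. The function name `split_chroms` and the awk variable `boundary` are made up for illustration and are not part of any tool. It keeps the same comparison as the commands above, so a record at exactly 3,000,000 lands on chr2 with position 0; if chr1 is really 3,000,000 bp long you would probably want `$2 <= boundary` instead. Header lines (including any `##contig` entries) are passed through unchanged.

```shell
#!/bin/sh
# split_chroms FILE [BOUNDARY]
# Hypothetical helper: relabel records as chr1/chr2 and shift chr2 positions.
# BOUNDARY defaults to 3000000, the split point mentioned in the question.
split_chroms() {
  awk -v boundary="${2:-3000000}" '
    BEGIN { FS = OFS = "\t" }
    /^#/  { print; next }            # header lines pass through untouched
    {
      if ($2 < boundary) {
        $1 = "chr1"                  # below the boundary: first chromosome
      } else {
        $1 = "chr2"                  # at or above: second chromosome
        $2 -= boundary               # re-base the position onto chr2
      }
      print
    }' "$1"
}
```

Usage would look like `split_chroms foo.vcf > foo.modified.vcf`, or `split_chroms foo.vcf 2500000` for a different split point.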

I always forget that awk has a ternary operator. Too much coding in Python, I guess.
