Replace VCF file ID column
5.2 years ago
tanya.copley ▴ 10

Hi, I am working with two very large VCF files (over 10 Gb each, so copying and pasting by hand is not feasible, and I want to include this step in a script for future studies). I need to replace the "ID" column values in one of them so that the IDs match for merging. First, I removed all rows containing ## to get a simple matrix with no information lines. When I try replacing the column using awk (after first converting the VCF to a .txt file):

awk 'FNR==NR{a[NR]=$3;next}{$3=a[FNR]}1' file2.txt file1.txt > output.txt


and then converting back to a VCF, it does not work. When I remove the first 3 columns of the VCF, convert it to a .txt file, and try using a simple

paste file2.txt file1.txt > output.txt


(where file2.txt holds the CHROM, POS and new ID columns) and convert back to a VCF, the contents are not put on the same row, but rather on one row after the other. So I then tried the following command to merge every other row together, but it is not working either:

awk '{getline b;printf("%s %s\n",$0,b)}' output.txt > final.txt

Any help would be appreciated.

5.2 years ago

save the header

grep "^#" input.vcf > output.vcf

create a CHROM_POS/ID table from file2.txt (the CHROM, POS and new ID columns) and sort

awk '{printf("%s_%s\t%s\n",$1,$2,$3);}' file2.txt | LC_ALL=C sort -k1,1 > sorted1.txt


create a pseudo key for the VCF and sort

grep -v '^#' input.vcf | awk '{printf("%s_%s\t%s\n",$1,$2,$0);}' | LC_ALL=C sort -k1,1 > sorted2.txt

join and concatenate (I'm lazy: here you have to play with the join parameters/output to select/remove some columns and keep the orphans; check the 'join' manual)

join -t$'\t' -1 1 -2 1 sorted1.txt sorted2.txt | awk something >> output.vcf
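The steps in this answer can be assembled into one runnable sketch. This is only an illustration under stated assumptions: the `toy_*` file names and their contents are made up, all files are assumed to be tab-separated, and the final awk step is one possible way to fill in the "awk something" placeholder, not necessarily what the answer's author had in mind.

```shell
TAB=$(printf '\t')

# Toy inputs (hypothetical data, only here to make the sketch runnable).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\told1\tA\tG\n1\t200\told2\tC\tT\n2\t50\told3\tG\tA\n' > toy_input.vcf
printf '1\t100\tnew1\n1\t200\tnew2\n2\t50\tnew3\n' > toy_ids.txt

# 1. Save the header lines.
grep '^#' toy_input.vcf > toy_output.vcf

# 2. Build "CHROM_POS <tab> new ID" and sort on the key.
awk -F '\t' '{printf("%s_%s\t%s\n",$1,$2,$3);}' toy_ids.txt |
  LC_ALL=C sort -t "$TAB" -k1,1 > toy_sorted1.txt

# 3. Build "CHROM_POS <tab> full VCF record" and sort on the key.
grep -v '^#' toy_input.vcf |
  awk -F '\t' '{printf("%s_%s\t%s\n",$1,$2,$0);}' |
  LC_ALL=C sort -t "$TAB" -k1,1 > toy_sorted2.txt

# 4. Join on the key. Joined fields: $1=key, $2=new ID, $3=CHROM, $4=POS,
#    $5=old ID, $6..=rest of the record. Overwrite the old ID with the new
#    one, then print fields 3..NF so the key and spare ID column are dropped.
LC_ALL=C join -t "$TAB" -1 1 -2 1 toy_sorted1.txt toy_sorted2.txt |
  awk 'BEGIN{FS=OFS="\t"}
       {$5=$2; out=$3; for(i=4;i<=NF;i++) out=out OFS $i; print out}' >> toy_output.vcf
```

Two caveats apply. The records come out in key order (C-locale order of `CHROM_POS`), not in coordinate order, so the result may need re-sorting by chromosome and position before merging. Also, a plain `join` drops VCF records that have no matching key; keeping them (for example with `join -a 2`) shifts the field positions for unmatched lines, so the awk step would need to handle that case separately.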

Unfortunately, this still gives me the same problem: the contents of the two files end up on different lines rather than together on the same line. Thanks, though.

uhh ??? .....

Ya, I can't figure out why it's doing that. I ended up doing it in R; it took forever, but it worked.
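The thread never establishes why the columns ended up on separate lines. One common cause of this kind of symptom is Windows-style line endings (a trailing carriage return) in one of the files, though nothing in the posts confirms that this was the case here. A minimal sketch, assuming that cause and using made-up `toy2_*` files, checks for carriage returns, strips them, and then replaces the ID column in a single tab-aware awk pass keyed on CHROM and POS:

```shell
CR=$(printf '\r')

# Toy inputs (hypothetical); toy2_ids.txt deliberately has CRLF line endings.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\told1\tA\tG\n2\t50\told2\tG\tA\n' > toy2_input.vcf
printf '1\t100\tnew1\r\n2\t50\tnew2\r\n' > toy2_ids.txt

# Count lines that contain a carriage return; a non-zero count suggests CRLF.
grep -c "$CR" toy2_ids.txt

# Strip the carriage returns.
tr -d '\r' < toy2_ids.txt > toy2_ids.clean.txt

# First file: remember the new ID for each CHROM/POS pair.
# Second file: pass header lines through untouched; on data lines, replace
# column 3 when the CHROM/POS pair is known. FS=OFS="\t" keeps tabs intact
# when awk rebuilds a modified line.
awk 'BEGIN { FS = OFS = "\t" }
     FNR == NR { id[$1 FS $2] = $3; next }
     /^#/ { print; next }
     { k = $1 FS $2; if (k in id) $3 = id[k]; print }' \
    toy2_ids.clean.txt toy2_input.vcf > toy2_output.vcf
```

This variant keeps the header and the original record order, so no header stripping or re-sorting is needed. The trade-off is that the whole CHROM/POS-to-ID table is held in memory, which may or may not be acceptable for an ID table matching a 10 Gb VCF.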