replace vcf file ID column
7.1 years ago
tanya.copley ▴ 10

Hi,

I am working with two very large VCF files (over 10 GB each, so copying and pasting is not practical, and I want this step to be part of a script for future studies). I need to replace the values in the "ID" column of one of them so that the two files have matching IDs for merging. First, I removed all rows starting with ## to get a simple matrix with no information lines. I then tried replacing the column with awk, after first converting the VCF to a .txt file:

awk 'FNR==NR{a[NR]=$3;next}{$3=a[FNR]}1' file2.txt file1.txt > output.txt

After converting the result back to a VCF, it does not work. When I instead remove the first 3 columns of the VCF, convert the rest to a .txt file, and try using a simple

paste file2.txt file1.txt > output.txt

(where file2.txt holds the CHROM, POS and new ID columns) and then convert back to a VCF, the contents are not put on the same row but end up one row after the other. So I then tried the following command to merge every pair of consecutive rows, but it does not work either:

awk '{getline b;printf("%s %s\n",$0,b)}' output.txt > final.txt

Any help would be appreciated.

shell VCF python linux • 3.0k views
7.1 years ago

save the header

grep "^#"  input.vcf > output.vcf

Quoting the question: "(where file2.txt is the CHROM, POS and new ID columns)"

create a CHROM_POS/ID table and sort

awk '{printf("%s_%s\t%s\n",$1,$2,$3);}'  file2.txt | LC_ALL=C sort -k1,1 > sorted1.txt

create a pseudo key for the VCF and sort

grep -v '^#' input.vcf | awk '{printf("%s_%s\t%s\n",$1,$2,$0);}' | LC_ALL=C sort -k1,1 > sorted2.txt

join and concatenate (I'm lazy: here you have to play with the join parameters/output to select or remove some columns and to keep the orphans; check the 'join' manual)

join -t $'\t' -1 1 -2 1 sorted1.txt sorted2.txt | awk something >> output.vcf
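
For reference, the steps above could be wired together roughly as follows. This is only a sketch under some assumptions: the function name replace_vcf_ids and the temporary file names are made up for illustration, file2.txt is assumed to have exactly three whitespace-separated columns (CHROM, POS, new ID) with no header line, and each CHROM/POS pair is assumed to occur at most once in each file. The final awk is one possible way to fill in the "awk something" placeholder: it drops the pseudo key and the old ID and puts the new ID back in column 3.

```shell
# Hypothetical helper: replace_vcf_ids INPUT_VCF ID_TABLE OUTPUT_VCF
# ID_TABLE is assumed to hold CHROM, POS and the new ID (no header line).
replace_vcf_ids() {
    in_vcf=$1; id_table=$2; out_vcf=$3
    tab=$(printf '\t')              # literal tab, portable to plain sh
    tmpdir=$(mktemp -d) || return 1

    # 1. Save the header lines unchanged.
    grep '^#' "$in_vcf" > "$out_vcf"

    # 2. CHROM_POS <tab> new ID, sorted bytewise on the key.
    awk '{printf("%s_%s\t%s\n", $1, $2, $3)}' "$id_table" |
        LC_ALL=C sort -t "$tab" -k1,1 > "$tmpdir/sorted1.txt"

    # 3. Same pseudo key in front of every VCF record.
    grep -v '^#' "$in_vcf" |
        awk '{printf("%s_%s\t%s\n", $1, $2, $0)}' |
        LC_ALL=C sort -t "$tab" -k1,1 > "$tmpdir/sorted2.txt"

    # 4. Join on the key. Joined fields are:
    #    key, new ID, CHROM, POS, old ID, REF, ALT, ...
    #    Print CHROM, POS, new ID, then everything from REF onwards.
    LC_ALL=C join -t "$tab" -1 1 -2 1 "$tmpdir/sorted1.txt" "$tmpdir/sorted2.txt" |
        awk -F '\t' '{
            printf("%s\t%s\t%s", $3, $4, $2)
            for (i = 6; i <= NF; i++) printf("\t%s", $i)
            printf("\n")
        }' >> "$out_vcf"

    rm -rf "$tmpdir"
}
```

It would be called as replace_vcf_ids input.vcf file2.txt output.vcf. Two things to keep in mind: records without a match in file2.txt are dropped by a plain join (see the -a option in the 'join' manual to keep them), and the records come out in bytewise key order rather than genomic order, so the result probably needs to be re-sorted before merging.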

Unfortunately, this still gives me the same problem: the contents of the two files end up on different lines rather than together on the same line. Thanks, though.


uhh ??? .....


Yeah, I can't figure out why it's doing that. I ended up doing it in R; it took forever, but it worked.
