vcf file column name error
3.2 years ago
evafinegan • 0

Hello,

I have a VCF file in which the ID column was empty for every SNP, so I manually added unique IDs to the SNPs using:

awk '{OFS="\t"} NR<67 {print $0;next} {{$3=$1"_"$2} print}' sample.vcf > out.vcf

but it also changed the column name from ID to #CHROM_POS. Now I am getting this error in the downstream analysis:

Error in x@fix[, "ID"] : subscript out of bounds

I think it's the replaced column name that's causing the error. Is there a way to keep the column name as ID in the awk command? Thank you!

sequencing • 771 views
3.2 years ago

I have a vcf file and I did not have any ID for each of the SNP in that column.

bcftools annotate

Usage:   bcftools annotate [options] <in.vcf.gz>
(...)
   -I, --set-id [+]<format>       set ID column, see man page for details
(...)
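
For reference, the bcftools man page gives `--set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT'` as an example of the format string; the leading `+` means only missing IDs are filled in, and bcftools leaves the header alone. A minimal sketch, assuming bcftools is installed; the file names `bcf_demo.vcf` and `bcf_out.vcf` and the single record are made up for illustration:

```shell
# Made-up VCF; the contig line defines chr1 in the header.
printf '##fileformat=VCFv4.2\n##contig=<ID=chr1>\n' > bcf_demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> bcf_demo.vcf
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\n' >> bcf_demo.vcf

# Fill missing IDs as CHROM_POS_REF_ALT (skipped if bcftools is not on PATH).
if command -v bcftools >/dev/null 2>&1; then
  bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' -o bcf_out.vcf bcf_demo.vcf
else
  echo "bcftools not found; skipping" >&2
fi
```

Including REF and ALT in the ID also makes collisions between records at the same position less likely than with CHROM and POS alone.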

If you really want awk:

awk 'BEGIN {FS=OFS="\t"} /^#/ {print; next} {$3=sprintf("%s_%s",$1,$2); print}' sample.vcf
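
The `/^#/` test is what keeps the column name intact: it matches every header line, including the `#CHROM` line, however many `##` lines come before it, whereas `NR<67` in the question assumes a fixed header length. A minimal sketch on a made-up three-line file (`demo.vcf`, `demo_out.vcf` and the record are illustrative only, and the explicit tab field separator is my assumption about the input):

```shell
# Build a tiny, made-up VCF: one meta line, the column header, one record.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\n' >> demo.vcf

# Header lines (anything starting with '#') pass through untouched;
# only data lines get column 3 rewritten as CHROM_POS.
awk 'BEGIN {FS = OFS = "\t"} /^#/ {print; next} {$3 = $1 "_" $2; print}' demo.vcf > demo_out.vcf

# Column 3 of the header line is still ID; column 3 of the record is chr1_100.
grep '^#CHROM' demo_out.vcf | cut -f3
grep -v '^#' demo_out.vcf | cut -f3
```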

Thank you! I used awk and now it gives this error: ID column contains non-unique names


Because the CHROM and POS columns alone are not enough: several records can share the same position, which gives duplicate IDs. Try $3=sprintf("%s_%s_%d",$1,$2,NR)
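
To see the difference, here is a small sketch on a made-up file (`dups.vcf` and its two records at the same position are invented for illustration). `sort | uniq -d` lists any ID that occurs more than once, so empty output means all IDs are unique. Note that NR is the input line number, header lines included, so the suffix is not a record count.

```shell
# Made-up VCF with two records at the same CHROM and POS.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > dups.vcf
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\n' >> dups.vcf
printf 'chr1\t100\t.\tA\tT\t.\t.\t.\n' >> dups.vcf

# CHROM_POS alone: both records get the same ID, so uniq -d reports chr1_100.
awk 'BEGIN {FS = OFS = "\t"} /^#/ {print; next} {$3 = $1 "_" $2; print}' dups.vcf |
  grep -v '^#' | cut -f3 | sort | uniq -d

# CHROM_POS_NR: the line number makes every ID distinct, so uniq -d prints nothing.
awk 'BEGIN {FS = OFS = "\t"} /^#/ {print; next} {$3 = sprintf("%s_%s_%d", $1, $2, NR); print}' dups.vcf > dups_out.vcf
grep -v '^#' dups_out.vcf | cut -f3 | sort | uniq -d
```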
