4.7 years ago
evafinegan
Hello,
I have a VCF file with no IDs in the ID column for any of the SNPs, so I added unique IDs manually using:
awk '{OFS="\t"} NR<67 {print $0;next} {{$3=$1"_"$2} print}' sample.vcf > out.vcf
but it also changed the column name from ID to #CHROM_POS. Now I am getting an error
Error in x@fix[, "ID"] : subscript out of bounds
in the downstream analysis. I think it's the replaced column name that's causing the error. Is there a way to keep the column name as ID in the awk command? Thank you!
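The column name changes because `NR<67` only protects the first 66 lines, so if the `#CHROM` header is line 67 it gets rewritten like a data line. A minimal sketch of one way around this, assuming every header line starts with `#` (as in standard VCF): test for the `#` prefix instead of a fixed line count. The tiny `sample.vcf` built here is hypothetical and only there to make the example runnable.

```shell
# Build a tiny hypothetical VCF for demonstration; real files have many more header lines.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tC\tT\t.\t.\t.\n' > sample.vcf

# Header lines (starting with "#") are printed untouched, so the ID column name survives.
# Data lines get ID = CHROM_POS. FS and OFS are both tab so fields containing spaces stay intact.
awk 'BEGIN {FS = OFS = "\t"} /^#/ {print; next} {$3 = $1 "_" $2; print}' sample.vcf > out.vcf
```

This avoids having to count header lines per file, which differs from one VCF to the next.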
Thank you! I used awk and now it gives this error: ID column contains non-unique names
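because CHROM and POS alone are not enough: some records share the same chromosome and position, so their IDs come out as duplicates. Try appending the line number (NR) as well: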
$3=sprintf("%s_%s_%d",$1,$2,NR)
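For reference, here is a sketch of that assignment inside a full command, again assuming header lines start with `#` and should pass through unchanged. The file names `dup.vcf` and `dup_out.vcf` are made up for the example; the input deliberately contains two records at the same position.

```shell
# Hypothetical input with two records at the same CHROM and POS.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t100\t.\tA\tT\t.\t.\t.\n' > dup.vcf

# Append the input line number (NR) so records sharing CHROM and POS still get distinct IDs.
awk 'BEGIN {FS = OFS = "\t"} /^#/ {print; next} {$3 = sprintf("%s_%s_%d", $1, $2, NR); print}' dup.vcf > dup_out.vcf

# List any ID that still occurs more than once; no output means all IDs are unique.
grep -v '^#' dup_out.vcf | cut -f3 | sort | uniq -d
```

Note that NR counts header lines too, so the numeric suffix is the line number in the file, not the index of the variant.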