How to add a VCF file's sample information to the INFO column?
22 months ago

How do I add the sample.id values to the INFO column, as in the example below?

Data to reproduce the input and the desired output:

input:

  #CHROM     POS          ID REF ALT               INFO sample.id
1      1 1724503  rs33952351   C  CA  AC=2;AF=1.00;AN=2       in1
2      1 1749412 rs115507312   T   G AC=1;AF=0.500;AN=2       in1
3      1 2488153      rs4870   A   G AC=1;AF=0.500;AN=2       in1
4      1   24503           .   C  CA  AC=2;AF=1.00;AN=2       in2
5      1 1749412 rs115507312   T   G AC=2;AF=0.500;AN=2       in2
6     22 2488153       rsid2   A   G AC=1;AF=0.500;AN=2       in2
7      1 1724503           .   C   T  AC=2;AF=1.00;ZZ=T       in3
8      1 1749412 rs115507312   T   G AC=1;AF=0.500;ZZ=F       in3
9      1 2488153      rs4870   A   G AC=1;AF=0.500;ZZ=T       in3

desired output:

  #CHROM     POS          ID REF ALT                                           INFO
1      1   24503           .   C  CA               AC=2;AF=1.00;AN=2;Identified=in2
2      1 1724503  rs33952351   C  CA               AC=2;AF=1.00;AN=2;Identified=in1
3      1 1749412 rs115507312   T   G AC=1;AF=0.500;AN=2;Identified=in1,in2,in3;ZZ=F
4      1 1724503           .   C   T               AC=2;AF=1.00;ZZ=T;Identified=in3
5      1 2488153      rs4870   A   G     AC=1;AF=0.500;AN=2;Identified=in1,in3;ZZ=T
6     22 2488153       rsid2   A   G              AC=1;AF=0.500;AN=2;Identified=in2

my code:

uid<-paste(file$`#CHROM`,file$POS,file$REF,file$ALT,sep="_")
INFO<-strsplit(file$INFO,split = ";")
for (i in 1:6){
  for (j in 1:9){
    samples<-unique(file$sample.id[which(file$uid == uid[i])])
    if (length(samples) > 1) {
      INFO[[j]]<-c(INFO[[j]],paste0('Identified=',paste(samples,collapse = ",")))
    } else if (length(samples) == 1) {
      INFO[[j]]<-c(INFO[[j]],paste0('Identified=',samples))
    } else {
      break }
    INFO[[j]]<-sort(INFO[[j]])
    INFO[[j]]<-paste(INFO[[j]],collapse=";")
    file$INFO[j]<-INFO[[j]]
  }
}

I can't work out the logic, and everything I've tried produces output like this (rows 7 to 9 shown):

[7] "AC=2;AF=1.00;Identified=in1;ZZ=T;Identified=in1,in2,in3;Identified=in1,in3;Identified=in2;Identified=in2;Identified=in3" 
[8] "AC=1;AF=0.500;Identified=in1;ZZ=F;Identified=in1,in2,in3;Identified=in1,in3;Identified=in2;Identified=in2;Identified=in3"
[9] "AC=1;AF=0.500;Identified=in1;ZZ=T;Identified=in1,in2,in3;Identified=in1,in3;Identified=in2;Identified=in2;Identified=in3"
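
A likely reason for the repeated entries is that the inner loop over j is not tied to the variant i, so every row receives an Identified tag for each of the first six uid values. Note also that uid is assigned as a standalone vector while the loop reads file$uid. One alternative, sketched below in base R, is to group the rows by variant and build a single INFO string per group. The names vcf and merge_group are hypothetical. The sketch assumes that when samples disagree on a tag (such as AC) the first value seen is kept, and that Identified is appended last, so the tag order and row order may differ from the desired output above.

```r
# Recreate the input table from the question (called `file` there).
vcf <- data.frame(
  `#CHROM` = c("1", "1", "1", "1", "1", "22", "1", "1", "1"),
  POS = c(1724503L, 1749412L, 2488153L, 24503L, 1749412L, 2488153L,
          1724503L, 1749412L, 2488153L),
  ID = c("rs33952351", "rs115507312", "rs4870", ".", "rs115507312",
         "rsid2", ".", "rs115507312", "rs4870"),
  REF = c("C", "T", "A", "C", "T", "A", "C", "T", "A"),
  ALT = c("CA", "G", "G", "CA", "G", "G", "T", "G", "G"),
  INFO = c("AC=2;AF=1.00;AN=2", "AC=1;AF=0.500;AN=2", "AC=1;AF=0.500;AN=2",
           "AC=2;AF=1.00;AN=2", "AC=2;AF=0.500;AN=2", "AC=1;AF=0.500;AN=2",
           "AC=2;AF=1.00;ZZ=T", "AC=1;AF=0.500;ZZ=F", "AC=1;AF=0.500;ZZ=T"),
  sample.id = c("in1", "in1", "in1", "in2", "in2", "in2", "in3", "in3", "in3"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Collapse all rows of one variant into a single row (hypothetical helper).
merge_group <- function(rows) {
  # All INFO fields of the group, in order of appearance.
  fields <- unlist(strsplit(rows$INFO, ";", fixed = TRUE))
  # Tag name is the part before "="; keep the first value seen per tag (assumption).
  tags <- sub("=.*$", "", fields)
  fields <- fields[!duplicated(tags)]
  samples <- paste(unique(rows$sample.id), collapse = ",")
  data.frame(
    `#CHROM` = rows$`#CHROM`[1], POS = rows$POS[1], ID = rows$ID[1],
    REF = rows$REF[1], ALT = rows$ALT[1],
    # Identified is appended after the existing tags (assumption).
    INFO = paste(c(fields, paste0("Identified=", samples)), collapse = ";"),
    check.names = FALSE, stringsAsFactors = FALSE
  )
}

# Same variant key as in the question: chromosome, position, REF and ALT.
uid <- paste(vcf$`#CHROM`, vcf$POS, vcf$REF, vcf$ALT, sep = "_")

# Split by variant, keeping groups in order of first appearance.
groups <- split(vcf, factor(uid, levels = unique(uid)))
result <- do.call(rbind, lapply(groups, merge_group))
rownames(result) <- NULL
print(result)
```

Because each variant is handled exactly once, no row can collect more than one Identified tag. If a sorted tag order is preferred, the fields could be passed through sort() before pasting.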
VCF R