Question: Numeric genotype to letters
19 months ago by
waqaskhokhar999100 wrote:

I have an input file like this:

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   0   0   0   1   0
3   11641   snp_3_11641 T   G   0   1   1   2   2
3   14240   snp_3_14240 G   A,T 0   0   0   0   1

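From the sixth column onwards (the sample columns after Alt), I want to replace each value: 0 becomes ref+ref, 1 becomes ref+alt, and 2 becomes alt+alt. When Alt lists several alleles, as in the last row, only the first one is used. The table above should then look like this: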

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   CC  CC  CC  CT  CC
3   11641   snp_3_11641 T   G   TT  TG  TG  GG  GG
3   14240   snp_3_14240 G   A   GG  GG  GG  GG  GA
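
The recoding rule described above can be sketched as a small per-line function. This is a minimal Python sketch, not code from the thread: the function name `recode_line` is hypothetical, it assumes whitespace-separated columns in the order shown (Ref in column 4, Alt in column 5, samples from column 6), it assumes only the first Alt allele should be used (as in the expected table), and it writes "??" for any code other than 0, 1 or 2.

```python
def recode_line(line):
    """Recode the numeric genotypes of one table line into allele letters."""
    # Header lines (starting with '#') pass through unchanged.
    if line.startswith("#"):
        return line
    fields = line.split()
    ref = fields[3]
    # Assumption: for multi-allelic sites such as "A,T", keep only the first allele.
    alt = fields[4].split(",")[0]
    codes = {"0": ref + ref, "1": ref + alt, "2": alt + alt}
    # Assumption: any other code is unexpected and is flagged as "??".
    genotypes = [codes.get(code, "??") for code in fields[5:]]
    # The Alt column is written as the single allele that was actually used.
    return "\t".join(fields[:4] + [alt] + genotypes)


rows = [
    "#CHROM\tSTART\tID\tRef\tAlt\t108\t139\t159\t265\t350",
    "5\t5571\tsnp_5_5571\tC\tT\t0\t0\t0\t1\t0",
    "3\t11641\tsnp_3_11641\tT\tG\t0\t1\t1\t2\t2",
    "3\t14240\tsnp_3_14240\tG\tA,T\t0\t0\t0\t0\t1",
]
for row in rows:
    print(recode_line(row))
```

Comparing codes as strings rather than numbers means a non-numeric entry can never be mistaken for 0.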
19 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:
 awk '/^#/{print;next;} {split($5,a,/[,]/);for(i=1;i<6;i++) printf("%s%s",i==1?"":"\t",$i);for(i=6;i<=NF;i++) { printf("\t"); if($i==0) printf("%s%s",$4,$4); else if($i==1) printf("%s%s",$4,a[1]); else if($i==2) printf("%s%s",a[1],a[1]); else printf("??");} printf("\n");} ' input.tsv

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   CC  CC  CC  CT  CC
3   11641   snp_3_11641 T   G   TT  TG  TG  GG  GG
3   14240   snp_3_14240 G   A,T GG  GG  GG  GG  GA
19 months ago by
ETH Zurich
benformatics2.0k wrote:

I have no idea what the format of your object is but here is a potential solution in R.

Assuming you have your input file in R and it looks as follows (you can use read.delim()):

  X.CHROM START          ID Ref Alt X108 X139 X159 X265 X350
1       5  5571  snp_5_5571   C   T    0    0    0    1    0
2       3 11641 snp_3_11641   T   G    0    1    1    2    2
3       3 14240 snp_3_14240   G A,T    0    0    0    0    1

Then you can use the following function on this object (class data.frame) in the following way:

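convertSNP <- function(var.row){
  ## get rid of whitespace (apply() pads values when it converts the data.frame to a character matrix)
  var.row <- gsub(" ","",var.row)
  ref <- var.row['Ref']
  ## keep only the first alternate allele, e.g. "A,T" becomes "A"
  alt <- sub(",.*","",var.row['Alt'])
  ## recode only the genotype columns (6 onwards), so a CHROM or START value of 0, 1 or 2 is left untouched
  gt <- seq_along(var.row) > 5
  var.row[gt & var.row == "0"] <- paste0(ref,ref)
  var.row[gt & var.row == "1"] <- paste0(ref,alt)
  var.row[gt & var.row == "2"] <- paste0(alt,alt)
  return(var.row)
}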

## run function on data.frame by row and transpose the result
t(apply(df,1,convertSNP))

Giving the result:

     X.CHROM START   ID            Ref Alt   X108 X139 X159 X265 X350
[1,] "5"     "5571"  "snp_5_5571"  "C" "T"   "CC" "CC" "CC" "CT" "CC"
[2,] "3"     "11641" "snp_3_11641" "T" "G"   "TT" "TG" "TG" "GG" "GG"
[3,] "3"     "14240" "snp_3_14240" "G" "A,T" "GG" "GG" "GG" "GG" "GA"
19 months ago by
Mexico
JC12k wrote:
$ perl -lane 'if (/^#/) { print; next } ($alt = $F[4]) =~ s/,.*//; for $i (5..$#F) { if ($F[$i] eq "0") { $F[$i] = $F[3] x 2 } elsif ($F[$i] eq "1") { $F[$i] = $F[3] . $alt } elsif ($F[$i] eq "2") { $F[$i] = $alt x 2 } } print join "\t", @F' < in
#CHROM  START   ID      Ref     Alt     108     139     159     265     350
5       5571    snp_5_5571      C       T       CC      CC      CC      CT      CC
3       11641   snp_3_11641     T       G       TT      TG      TG      GG      GG
3       14240   snp_3_14240     G       A,T     GG      GG      GG      GG      GA