Question: Numeric genotype to letters
8 months ago, waqaskhokhar999 wrote:

I have an input file like this:

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   0   0   0   1   0
3   11641   snp_3_11641 T   G   0   1   1   2   2
3   14240   snp_3_14240 G   A,T 0   0   0   0   1

From the sixth column onwards (the sample columns after Alt), I want to replace each value: 0 becomes ref+ref, 1 becomes ref+alt, and 2 becomes alt+alt (using only the first Alt allele when there are several), so that the table above becomes:

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   CC  CC  CC  CT  CC
3   11641   snp_3_11641 T   G   TT  TG  TG  GG  GG
3   14240   snp_3_14240 G   A   GG  GG  GG  GG  GA
8 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
 awk '/^#/{print;next;} {split($5,a,/[,]/);for(i=1;i<6;i++) printf("%s%s",i==1?"":"\t",$i);for(i=6;i<=NF;i++) { printf("\t"); if($i==0) printf("%s%s",$4,$4); else if($i==1) printf("%s%s",$4,a[1]); else if($i==2) printf("%s%s",a[1],a[1]); else printf("??");} printf("\n");} ' input.tsv

#CHROM  START   ID  Ref Alt 108 139 159 265 350
5   5571    snp_5_5571  C   T   CC  CC  CC  CT  CC
3   11641   snp_3_11641 T   G   TT  TG  TG  GG  GG
3   14240   snp_3_14240 G   A,T GG  GG  GG  GG  GA
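For readers less familiar with awk, the same per-row logic can be written out as a short Python sketch. The function name `decode_row` is made up for illustration, and the sketch assumes the layout shown in the question: five annotation columns followed by sample columns coded 0, 1 or 2, with only the first Alt allele used. Unrecognised codes become `??`, as in the awk version.

```python
def decode_row(fields):
    """Convert numeric genotype codes in one data row to allele pairs.

    Assumes columns 1-5 are CHROM, START, ID, Ref, Alt and that every
    later column holds a genotype code (0, 1 or 2).
    """
    ref = fields[3]
    alt = fields[4].split(",")[0]  # keep only the first alternate allele
    calls = {"0": ref + ref, "1": ref + alt, "2": alt + alt}
    # Annotation columns pass through; unknown codes are flagged as "??"
    return fields[:5] + [calls.get(code, "??") for code in fields[5:]]


row = "3\t14240\tsnp_3_14240\tG\tA,T\t0\t0\t0\t0\t1".split("\t")
print("\t".join(decode_row(row)))
```

Header lines (starting with `#`) are not handled here and would need to be passed through unchanged by the caller.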
8 months ago, benformatics (ETH Zurich) wrote:

I have no idea what the format of your object is but here is a potential solution in R.

Assuming you have read your input file into R as a data.frame named df (for example with read.delim()) and it looks as follows:

  X.CHROM START          ID Ref Alt X108 X139 X159 X265 X350
1       5  5571  snp_5_5571   C   T    0    0    0    1    0
2       3 11641 snp_3_11641   T   G    0    1    1    2    2
3       3 14240 snp_3_14240   G A,T    0    0    0    0    1

Then you can apply the following function to each row of this data.frame:

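convertSNP <- function(var.row){
  ## apply() passes each row as a character vector; strip padding whitespace
  var.row <- gsub(" ","",var.row)
  ref <- var.row['Ref']
  ## keep only the first alternate allele
  alt <- sub(",.*","",var.row['Alt'])
  ## genotype columns are everything after the five annotation columns,
  ## so a CHROM or START value of 0, 1 or 2 is left untouched
  gt <- seq_along(var.row) > 5
  var.row[gt & var.row == "0"] <- paste0(ref,ref)
  var.row[gt & var.row == "1"] <- paste0(ref,alt)
  var.row[gt & var.row == "2"] <- paste0(alt,alt)
  return(var.row)
}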

## run function on data.frame by row and transpose the result
t(apply(df,1,convertSNP))

Giving the result:

     X.CHROM START   ID            Ref Alt   X108 X139 X159 X265 X350
[1,] "5"     "5571"  "snp_5_5571"  "C" "T"   "CC" "CC" "CC" "CT" "CC"
[2,] "3"     "11641" "snp_3_11641" "T" "G"   "TT" "TG" "TG" "GG" "GG"
[3,] "3"     "14240" "snp_3_14240" "G" "A,T" "GG" "GG" "GG" "GG" "GA"
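The same row-wise idea can be sketched outside R as well. The Python snippet below is only an illustration using the standard `csv` module. The function name `convert_table` is hypothetical, and it assumes a tab-delimited file with a `#`-prefixed header line and genotype codes starting in the sixth column. It indexes the sample columns by position, so a CHROM value of 1 or 2 is never mistaken for a genotype code.

```python
import csv
import io


def convert_table(src, dst):
    """Copy a tab-delimited genotype table from src to dst, decoding codes."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row or row[0].startswith("#"):
            writer.writerow(row)  # header or blank line: copy unchanged
            continue
        ref = row[3]
        alt = row[4].split(",")[0]  # first alternate allele only
        calls = {"0": ref + ref, "1": ref + alt, "2": alt + alt}
        # Only columns 6 onwards are genotype codes; unknown codes stay as-is
        row[5:] = [calls.get(code, code) for code in row[5:]]
        writer.writerow(row)


# Demo on the table from the question, held in memory instead of a file
text = (
    "#CHROM\tSTART\tID\tRef\tAlt\t108\t139\t159\t265\t350\n"
    "5\t5571\tsnp_5_5571\tC\tT\t0\t0\t0\t1\t0\n"
    "3\t11641\tsnp_3_11641\tT\tG\t0\t1\t1\t2\t2\n"
    "3\t14240\tsnp_3_14240\tG\tA,T\t0\t0\t0\t0\t1\n"
)
out = io.StringIO()
convert_table(io.StringIO(text), out)
print(out.getvalue(), end="")
```

To run it on real files, open file handles could be passed in place of the `io.StringIO` objects.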
8 months ago, JC (Mexico) wrote:
$ perl -lane 'unless (/^#/) { ($alt) = split /,/, $F[4]; for $i (5..$#F) { if ($F[$i] eq "0") { $F[$i] = $F[3] x 2 } elsif ($F[$i] eq "1") { $F[$i] = $F[3].$alt } elsif ($F[$i] eq "2") { $F[$i] = $alt x 2 } } } print join "\t", @F' < in
#CHROM  START   ID      Ref     Alt     108     139     159     265     350
5       5571    snp_5_5571      C       T       CC      CC      CC      CT      CC
3       11641   snp_3_11641     T       G       TT      TG      TG      GG      GG
3       14240   snp_3_14240     G       A,T     GG      GG      GG      GG      GA
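One detail worth checking with multi-allelic sites such as `A,T` is the order of operations: the Alt field should be cut down to its first allele before the genotype string is built, not afterwards. The sample data has no 2 on the multi-allelic row, so the difference would not show up there. A small Python illustration (the helper names are made up for this example):

```python
import re


def strip_after(ref, alt, code):
    """Build the genotype from the raw Alt field, then cut at the first comma."""
    raw = {"0": ref + ref, "1": ref + alt, "2": alt + alt}[code]
    return re.sub(r",.+", "", raw)


def strip_before(ref, alt, code):
    """Reduce Alt to its first allele, then build the genotype."""
    first = alt.split(",")[0]
    return {"0": ref + ref, "1": ref + first, "2": first + first}[code]


for code in "012":
    print(code, strip_after("G", "A,T", code), strip_before("G", "A,T", code))
```

For code 2 the first approach yields a single `A`, because everything after the first comma in `A,TA,T` is removed, while the second yields `AA`.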