Question: Script For R or perl
0
gravatar for alim.hcu
2.2 years ago by
alim.hcu0
alim.hcu0 wrote:

Dear Friends,

I want to bin my txt file into 100bp size based on position(column=2). And condition for bining is , there should be at least three cytosine position (column=7) in each 100 bp bin. Thank you

CcLG01  22  +   CHH 0.200   5.00    1   5   0   0   0.036   0.624
CcLG01  23  +   CHH 1.000   1.00    1   1   0   0   0.207   1.000
CcLG01  28  +   CHH 0.200   5.00    1   5   0   0   0.036   0.624
CcLG01  29  +   CHH 1.000   1.00    1   1   0   0   0.207   1.000
CcLG01  30  +   CHH 1.000   1.00    1   1   0   0   0.207   1.000
CcLG01  33  +   CHH 0.250   4.00    1   4   0   0   0.046   0.699
CcLG01  39  -   CHH 0.500   2.00    1   2   2   2   0.095   0.905
CcLG01  42  -   CHH 0.000   2.00    0   2   1   1   0.000   0.658
CcLG01  49  +   CHH 0.000   1.00    0   1   3   3   0.000   0.793
CcLG01  60  -   CHH 0.000   3.00    0   3   1   1   0.000   0.562
CcLG01  61  +   CHH 0.000   1.00    0   1   3   3   0.000   0.793
CcLG01  62  +   CHH 0.000   1.00    0   1   3   3   0.000   0.793
CcLG01  72  -   CHH 0.000   4.00    0   4   1   1   0.000   0.490
CcLG01  75  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  80  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  94  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  97  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  99  -   CHH 0.000   4.00    0   4   0   0   0.000   0.490
CcLG01  116 -   CHH 0.000   2.00    0   2   0   0   0.000   0.658
CcLG01  126 -   CHH 0.000   1.00    0   1   0   0   0.000   0.793
CcLG01  133 -   CHH 0.000   1.00    0   1   0   0   0.000   0.793
CcLG01  199 -   CHH 0.000   1.00    0   1   0   0   0.000   0.793
next-gen R • 841 views
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by alim.hcu0

Sorry,you description is faint,maybe you can give an example. I think Perl is more suitable for your tesk.

ADD REPLYlink written 2.2 years ago by zjhzwang180

Dear zjhzwang

Please find the attached example.

ADD REPLYlink written 2.2 years ago by alim.hcu0

Fine,I know that,but what output format you want?

ADD REPLYlink written 2.2 years ago by zjhzwang180

i want to add column 7 based on the column 2. But the addition should be based on a another file which have start and end position

like I have a another file2 (position information) in this format chr start end CcLG01 1 45 CcLg 58 135 So based on start and end position given in file 2. column 7 should be added in file1 (mentioned above)

ADD REPLYlink written 2.2 years ago by alim.hcu0
0
gravatar for jinych2bgi
2.2 years ago by
jinych2bgi20
jinych2bgi20 wrote:

i think you want to find the bin,that have at least 3 methylation cytosines in each 100bp bin? so perl exmaple:

while (<IN>) {

 chomp;

  my @tmp = split;

  $flag = int($tmp[1]/100)+1;

  if ($tmp[6] == 1) {

    push @db{$flag},"$tmp[1]";

  }

}

print "#flag\tstart\tend\tnumber of methylation\tmethylation positon\n"; 

foreach my $k (sort{$a<=>$b} keys %db) {

  my $s = ($k-1)*100+1;

  my $e = $k*100;

  my $num = scalar(@{$db{$k}});

  print "$k\t$s\t$e\t$num\t",join(",",@{db{$k}}),"\n";

}
<h6>#</h6>

you can filter the result by youself!!!!

ADD COMMENTlink modified 2.2 years ago by WouterDeCoster36k • written 2.2 years ago by jinych2bgi20

you might need to format your code using 4-spaces indent. learn more about markdown format.

ADD REPLYlink written 2.2 years ago by shenwei3564.5k
0
gravatar for zjhzwang
2.2 years ago by
zjhzwang180
zjhzwang180 wrote:

You can use grep() function to find the indexes of one chr.

data<-read.table("PATH",
                 header=F,
                 sep="\t",
                 )
chr<-as.character(levels(data[,1]))
bin<-function(chr){
  index<-grep(chr,data[,1])
  start<-data[index[1],2]
  end<-data[index[length(index)],2]
  position<-start-end
  cytosine<-sum(data[index,7])
  coverage<-sum(data[index,8])
  if(position>100 & cytosine>3){
    return(c(chr,start,end,cytosine,coverage))
  }
}
result<-t(apply(matrix(chr,ncol=1),1,bin))
colnames(result)<-c("chr","shart","end","cytosine","coverage")

The result is :

> result
     chr      shart end  cytosine coverage
[1,] "CcLG01" "22" "199" "7"      "56"

And you can output:

write.table(result,"PATH",col.names = T,row.names = F,sep="\t",quote=F)
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by zjhzwang180

Dear zjhzwang,

Thanks for you prompt reply. Actually I have to add coulmn 7 based on position column(2). And addition of column 7 should be in 100 bp range. like CcLG01 114 - CG 0.000 2.00 0 2 0 0 0.000 0.658 CcLG01 136 - CG 0.000 1.00 0 1 0 0 0.000 0.793 CcLG01 243 - CG 0.000 1.00 0 1 0 0 0.000 0.793 CcLG01 1272 + CG 0.000 1.00 0 1 1 1 0.000 0.793 CcLG01 1273 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1277 + CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1278 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1281 + CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1282 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1287 + CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1288 - CG 1.000 1.00 1 1 1 1 0.207 1.000 CcLG01 1296 + CG 1.000 1.00 1 1 0 0 0.207 1.000

like postion 114 ans 136 (it will be 100-200 base pair bin). But only two CG (Col=4) is there in between these position so it should be discarded cause minimum number of CG required to be qualified is = 3. from 1200 -1300 bin (100 bp) there are 9 CG so this will qualified And my output will be like that

Chr start end Cytosine(Sum of Col 7) coverage(sum of column 8) CcLG01 1200 1300 8 9

Thank You for your kind support

ADD REPLYlink written 2.2 years ago by alim.hcu0

I have update the answer,but I'm not sure: if have [CcLG01 200 500 8 9] ,[CcLG01 200 300 3 6] and [CcLG01 400 500 5 3],should it be split?

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by zjhzwang180

Dear zjhzwang

Thank You for your time and prompt reply. It worked.

thank you again

ADD REPLYlink written 2.2 years ago by alim.hcu0

Dear zjhzwang

When i used t(apply command .. i got following error.

result <- t(apply(matrix(chr,ncol=1),1,bin))

data=read.table("Sterile_05.txt", header=F, sep="\t")
+   index&lt;-grep(chr,data[,1])
+   start&lt;-data[index[1],2]
+   end&lt;-data[index[length(index)],2]
+   position&lt;-start-end
+   cytosine&lt;-sum(data[index,7])
+   coverage&lt;-sum(data[index,8])
+   if(position&gt;100 &amp; cytosine&gt;3){
+     return(c(chr,start,end,cytosine,coverage))
+   }
+ }

Error in t.default(apply(matrix(chr, ncol = 1), 1, bin)) : argument is not a matrix Error: unexpected ')' in "result<-apply(matrix(chr,ncol=1),1,bin))"

ADD REPLYlink modified 2.2 years ago by WouterDeCoster36k • written 2.2 years ago by alim.hcu0

And when i used same data which you have used earlier . i got this error.

data=read.table("Sterile_05.txt", header=F, sep="\t")
+   index&lt;-grep(chr,data[,1])
+   start&lt;-data[index[1],2]
+   end&lt;-data[index[length(index)],2]
+   position&lt;-start-end
+   cytosine&lt;-sum(data[index,7])
+   coverage&lt;-sum(data[index,8])
+   if(position&gt;100 &amp; cytosine&gt;3){
+     return(c(chr,start,end,cytosine,coverage))
+   }
+ }
ADD REPLYlink modified 2.2 years ago by WouterDeCoster36k • written 2.2 years ago by alim.hcu0
0
gravatar for alim.hcu
2.2 years ago by
alim.hcu0
alim.hcu0 wrote:

Dear zjhzwang, when i used " result<-t(apply(matrix(chr,ncol=1),1,bin))" command..i got error.

data=read.table("Sterile_05.txt", header=F, sep="\t")
> chr<-as.character(levels(data[,1]))
> bin<-function(chr){
+   index<-grep(chr,data[,1])
+   start<-data[index[1],2]
+   end<-data[index[length(index)],2]
+   position<-start-end
+   cytosine<-sum(data[index,7])
+   coverage<-sum(data[index,8])
+   if(position>100 & cytosine>3){
+     return(c(chr,start,end,cytosine,coverage))
+   }
+ }

> result<-t(apply(matrix(chr,ncol=1),1,bin))
Error in t.default(apply(matrix(chr, ncol = 1), 1, bin)) : 
  argument is not a matrix
> result<-apply(matrix(chr,ncol=1),1,bin))
Error: unexpected ')' in "result<-apply(matrix(chr,ncol=1),1,bin))"
> result<-apply(matrix(chr,ncol=1),1,bin)
> result
NULL
ADD COMMENTlink modified 2.2 years ago by WouterDeCoster36k • written 2.2 years ago by alim.hcu0

Sorry,I just see it.
Maybe you can check what chr is,It should be a vector,if not,maybe something is wrong.

ADD REPLYlink written 2.2 years ago by zjhzwang180
0
gravatar for alim.hcu
2.2 years ago by
alim.hcu0
alim.hcu0 wrote:

Dear zjhzwang

One more question i want to add column 7 based on the column 2. But the addition should be based on a another file which have start and end position

like I have a another file in this format chr start end CcLG01 1 45

So from position 1 to 45 in another file (previous, mentioned above) column 7 should be added.

Thank You

ADD COMMENTlink written 2.2 years ago by alim.hcu0

Please use ADD COMMENT or ADD REPLY to reply to earlier answers, as such this thread remains logically structured and easy to follow.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by WouterDeCoster36k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 654 users visited in the last hour