Question: Counting number of base pairs and features in each bed file
Ron (United States) wrote, 5 weeks ago:

Hello all,

Is there any way to count the number of base pairs in each individual bed file?

I know we can do this for the intersection of two BED files, but I want to do it for each file separately.

bedtools intersect -a file1.bed -b file2.bed -wo

Below is the output of intersecting the two BED files; with -wo, the last column is the number of overlapping base pairs. However, I also want to count the total number of base pairs in each of the two BED files (per interval, which I can then sum up).

chr1  69028   69391   ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547   chr1    69090   70008   301
chr1  69432   69630   ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547   chr1    69090   70008   198
chr1  69677   69961   ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547   chr1    69090   70008   284
chr1  621055  622013  ref|OR4F3,ref|OR4F29,ref|OR4F16,ref|NM_001005221,ref|NM_001005224,ref|NM_001005277,ens|ENST00000440200,ens|ENST00000332831,ccds|CCDS41221   chr1    621095  622034  918
chr1  861071  861574  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000437963,ccds|CCDS2 chr1    861321  861393  72
chr1  865582  865885  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000341065,ens|ENST00000437963,ccds|CCDS2 chr1    865534  865716  134
chr1  866331  866507  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000341065,ens|ENST00000437963,ccds|CCDS2 chr1    866418  866469  51
chr1  871064  871262  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000341065,ccds|CCDS2 chr1    871151  871276  111
chr1  874294  874969  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000455979,ens|ENST00000341065,ccds|CCDS2 chr1    874419  874509  90
chr1  874294  874969  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000455979,ens|ENST00000341065,ccds|CCDS2 chr1    874654  874840  186
  

Thanks

Ron
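A minimal sketch of what the question asks for, assuming a tab-separated BED file and a POSIX awk: print each interval's length, then the file total. BED intervals are 0-based and half-open, so the length is end - start with no +1; this agrees with the overlap column above (69391 - 69090 = 301). The function name `bed_lengths` and the file `example_a.bed` are hypothetical.

```shell
# Hypothetical helper: print each BED interval with its length, then the file total.
# BED is 0-based and half-open, so length = end - start (no +1).
bed_lengths() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^(#|track|browser)/ { next }   # skip comment/header lines
       {
         len = $3 - $2
         total += len
         print $1, $2, $3, len
       }
       END { print "total", total + 0 }' "$1"
}

# Example input: three file1 intervals from the question
printf 'chr1\t69028\t69391\nchr1\t69432\t69630\nchr1\t69677\t69961\n' > example_a.bed
bed_lengths example_a.bed
```

Run it once per file (e.g. `bed_lengths file1.bed`, then `bed_lengths file2.bed`). If a file contains intervals that overlap each other, their shared bases are counted more than once; sorting and merging the file first (for example with `bedtools merge`) avoids that.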


Can you give us a few sample lines of input and expected output? I don't understand what you mean by counting base pairs and features in a file that contains just contig names and coordinates.

Reply by RamRS, 5 weeks ago

Hi RamRS, I updated my question. I just want to count the number of base pairs in each BED file (not features; I updated that too).

Reply by Ron, 5 weeks ago

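It's a tab-separated file, so you can simply pass the intersect result through awk and have it compute $3 - $2 for the first file and $7 - $6 for the second file, printing each length in a new column. (No +1 is needed: BED coordinates are 0-based and half-open, so end - start is already the length; the overlap column above agrees, e.g. 69391 - 69090 = 301.)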

Reply by RamRS, 5 weeks ago
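A sketch of the awk step described in the comment above, assuming the tab-separated `-wo` layout shown in the question (4 columns from file1.bed, 3 from file2.bed, then the overlap). It appends the length of the file1 interval and of the file2 interval to each row, using end - start because BED coordinates are 0-based and half-open. `add_lengths` and `example_intersect.txt` are hypothetical names; the two sample rows are copied from the question with the name column shortened.

```shell
# Hypothetical helper: append the file1 interval length ($3 - $2) and the file2
# interval length ($7 - $6) to each row of `bedtools intersect -wo` output.
# Column numbers assume file1.bed has 4 columns and file2.bed has 3, as in the question.
add_lengths() {
  awk 'BEGIN { FS = OFS = "\t" } { print $0, $3 - $2, $7 - $6 }' "$1"
}

# Two rows from the question (name column shortened) standing in for real output
printf 'chr1\t69028\t69391\tOR4F5\tchr1\t69090\t70008\t301\n' > example_intersect.txt
printf 'chr1\t874294\t874969\tSAMD11\tchr1\t874654\t874840\t186\n' >> example_intersect.txt
add_lengths example_intersect.txt
```

On real data the output can be piped in directly, since awk reads standard input when given `-` as the file: `bedtools intersect -a file1.bed -b file2.bed -wo | add_lengths -`.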

To calculate the total base pairs in each file, shouldn't I compute the differences on the individual files rather than on the intersect result? I tried both ways and the results are different.

Reply by Ron, 4 weeks ago

If you want the total base pairs for each file, declare two variables in the BEGIN block, add $3 - $2 to one and $7 - $6 to the other on each line, then print them in the END block (or output them however you wish).

Reply by RamRS, 4 weeks ago
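A sketch of the BEGIN/END approach, with one addition that explains why the two ways in the earlier question disagree: in `-wo` output an interval is repeated once for every partner it overlaps (the file2 interval chr1:69090-70008 appears on three rows in the question, and chr1:874294-874969 from file1 on two), and intervals with no overlap do not appear at all. This variant counts each distinct interval once by tracking its coordinates in the hypothetical arrays `seen_a` and `seen_b`. Even so, it only totals the overlapping intervals, so for whole-file totals sum end - start over each BED file directly. `sum_intersect` and the example file name are hypothetical.

```shell
# Hypothetical helper: totals of the file1 (-a) and file2 (-b) intervals that appear
# in `bedtools intersect -wo` output, counting each distinct interval only once.
# Caveats: intervals without any overlap are absent from -wo output, and genuinely
# duplicated intervals in a source file are also collapsed to one.
sum_intersect() {
  awk 'BEGIN { FS = "\t"; total_a = 0; total_b = 0 }
       {
         key_a = $1 ":" $2 "-" $3   # file1 interval (columns 1-3)
         key_b = $5 ":" $6 "-" $7   # file2 interval (columns 5-7)
         if (!(key_a in seen_a)) { seen_a[key_a] = 1; total_a += $3 - $2 }
         if (!(key_b in seen_b)) { seen_b[key_b] = 1; total_b += $7 - $6 }
       }
       END { printf "file1\t%d\nfile2\t%d\n", total_a, total_b }' "$1"
}

# Four rows from the question (names shortened) with repeated intervals
printf 'chr1\t69028\t69391\tOR4F5\tchr1\t69090\t70008\t301\n' > example_intersect_wo.txt
printf 'chr1\t69432\t69630\tOR4F5\tchr1\t69090\t70008\t198\n' >> example_intersect_wo.txt
printf 'chr1\t874294\t874969\tSAMD11\tchr1\t874419\t874509\t90\n' >> example_intersect_wo.txt
printf 'chr1\t874294\t874969\tSAMD11\tchr1\t874654\t874840\t186\n' >> example_intersect_wo.txt
sum_intersect example_intersect_wo.txt
```

Summing every row without de-duplication would give 1911 and 2112 for these four rows instead of 1236 and 1194, which is the kind of discrepancy described in the question.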

A follow-up question: I also want to calculate the percentage of UTRs and exons in my BED file. Would the way to do this be to download a separate BED file for each of them (UTRs and exons) and intersect each one with the complete BED file of interest?

Reply by Ron, 4 weeks ago
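One way the follow-up could be approached, sketched under stated assumptions: intersect the regions with a separate annotation file per feature type, then divide the overlapping base pairs by the total base pairs of the regions. The pure-awk function below (`pct_covered`, a hypothetical name) does the overlap arithmetic directly so it runs without bedtools. It assumes the intervals within each file do not overlap each other (for example, after sorting and running `bedtools merge` on the annotation); otherwise bases are counted twice. Because it compares every region with every feature, for genome-scale files the bedtools route (merge the features, `bedtools intersect -wo`, sum the last column) is the practical choice. In the example, the file1 intervals from the question merely stand in for an exon file.

```shell
# Hypothetical helper: pct_covered FEATURES.bed REGIONS.bed
# Prints covered bp, total region bp, and the percentage of region bp inside features.
# Assumes non-overlapping intervals within each file and a non-empty FEATURES file.
pct_covered() {
  awk 'BEGIN { FS = "\t" }
       NR == FNR { n++; f_chr[n] = $1; f_start[n] = $2; f_end[n] = $3; next }  # load features
       {
         total += $3 - $2
         for (i = 1; i <= n; i++) {
           if (f_chr[i] != $1) continue
           s = ($2 > f_start[i]) ? $2 : f_start[i]   # overlap start = max of starts
           e = ($3 < f_end[i]) ? $3 : f_end[i]       # overlap end = min of ends
           if (e > s) covered += e - s
         }
       }
       END { printf "%d\t%d\t%.2f\n", covered, total, (total ? 100 * covered / total : 0) }' "$1" "$2"
}

# Illustrative inputs: one file2 interval as the region, file1 intervals as "features"
printf 'chr1\t69090\t70008\n' > example_regions.bed
printf 'chr1\t69028\t69391\nchr1\t69432\t69630\nchr1\t69677\t69961\n' > example_features.bed
pct_covered example_features.bed example_regions.bed
```

Running it once with a UTR file and once with an exon file gives the two percentages separately.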
Powered by Biostar version 2.3.0