6.3 years ago
Ron
Hello all,
Is there any way to count the number of base pairs in each individual BED file?
I know we can do this for the intersection of two BED files, but I want to do it for each file separately.
bedtools intersect -a file1.bed -b file2.bed -wo
Below is the output of intersecting the two BED files; the last column is the number of overlapping base pairs. However, I also want to count the total number of base pairs in each of those BED files (per interval, which can then be summed).
chr1	69028	69391	ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547	chr1	69090	70008	301
chr1	69432	69630	ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547	chr1	69090	70008	198
chr1	69677	69961	ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547	chr1	69090	70008	284
chr1	621055	622013	ref|OR4F3,ref|OR4F29,ref|OR4F16,ref|NM_001005221,ref|NM_001005224,ref|NM_001005277,ens|ENST00000440200,ens|ENST00000332831,ccds|CCDS41221	chr1	621095	622034	918
chr1	861071	861574	ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000437963,ccds|CCDS2	chr1	861321	861393	72
chr1	865582	865885	ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000341065,ens|ENST00000437963,ccds|CCDS2	chr1	865534	865716	134
chr1	866331	866507	ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000341065,ens|ENST00000437963,ccds|CCDS2	chr1	866418	866469	51
chr1	871064	871262	ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000341065,ccds|CCDS2	chr1	871151	871276	111
chr1	874294	874969	ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000455979,ens|ENST00000341065,ccds|CCDS2	chr1	874419	874509	90
chr1	874294	874969	ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000455979,ens|ENST00000341065,ccds|CCDS2	chr1	874654	874840	186
Thanks
Ron
Can you give us a few sample lines of input and expected output? I don't understand what you mean by counting base pairs and features in a file that contains just contig names and coordinates.
Hi RamRS, I updated my question. I just want to look at the number of base pairs in each BED file (not features; I have corrected that).
It's a tab-separated file, so you can simply pass the intersect result through awk and have it compute $3 - $2 for the first file and $7 - $6 for the second file, printing each length in a new column. (BED coordinates are 0-based and half-open, so the interval length is end minus start with no + 1; this matches the overlap column in the output above, e.g. 69391 - 69090 = 301.)

For calculating the total base pairs in each file, shouldn't I calculate the difference in the individual files, or in the intersect result? I did it both ways and the results are different.
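The per-line awk suggestion can be sketched roughly as follows. This is only a minimal example under some assumptions: the three rows are taken from the intersect output in the question, the long name column is shortened to a placeholder (`OR4F5`), and the temporary file and the `with_lengths` variable are names made up for illustration.

```shell
# Three rows of `bedtools intersect -wo` output from the question,
# with the name column shortened to a placeholder (assumption).
tmp=$(mktemp)
printf 'chr1\t69028\t69391\tOR4F5\tchr1\t69090\t70008\t301\n'  > "$tmp"
printf 'chr1\t69432\t69630\tOR4F5\tchr1\t69090\t70008\t198\n' >> "$tmp"
printf 'chr1\t69677\t69961\tOR4F5\tchr1\t69090\t70008\t284\n' >> "$tmp"

# Append two columns: length of the file1 interval and of the file2 interval.
# BED is 0-based and half-open, so length = end - start.
with_lengths=$(awk 'BEGIN { FS = OFS = "\t" } { print $0, $3 - $2, $7 - $6 }' "$tmp")
printf '%s\n' "$with_lengths"

rm -f "$tmp"
```

Columns 9 and 10 of the result hold the two interval lengths. The column numbers ($2/$3 and $6/$7) assume file1.bed has four columns, as in the output shown in the question; with a different number of columns in file1.bed the second pair shifts accordingly.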
If you wish to calculate the total base pairs in each file, initialize two variables in the BEGIN block, add $3 - $2 to one variable and $7 - $6 to the other on each line, then print them out in the END block (or however you wish to output them). BED intervals are half-open, so no + 1 is needed.

A follow-up question: I also want to compute the percentage of UTRs and exons in my BED file. Would the way to do that be to download a separate BED file for each (UTRs, exons) and intersect each with the complete BED file of interest?
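The BEGIN/END totals approach described above could look something like the sketch below, which also hints at why totals taken from the intersect output differ from totals taken from an individual file. The assumptions: the three rows come from the intersect output in the question (name column shortened to the placeholder `SAMD11`), and the second temporary file is a hypothetical excerpt of file1.bed containing just the two distinct file1 intervals from those rows.

```shell
# Intersect output: note that the file1 interval 874294-874969 appears twice,
# once for each file2 interval it overlaps.
hits=$(mktemp)
printf 'chr1\t871064\t871262\tSAMD11\tchr1\t871151\t871276\t111\n'  > "$hits"
printf 'chr1\t874294\t874969\tSAMD11\tchr1\t874419\t874509\t90\n'  >> "$hits"
printf 'chr1\t874294\t874969\tSAMD11\tchr1\t874654\t874840\t186\n' >> "$hits"

# Totals accumulated over the intersect output (file1 side, file2 side).
totals=$(awk 'BEGIN { FS = "\t"; a = 0; b = 0 }
              { a += $3 - $2; b += $7 - $6 }
              END { print a, b }' "$hits")
echo "$totals"        # 1548 401

# Hypothetical excerpt of file1.bed: each interval listed once.
file1=$(mktemp)
printf 'chr1\t871064\t871262\tSAMD11\n'  > "$file1"
printf 'chr1\t874294\t874969\tSAMD11\n' >> "$file1"

file1_total=$(awk -F'\t' '{ sum += $3 - $2 } END { print sum }' "$file1")
echo "$file1_total"   # 873

rm -f "$hits" "$file1"
```

The file1 total from the intersect output (1548) is larger than the per-file total (873) because an interval is repeated once per overlap, and in a full file any interval with no overlap would be missing from the intersect output entirely. So for total base pairs per file, summing over the individual file is probably what you want; if intervals within a file overlap each other, collapsing them first (for example with bedtools merge on a sorted file) avoids counting the same bases twice.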