Can I split a bed file into 1000bp bins and add up my read numbers?
2
0
Entering edit mode
4.4 years ago

Dear all,

I am trying to split my bed files so that I have bins of 1000bp. I am also wanting the red number in column 4 to add up when I do this. i.e. currently I have:

  • chr1 3053057 3053058 7
  • chr1 3053058 3053132 9
  • chr1 3053132 305313 16
  • chr1 3053133 3053174 17
  • chr1 3053174 3053175 10
  • chr1 3053175 3053202 8
  • chr1 3053202 3053249 9
  • chr1 3072131 3072248 1
  • chr1 3072250 3072315 1
  • chr1 3076197 3076235 1

And I would like

  • chr1 3053057 3054057 76
  • chr1 3054057 3056057 0

and so on

I was wanting to do this to two bed files and then plot them as a scatter graph. Can anyone recommend the best way to do this as well?

Thank you

RNA-Seq rna-seq genome alignment bed • 2.6k views
ADD COMMENT
3
Entering edit mode
4.4 years ago

I don't know of a direct method to accomplish what you need, but have a look at this tentative proposal:

awk 'FS=OFS="\t"{print $1, 0, $2}' human_hg19.fa.fai \
| bedtools makewindows -b - -w 1000 \
| bedtools map -a - -b input.bedgraph -c 4 -o sum \
| grep -P "\d$"

what it does is:

  1. uses your genome of reference's fai file
  2. converts it to bed format adding a 0 start point to each chromosome line (awk)
  3. creates all possible 1000bp windows for each chromosome (bedtools makewindows)
  4. maps the coverage information from your input on top of all 1000bp windows (bedtools map)
  5. outputs only the 1000bp windows with non-empty cogerage information (grep)

notes:

  • expected .fai file can be generated with samtools faidx human_hg19.fa
  • coverage information needs to be in bedgraph format. your example input is already in bedgraph format (4 columns containing chr, start, end, coverage), although you need to make sure that it is tabulated.
ADD COMMENT
0
Entering edit mode
4.4 years ago

Have a look at bedtools intersect for this, there are great docs and tutorials there.

http://quinlanlab.org/tutorials/bedtools/bedtools.html

ADD COMMENT

Login before adding your answer.

Traffic: 2691 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6