Tool: Bedtools: Analyzing Genomic Features
Istvan Albert wrote (7.2 years ago):

All practicing bioinformaticians will face problems that require them to compare, query and select genomic features across an entire genome. As it happens, representing and querying intervals efficiently is a surprisingly challenging problem that needs a specialized representation.

The BEDTools suite contains a set of programs that support a broad range of interval analyses that involve selecting certain locations in the genome. The name reflects the original intent to process BED files, but the tools operate just as well on GFF formats. The tools are run from the command line and are available for UNIX-type systems: Linux, Mac OS X, and Cygwin (on Windows).

The link to the site is: http://code.google.com/p/bedtools/

With BEDTools one can answer questions such as:

  • how many reads map upstream/downstream of one or more locations in the genome?
  • how many reads cover a certain base in the genome?
  • which sections of the genome are not overlapping with target intervals?
  • what are the sequences specified by the coordinates?
  • ...
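To make a couple of these questions concrete, here is a minimal Python sketch of the underlying interval logic. It is not how BEDTools is implemented; the `Interval` type and the function names are hypothetical and exist only for illustration. It assumes BED-style coordinates (0-based start, exclusive end).

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical BED-like record: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_covering(reads, chrom: str, pos: int) -> int:
    """Count how many reads cover the 0-based position `pos` on `chrom`."""
    return sum(1 for r in reads if r.chrom == chrom and r.start <= pos < r.end)


reads = [
    Interval("chr1", 100, 150),
    Interval("chr1", 120, 170),
    Interval("chr2", 100, 150),
]
print(count_covering(reads, "chr1", 130))  # 2
```

This scans every read for every query, which is fine for small inputs but is exactly the cost that specialized interval tools are designed to avoid on genome-scale data.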

The suite consists of multiple tools, but for beginners the most important is intersectBed. Understanding this tool is a gateway to understanding them all. In fact many (but not all) of the other tools, such as slopBed and windowBed, are simply convenience tools that help users prepare or format output in a certain way, and could be replaced by small custom scripts.
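As an illustration of what such a small custom script might look like, the Python sketch below extends an interval on both sides and clamps it to the chromosome boundaries, which is the basic idea behind slopBed. The function name and parameters are hypothetical, and the sketch ignores strand and negative distances.

```python
def slop(start: int, end: int, chrom_size: int, left: int, right: int):
    """Extend a half-open interval by `left`/`right` bases, clamped to the chromosome.

    Simplified sketch: no strand handling, distances assumed non-negative.
    """
    new_start = max(0, start - left)
    new_end = min(chrom_size, end + right)
    return new_start, new_end


# An interval near the start of a 1000 bp chromosome, extended by 100 bp each way.
print(slop(50, 100, 1000, 100, 100))  # (0, 200)
```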

Note: a very large number of problems can be solved by running nothing more than the various scripts in BEDTools and occasionally reformatting the outputs. If you are new to the field, take your time and learn what BEDTools does.

Aaronquinlan wrote (6.5 years ago):

Just an FYI to those not on the bedtools mailing list. We are close to completing a new documentation site that is already more up to date than the existing PDF. Comments and suggestions welcome as always.

bedtools.readthedocs.org/en/latest/

In particular, the genomecov, map, and cluster utilities have (finally) been properly documented.


Can you post the bedtools recipe for annotating intervals by features such as TSS, CDS exons, 5' UTR exons, 3' UTR exons, CpG islands, repeats, introns, and intergenic regions?

Reply written 5.6 years ago by Jeremy Leipzig
enricoferrero wrote (7.0 years ago):

Just came here to add something that is not in the documentation but that I need to do quite often.

How to join/merge 2 or more BED files with BEDtools:

cat file1.bed file2.bed [fileN.bed] | sortBed -i stdin | mergeBed -i stdin > merged.bed
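For readers who want to see what the sort-then-merge step does conceptually, here is a minimal Python sketch. It is not the BEDTools implementation; `merge_intervals` is a hypothetical helper, intervals are assumed to be `(chrom, start, end)` tuples with half-open coordinates, and overlapping or directly adjacent intervals are combined.

```python
def merge_intervals(intervals):
    """Sort (chrom, start, end) tuples and collapse overlapping or adjacent ones."""
    merged = []
    for chrom, start, end in sorted(intervals):
        # Same chromosome and starting at or before the current end: extend it.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


file1 = [("chr1", 10, 20), ("chr1", 40, 50)]
file2 = [("chr1", 15, 30), ("chr2", 5, 8)]
print(merge_intervals(file1 + file2))
# [('chr1', 10, 30), ('chr1', 40, 50), ('chr2', 5, 8)]
```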

You might check out BEDOPS:

bedops -m file1.bed file2.bed ... fileN.bed > merged.bed

Assuming your input files are sorted, the output will be too (useful for further downstream analyses). This saves you from cat'ing everything together and sorting one larger file. The bedops program is designed from the ground up to work efficiently with any number of input files at once.

Reply written 6.5 years ago by sjneph
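The benefit of already-sorted inputs can be sketched in Python with a streaming k-way merge: no global sort is needed, and only one interval per input is held in memory at a time. This only illustrates the idea and makes no claim about how BEDOPS works internally; `merge_sorted` is a hypothetical name, and each input is assumed to be sorted by `(chrom, start, end)`.

```python
import heapq


def merge_sorted(*streams):
    """Lazily merge several pre-sorted (chrom, start, end) streams into disjoint intervals."""
    current = None
    # heapq.merge interleaves the sorted inputs without loading them all at once.
    for chrom, start, end in heapq.merge(*streams):
        if current and current[0] == chrom and start <= current[2]:
            current[2] = max(current[2], end)
        else:
            if current:
                yield tuple(current)
            current = [chrom, start, end]
    if current:
        yield tuple(current)


a = [("chr1", 10, 20), ("chr1", 40, 50)]
b = [("chr1", 15, 30), ("chr2", 5, 8)]
print(list(merge_sorted(a, b)))
# [('chr1', 10, 30), ('chr1', 40, 50), ('chr2', 5, 8)]
```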
Aaronquinlan wrote (5.5 years ago):

We just posted an assessment of bedtools' performance with sorted and unsorted data as a function of dataset size:

http://bedtools.readthedocs.org/en/latest/#performance

Aaronquinlan wrote (4.5 years ago):

Bedtools version 2.22.1 is out. Details below. Importantly, the `closest` tool is 30-80X faster (depending on options) now that it requires sorted input datasets. The `closest` tool can also search for the closest features among any number of "B" files. In addition, we have finally written proper docs for the `closest` tool. In the works for the next release are options to find the k closest features and options to force the discovery of the closest feature both upstream and downstream.

https://github.com/arq5x/bedtools2/releases/tag/v2.22.1

  • When using -sorted with intersect, map, and closest, bedtools can now detect and warn you when your input datasets employ different chromosome sorting orders.

  • Fixed multiple bugs in the new, faster closest tool. Specifically, the -iu, -id, and -D options were not behaving properly with the new "sweeping" algorithm that was implemented for the 2.22.0 release. Many thanks to Sol Katzman for reporting these issues and for providing a detailed analysis and example files.

  • We FINALLY wrote proper documentation for the closest tool.
    http://bedtools.readthedocs.org/en/latest/content/tools/closest.html

  • Fixed bug in the tag tool when using -intervals, -names, or -scores. Thanks to Yarden Katz for reporting this.

  • Fixed issues with chromosome boundaries in the slop tool when using negative distances. Thanks to @acdaugherty!

  • Multiple improvements to the fisher tool. Added a -m option to the fisher tool to merge overlapping intervals prior to comparing overlaps between two input files. Thanks to @brentp.

  • Fixed a bug in makewindows tool requiring the use of -b with -s.

  • Fixed a bug in intersect that prevented -split from detecting complete overlaps with -f 1. Thanks to @tleonardi.

  • Restored the default decimal precision to the groupby tool.

  • Added the -prec option to the merge and map tools to specify the decimal precision of the output.

Aaronquinlan wrote (5.7 years ago):

I added a brief tutorial to introduce beginners to bedtools. http://quinlanlab.org/tutorials/cshl2013/bedtools.html


bedtools complement could have a feature to extract introns from exons.bed, that is, the intervals in between BED lines (at the moment I am using this solution: http://stackoverflow.com/q/17167602/1286528).

Reply written 5.7 years ago by PoGibas
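As a rough illustration of the intron extraction mentioned in the reply above, the Python sketch below reports the gaps between consecutive exons of a single transcript. It is neither a bedtools feature nor the linked Stack Overflow solution; `gaps_between` is a hypothetical helper, and exons are assumed to be `(start, end)` pairs in half-open coordinates on one chromosome.

```python
def gaps_between(exons):
    """Return the gaps (candidate introns) between (start, end) exons of one transcript."""
    exons = sorted(exons)
    if not exons:
        return []
    introns = []
    prev_end = exons[0][1]
    for start, end in exons[1:]:
        # A gap exists only if this exon starts after everything seen so far ends.
        if start > prev_end:
            introns.append((prev_end, start))
        prev_end = max(prev_end, end)
    return introns


exons = [(100, 200), (300, 400), (350, 500), (600, 700)]
print(gaps_between(exons))  # [(200, 300), (500, 600)]
```

Grouping exons by transcript before calling the helper is left out here; without that grouping, exons from different genes would produce gaps that are intergenic rather than intronic.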
Aaronquinlan wrote (5.4 years ago):

We just released version 2.19.1. This fixes a silly bug in 'intersect', and allows one to apply multiple operations/columns with the map tool in a single run.

$ bedtools map -a a.bed -b b.bed -c 5,5,5,5 -o min,max,median,collapse

Or:

$ bedtools map -a a.bed -b b.bed -c 3,4,5,6 -o mean
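Conceptually, applying several operations in one run amounts to collecting the chosen column from every B interval that overlaps an A interval and summarizing it once per operation. The Python sketch below illustrates that idea only; it is not the bedtools implementation. The helper names are hypothetical, all intervals are assumed to lie on one chromosome with half-open coordinates, and "." is used as the placeholder when nothing overlaps.

```python
from statistics import median


def collapse(values):
    """Join all values into one comma-separated string."""
    return ",".join(str(v) for v in values)


def map_scores(a_intervals, b_intervals, ops):
    """For each (start, end) in A, summarize scores of overlapping (start, end, score) in B."""
    results = []
    for a_start, a_end in a_intervals:
        scores = [
            score
            for b_start, b_end, score in b_intervals
            if b_start < a_end and a_start < b_end
        ]
        # One output column per operation; "." when no B interval overlaps.
        row = [op(scores) if scores else "." for op in ops]
        results.append((a_start, a_end, *row))
    return results


a = [(0, 100), (200, 300)]
b = [(10, 20, 5), (50, 60, 9), (90, 120, 7)]
for line in map_scores(a, b, [min, max, median, collapse]):
    print(line)
```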

We have also refactored the code for computing operations on the overlapping columns, and this has resulted in a speedup over previous releases and other methods.

Commands used for plot below:

  runit bedtools-2.18.0 map -a ccds.exons.bed -b sample.10M.bam.bed -c 1 -o count > /dev/null
  runit bedtools-2.19.0 map -a ccds.exons.bed -b sample.10M.bam.bed -c 1 -o count > /dev/null
  runit bedtools-2.19.1 map -a ccds.exons.bed -b sample.10M.bam.bed -c 1 -o count > /dev/null
  runit bedmap --count --echo --bp-ovr 1 ccds.exons.bed sample.10M.bam.bedmap.bed > /dev/null

  # not shown (time = 21.15 seconds)
  runit bedmap --count --ec --bp-ovr 1 ccds.exons.bed sample.10M.bam.bedmap.bed > /dev/null

[Figure: Speed comparison]

Aaronquinlan wrote (4.8 years ago):

We just released version 2.21.0.

There are three highlights. First, the `intersect` tool can now intersect more than two files. An example of this can be found here. Second, the `intersect` tool should be up to 2 times faster when using sorted data for certain use cases, owing to an enhancement to the core algorithm. Third, Brent Pedersen has contributed the `fisher` tool, which conducts a Fisher's exact test assessing the significance of the overlaps between two interval files.

Release details: http://bedtools.readthedocs.org/en/latest/content/history.html

Aaronquinlan wrote (5.6 years ago):

We just released version 2.18.0 which is much faster for sorted data, includes new tools and features, and allows greater flexibility with chromosome naming and sorting. Details here.

Importantly, Google Code is being shut down by Google. As such, all releases and code will be maintained on Github. The repository is here:

https://github.com/arq5x/bedtools2

Thanks for your patience and for the continued use of bedtools.

Aaronquinlan wrote (5.4 years ago):

We just released version 2.19.0, which addresses a couple of important bugs, reduces memory usage, and confers a 3X speedup to the map tool. In addition, the map tool supports the -split option as well as alternative chromosome ordering schemes (i.e., besides lexicographic).

Details: https://groups.google.com/forum/#!topic/bedtools-discuss/UJpo5JJO38M
Releases: https://github.com/arq5x/bedtools2/releases

Aaronquinlan wrote (5.1 years ago):

We just released versions 2.20.0 and 2.20.1. Release details: http://bedtools.readthedocs.org/en/latest/content/history.html

Download: https://github.com/arq5x/bedtools2/releases/tag/v2.20.1
