Question

Tool:pairedBamToBed12: converts paired-end BAM to BED12 format, based on bedtools

8.6 years ago

Charles Plessy ★ 2.9k

I would like to introduce pairedBamToBed12, a tool that we created at RIKEN some time ago to represent our paired-end transcriptome data with one single line per pair.

It is based on bedtools; technically, it is a fork with the command pairedBamToBed12 added and all the other commands removed (which means that it could be merged if there were enough interest). Our source code is on GitHub.

There is a bamtobed tool in bedtools, but it did not fit our needs, because bamtobed -split leaves the forward (Read 1) and reverse (Read 2) reads on separate BED12 lines, and the BEDPE format output by bamtobed -bedpe does not support spliced alignments.

As a brief illustration of what it does:

Read 1:   >>>>>>>>>>>>
Read 2:                     <<<<<<<<<<<<<-----<<<<<<<
The pair: >>>>>>>>>>>>------>>>>>>>>>>>>>----->>>>>>>
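To make the illustration above concrete, here is a minimal Python sketch of how the aligned blocks of both mates could be combined into one BED12 line. It is not the actual pairedBamToBed12 code: the function names, the coordinates, the choice of setting thickStart/thickEnd to the full span, and taking the strand from Read 1 are all assumptions made for the example.

```python
def merge_blocks(blocks):
    """Merge overlapping or touching (start, end) intervals (0-based, half-open)."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pair_to_bed12(chrom, name, strand, read1_blocks, read2_blocks, score=0):
    """Build one BED12 line from the aligned blocks of a properly paired fragment.

    `strand` is assumed to be the strand of Read 1. The unsequenced gap between
    the mates ends up between two blocks, exactly like a splice junction.
    """
    blocks = merge_blocks(list(read1_blocks) + list(read2_blocks))
    chrom_start = blocks[0][0]
    chrom_end = blocks[-1][1]
    block_sizes = ",".join(str(end - start) for start, end in blocks)
    block_starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        chrom_start, chrom_end,  # thickStart, thickEnd (assumption: whole span)
        "0",                     # itemRgb
        len(blocks), block_sizes, block_starts,
    ]
    return "\t".join(str(field) for field in fields)


# Hypothetical coordinates mirroring the diagram: Read 1 is unspliced,
# Read 2 is spliced into two blocks.
line = pair_to_bed12("chr1", "pair1", "+",
                     read1_blocks=[(100, 112)],
                     read2_blocks=[(118, 131), (136, 143)])
print(line)
```

With these made-up coordinates the pair becomes a single entry with three blocks, where the first gap is the unsequenced insert and the second is the intron of Read 2.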

Perhaps the best way to see in more detail what the program does is to look at our regression tests. Our main use for it is to represent our paired-end CAGE data (CAGEscan) before upload to our home-made genome browser, Zenbu, which can display BED12 files either as conventional intervals or as quantitative coverage plots of the whole area or of the 5' or 3' end (the 5' end being particularly relevant for CAGE).

The main limitation of our approach is that it is strongly tied to proper pairing; in particular, it cannot represent transcripts spanning multiple chromosomes, as in the case of recombinations, viral insertions, trans-splicing, etc. That said, this is not a big problem for projects that do not require exploration of de novo transcript patterns. We are currently considering supporting the optional use of one read only in case of non-proper pairing, as a compromise workaround.

pairedBamToBed12 is Free software (GPL-2, like bedtools), and I would be excited if it had more users and developers, which is why I am writing this post :)

That said, if there is a superior solution, whether already implemented or not, I would be very interested in discussing it. In particular, I wonder whether, in the long term, supporting recombinations not represented in the reference genome used for alignment would require giving up the simplicity of having one pair per line and switching to a different format such as GFF...

-- Charles Plessy, Tsurumi, Kanagawa, Japan (working at RIKEN, see population-transcriptomics.org).

CAGE paired-end • 3.6k views

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by Charles Plessy ★ 2.9k

I just released version 1.1, which adds a new option to match read names that differ after a given separator (for instance, if Read 1 and Read 2 got different flags added to the name field in the FASTQ files during quality control or other processing steps).
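As a rough illustration of the idea (not the tool's actual option or code), matching names up to a separator could look like the Python sketch below. The function name, the `#` separator and the read names are hypothetical.

```python
def same_fragment(name1, name2, separator=None):
    """Return True if two read names denote the same fragment.

    When `separator` is given, everything from its first occurrence onwards
    is ignored, so suffixes added during processing do not break the pairing.
    """
    if separator is not None:
        name1 = name1.split(separator, 1)[0]
        name2 = name2.split(separator, 1)[0]
    return name1 == name2


# Hypothetical names: both mates share the prefix "read42" but carry
# different suffixes after the '#' separator.
print(same_fragment("read42#QC_pass", "read42#QC_trim"))                 # compared in full
print(same_fragment("read42#QC_pass", "read42#QC_trim", separator="#"))  # suffixes ignored
```

Without a separator the two names are compared in full and do not match; with `separator="#"` only the shared prefix is compared.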

ADD REPLY • link 7.8 years ago by Charles Plessy ★ 2.9k

I just released version 1.2, which adds a new experimental option to correct for "G addition". Comments are welcome, especially on better ways to solve the problem.

ADD REPLY • link 7.7 years ago by Charles Plessy ★ 2.9k

score 0 · Answer 1 · 2016-11-17

7.4 years ago

Carlo Yague 8.7k

Great! I have been looking for a tool that reconciles bedtools -bedpe and -split! I even asked this forum a month before this post and tested various methods. I guess one never googles enough. :)

I'll try this and let you know how it goes.

ADD COMMENT • link 7.4 years ago by Carlo Yague 8.7k

Thanks! I am definitely interested in your feedback. If you like the tool, you can also support our proposal to add it to bedtools (on which the source code is based).

ADD REPLY • link 7.4 years ago by Charles Plessy ★ 2.9k