Question

How to split genome coverage by read length in Ribo-Seq data

1

Entering edit mode

8.1 years ago

tonu.margus ▴ 20

I am looking for an efficient/simple way for extracting genome coverage for each strand(+/- ) in a given range (for example 25-34) of read length. My input is sorted BAM file (Ribo-Seq data) and as a measure of coverage I would like to get 5' positions of each read.

I found bedtools genomecov solves half of problem

bedtools genomecov -d -5 -ibam my_sorted.bam -strand + -g genome.fa > output_plus.txt

but I didn't found an option to do it on a base of read length. Is there some simple solution what can be apply on command line?

RNA-Seq • 2.7k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 8.1 years ago by tonu.margus ▴ 20

score 1 · Answer 1 · 2016-03-24

1

Entering edit mode

8.1 years ago

EVR ▴ 610

Hi,

As the RPF length will be around 27-31, I would advice you to split the the sam file into sub-sam files based on length(27-31) and later apply the genome coverage function.

ADD COMMENT • link 8.1 years ago by EVR ▴ 610

score 0 · Answer 2 · 2016-03-24

0

Entering edit mode

8.1 years ago

EVR ▴ 610

Hi,

splitting the sam/bam files can be done line following;

`awk': samtools view sam/bam_file | awk '{if (length($10)==l) print $0)}' > l.bam

where

l is 27, 28,29,30,31

. I am not 100% sure of the code but the awk script should be similar like this.

ADD COMMENT • link 8.1 years ago by EVR ▴ 610

1

Entering edit mode

Hi Tom,

Splitting BAM file was one possible way to go. I had to change it a bit to get output in compressed BAM. Here is an example what worked for me to generate 27 nt bam.

 samtools view -h my.bam | awk 'BEGIN {FS="\t"}; ((/^@/) || (length($10) == 27)) {print $0}' | samtools view -bS - > my_27.bam

I was wondering is there any options to do the same without creating additional intermediate files.

ADD REPLY • link 8.1 years ago by tonu.margus ▴ 20