How to split genome coverage by read length in Ribo-Seq data
2
1
Entering edit mode
8.1 years ago
tonu.margus ▴ 20

I am looking for an efficient/simple way for extracting genome coverage for each strand(+/- ) in a given range (for example 25-34) of read length. My input is sorted BAM file (Ribo-Seq data) and as a measure of coverage I would like to get 5' positions of each read.

I found bedtools genomecov solves half of problem

bedtools genomecov -d -5 -ibam my_sorted.bam -strand + -g genome.fa > output_plus.txt

but I didn't found an option to do it on a base of read length. Is there some simple solution what can be apply on command line?

RNA-Seq • 2.7k views
ADD COMMENT
1
Entering edit mode
8.1 years ago
EVR ▴ 610

Hi,

As the RPF length will be around 27-31, I would advice you to split the the sam file into sub-sam files based on length(27-31) and later apply the genome coverage function.

ADD COMMENT
0
Entering edit mode
8.1 years ago
EVR ▴ 610

Hi,

splitting the sam/bam files can be done line following;

`awk': samtools view sam/bam_file | awk '{if (length($10)==l) print $0)}' > l.bam

where

l is 27, 28,29,30,31

. I am not 100% sure of the code but the awk script should be similar like this.

ADD COMMENT
1
Entering edit mode

Hi Tom,

Splitting BAM file was one possible way to go. I had to change it a bit to get output in compressed BAM. Here is an example what worked for me to generate 27 nt bam.

 samtools view -h my.bam | awk 'BEGIN {FS="\t"}; ((/^@/) || (length($10) == 27)) {print $0}' | samtools view -bS - > my_27.bam

I was wondering is there any options to do the same without creating additional intermediate files.

ADD REPLY

Login before adding your answer.

Traffic: 2440 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6