Question: How to count the reads that overlap with 3' splice sites or 5' splice sites?
0
gravatar for vaslanzadeh
3.5 years ago by
vaslanzadeh0 wrote:

Hello

I used HTSeq-Count for counting the reads that overlap with introns and exon. I also need to count the reads that overlap with 5' or 3' splice sites not junctions. I want to count the reads that overlap 5 base pairs with intron and 5 base pairs with exon in 5' or 3' splice site. For example, if start site of my intron (5'ss) is nucleotide 20 on chromosome X, I want to count the reads that overlap from nucleotide 15 up to 24 (10 nucelotides length). I created a GTF file that have these coordinates X 16 24 5'SS, and I think what I should do is asking HTSeq-count to count 5'SS reads with minimum overlap of 10 nucleotides. However, HTSeq-count does not have option to set minimum overlapping bases. I started to use featureCount but it gives me unexpected results. When I set minimum overlap bases in featurecount to 11 or 12, that is bigger than 10 (length of 5'ss), it still reports overlapping reads. Is there a way to count the reads that overlap with few base pairs upstream and downstream of splice sites?

rna-seq alignment • 1.6k views
ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by vaslanzadeh0
1

This is probably one of those cases where bedtools intersect is more useful. I think it has a convenient option for minimum amount of overlap.

ADD REPLYlink written 3.5 years ago by Devon Ryan92k

Thank you Devon, bedtools intersect -wo returns the number of overlapping bases between reads and features. My data is paired end and featureCount/HTSeq-Count has option to deal with paired-end reads and count read pairs only once for a given feature. However, bedtools intersect -wo will count paired end reads twice for 5'ss and 3'ss. In this case I can not compare read coverage that I did get for exons, introns and junctions (from featureCount) with 5'ss and 3'ss.. I am studying splicing and I need to know read coverage over 5'ss and 3'ss for some of my analysis.

ADD REPLYlink written 3.5 years ago by vaslanzadeh0

You might have to write something to meet your exact needs. A combination of pysam and pybedtools might make things a bit simpler.

ADD REPLYlink written 3.5 years ago by Devon Ryan92k

Thanks Devon,

I found what I was doing wrong. I should set featureCount to deal with my paired-end data as single-end data. In this case, featureCount finds single reads that have 10bp overlap with my coordinates.

This might be useful for those who might have the same question.

ADD REPLYlink written 3.5 years ago by vaslanzadeh0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1314 users visited in the last hour