Calculate Exon Inclusion From Bed Files
9.4 years ago
skm770 ▴ 150

Hi,

There are many datasets for which the raw data are not publicly available but BED files are. Is there any way to study or calculate exon inclusion levels from these BED files? SpliceTrap is one tool that can calculate this from raw files; I want to know whether something similar can be done using the already-analyzed BED files.

I'd appreciate any help with this.

Thanks


I could be wrong, but did you mean 'BAM' instead of 'BED' file?


No, I meant BED, not BAM. A BAM file would not be a problem, because there are tools that can compute this from BAM files.


Could you please provide a few lines from such a file?


This is what the first few lines look like:

chr1 10000 10106 1 566 + 2.87 5472.0 4560.0
chr1 11791 11924 2 200 + 0.0636 152.0 121.0
chr1 12040 12198 3 246 + 0.107 304.0 76.0
chr1 13644 13803 4 184 + 0.0532 152.0 165.0
chr1 13882 14023 5 195 + 0.06 152.0 28.0
chr1 15392 15495 8 222 + 0.0821 152.0 266.0
chr1 16541 16668 9 266 + 0.133 304.0 853.0
chr1 16938 17087 10 251 + 0.114 304.0 1175.0


What do the last three columns represent?


They are raw read counts and normalized counts for the intervals.
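For anyone wanting to work with such a file programmatically, here is a minimal Python sketch that reads lines like the ones above. It assumes whitespace-separated fields, with the first six following the standard BED6 layout (chrom, 0-based start, exclusive end, name, score, strand); the trailing numeric columns are kept as an unnamed list because the thread does not say which one is raw and which is normalized. The `Interval` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    chrom: str
    start: int    # 0-based, inclusive (standard BED convention)
    end: int      # 0-based, exclusive
    name: str
    score: float
    strand: str
    extra: List[float]  # trailing count columns; exact meaning not given in the thread


def parse_line(line: str) -> Interval:
    """Split one whitespace-separated record into BED6 fields plus extra numbers."""
    fields = line.split()
    return Interval(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        score=float(fields[4]),
        strand=fields[5],
        extra=[float(x) for x in fields[6:]],
    )


def parse_lines(text: str) -> List[Interval]:
    """Parse every non-blank line of the input text."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


example = """chr1 10000 10106 1 566 + 2.87 5472.0 4560.0
chr1 11791 11924 2 200 + 0.0636 152.0 121.0"""

for iv in parse_lines(example):
    # Segment length in bases is end - start for half-open coordinates.
    print(iv.chrom, iv.start, iv.end, iv.end - iv.start, iv.extra)
```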


I'm afraid this one is beyond me; the available data are not clear to me.


What I want to know is whether exon inclusion levels can be calculated from BED files that give read counts over set intervals.

9.4 years ago
polarise ▴ 380

Under the simplest assumptions, this should be possible.

I will assume the following:

1. Genomic segments in the BED file are non-overlapping.
2. The exons of interest are completely covered by the genomic segments in the BED file (see the attached figure), i.e., there are no gaps. This may be quite difficult to guarantee for the majority of exons.
3. Expression in the BED file is computed as simple RPKM (not the most reliable measure, but practically useful).
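Assumptions 1 and 2 can be checked before attempting any calculation. Below is a minimal Python sketch, assuming segments and exons are given as (start, end) pairs in BED-style half-open coordinates on a single chromosome and strand. `non_overlapping` and `covering_segments` are hypothetical helper names, not part of any tool.

```python
from typing import List, Tuple

# (start, end) in BED-style half-open coordinates: start is 0-based, end is exclusive.
Segment = Tuple[int, int]


def non_overlapping(segments: List[Segment]) -> bool:
    """Assumption 1: after sorting by start, no segment begins before the previous one ends."""
    segs = sorted(segments)
    return all(prev[1] <= cur[0] for prev, cur in zip(segs, segs[1:]))


def covering_segments(exon: Segment, segments: List[Segment]) -> List[Segment]:
    """Assumption 2: return the segments overlapping the exon if together they
    cover every base of it with no gaps; otherwise return an empty list."""
    ex_start, ex_end = exon
    hits = sorted(s for s in segments if s[0] < ex_end and s[1] > ex_start)
    covered_to = ex_start  # first exon base not yet known to be covered
    for start, end in hits:
        if start > covered_to:  # uncovered bases before this segment
            return []
        covered_to = max(covered_to, end)
    return hits if covered_to >= ex_end else []


# Two of the posted rows leave a gap between 11924 and 12040,
# so a region spanning that gap is not fully covered.
print(covering_segments((11800, 12100), [(11791, 11924), (12040, 12198)]))
```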

Under these assumptions, we can calculate the combined RPKM of two segments, 1 and 2:

$$R = R_1 + R_2 + K,$$

where R is the RPKM of segments 1 and 2 taken together, R_1 and R_2 are the RPKMs of segments 1 and 2, respectively, and K is a correction term computed from the read counts and lengths of segments 1 and 2 as follows:

$$K = R - R_1 - R_2 = \frac{10^9}{N}\cdot\frac{r_1 + r_2}{l_1 + l_2} - \frac{10^9}{N}\left(\frac{r_1}{l_1} + \frac{r_2}{l_2}\right) = \frac{10^9}{N}\left[\frac{r_1 + r_2}{l_1 + l_2} - \frac{r_1 l_2 + r_2 l_1}{l_1 l_2}\right].$$
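A quick numerical check of the formula for K, as a Python sketch. It assumes the simple RPKM definition R_i = 10^9 r_i / (N l_i), with l_i in bases and N the total number of mapped reads; the read counts, lengths and N below are made-up values for illustration only, not taken from the dataset in this thread.

```python
def rpkm(reads: float, length: int, total_mapped: float) -> float:
    """Simple RPKM: 10^9 * reads / (total mapped reads * length in bases)."""
    return 1e9 * reads / (total_mapped * length)


def k_term(r1: float, l1: int, r2: float, l2: int, total_mapped: float) -> float:
    """Closed form: K = 10^9/N * [(r1 + r2)/(l1 + l2) - (r1*l2 + r2*l1)/(l1*l2)]."""
    return 1e9 / total_mapped * ((r1 + r2) / (l1 + l2) - (r1 * l2 + r2 * l1) / (l1 * l2))


# Made-up example values.
r1, l1 = 100.0, 100   # reads and length (bases) of segment 1
r2, l2 = 300.0, 200   # reads and length (bases) of segment 2
N = 1_000_000         # total mapped reads

R1 = rpkm(r1, l1, N)
R2 = rpkm(r2, l2, N)
R = rpkm(r1 + r2, l1 + l2, N)  # RPKM of the two segments treated as one region
K = k_term(r1, l1, r2, l2, N)

print(R1, R2, R, K)
print(abs(R - (R1 + R2 + K)) < 1e-9)  # closed form agrees with R - R1 - R2
```

For these values K is negative (about -1166.7), so merging two segments does not simply add their RPKMs.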

Here, r_i and l_i are the read count and length (in bases) of segment i, respectively, and N is the total number of mapped reads. K can be computed the same way for three or more segments (I'm sure there's a mathematical hack somewhere in there...).
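Under the same simple RPKM definition, R_i = 10^9 r_i / (N l_i), the algebra carries over to n segments directly (a sketch derived from that definition, not taken from any tool):

```latex
R = \frac{10^9}{N}\,\frac{\sum_{i=1}^{n} r_i}{\sum_{i=1}^{n} l_i}
  = \frac{\sum_{i=1}^{n} l_i R_i}{\sum_{i=1}^{n} l_i},
\qquad
K = R - \sum_{i=1}^{n} R_i
  = \frac{10^9}{N}\left(\frac{\sum_{i=1}^{n} r_i}{\sum_{i=1}^{n} l_i} - \sum_{i=1}^{n} \frac{r_i}{l_i}\right).
```

The middle expression shows that the combined RPKM is the length-weighted mean of the segment RPKMs, which may be the shortcut alluded to above. Because a weighted mean never exceeds the largest R_i, K is never positive when all counts are non-negative, and for n = 2 the expression for K reduces to the two-segment formula.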

Please remember that these assumptions might not hold in general. Treat this idea with caution.

(We need LaTeX here!)