Question

Differential RNA editing using edgeR

21 months ago

Aiswarya ▴ 20

Hi,

I have data from forty-five individuals sampled before and after treatment (paired samples) and would like to identify differentially edited sites between these conditions.

I intend to use a framework similar to the one used for finding differentially methylated sites and allele-specific expression (ASE), specifically edgeR.

My input count table looks like this:

                       ref1  edit1   ref2  edit2   ref3  edit3   ref4  edit4   ref5  edit5   ref6  edit6
    Coordinate_1_A_G     10     90     11     54     19     65     16      2     18      0     12      2
    Coordinate_2_T_C     20     91     65     94     55     79     62    602     58    224     64    575
    Coordinate_3_T_C     16     65     18     77     15     82     16      5     18      7     17      6
    Coordinate_4_A_G     16     15      3     15      5     13      1      6      8      0      9      1

Here ref1 = the number of unedited bases and edit1 = number of edited bases for the respective coordinate for patient1, and so on.

I would like to know the best way to model this.

Any thoughts??

editing RNA GLM • 1.5k views

score 2 · Answer 1 · 2024-01-11

21 months ago

Gordon Smyth ★ 8.5k

The edgeR methylation-style analysis can be applied whenever the sequence reads can be classified into two categories for each locus and sample. We have used the approach successfully, for example, to determine haplotype-specific differential expression, where the two columns correspond to read counts from the two haplotypes (from heterozygous mice) at each locus.

In your case, you say that you are counting bases rather than reads, and I don't know exactly how you would be doing that. Base counting is statistically different from read counting because adjacent bases are not statistically independent. I am not sure how well the edgeR methylation-style analysis will work on base counts, but it might be OK provided that you use quasi-F tests in edgeR rather than likelihood ratio tests, because base counting will surely show technical overdispersion.

You can try proceeding as for an edgeR RRBS methylation analysis, with the edit column playing the role of methylated reads and the ref column playing the role of unmethylated reads. I assume you've already seen the methylation workflow:

https://bioinf.wehi.edu.au/edgeR/F1000Research2017/Chen_MethylationV2-Reprint23Oct18.pdf

To make the design matrix, you would use

design <- modelMatrixMeth(~ 0 + Patient + Time)

where Time is the before/after variable. Patient and Time are factors with 90 entries (one per sample: 45 patients x 2 time points), and the resulting design matrix will have 180 rows to account for the edited and non-edited entries for each sample.
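To see where the 180 rows come from, here is a base-R sketch that builds a matrix with the same layout by hand. It is only an illustration of the structure as I understand it from the methylation workflow; in a real analysis you would call modelMatrixMeth rather than this. The ordering of samples (all "before" first, then all "after") and the names design_samples, sample_block, edit_block and design_expanded are assumptions made up for the example.

```r
# Assumed layout: 45 patients, each sampled before and after (90 samples).
n_patients <- 45
Patient <- factor(rep(seq_len(n_patients), times = 2))
Time <- factor(rep(c("before", "after"), each = n_patients),
               levels = c("before", "after"))

# Sample-level design from the formula in the answer: 90 rows, 46 columns.
design_samples <- model.matrix(~ 0 + Patient + Time)
n_samples <- nrow(design_samples)

# Each sample contributes two count columns, assumed ordered edit then ref
# (mirroring methylated then unmethylated in the methylation workflow).
sample_id <- gl(n_samples, 2)
status <- gl(2, 1, 2 * n_samples, labels = c("edit", "ref"))

# One indicator column per sample absorbs the total coverage of that sample.
sample_block <- model.matrix(~ 0 + sample_id)

# Patient and Time effects act on the edited counts only; ref rows are zero.
edit_block <- design_samples[as.integer(sample_id), , drop = FALSE]
edit_block[status == "ref", ] <- 0

design_expanded <- cbind(sample_block, edit_block)
dim(design_expanded)  # 180 rows (2 per sample), 90 + 46 = 136 columns
```

In this layout the last column (Timeafter) is the one that captures the before/after change in the edited fraction, which is why the count columns need to be in the matching edit, ref order for each sample.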

When you conduct tests, I suggest using glmQLFit and glmQLFTest instead of glmFit and glmLRT.
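A minimal sketch of how these steps might fit together, loosely following the methylation workflow linked above. The wrapper fit_editing and its arguments are hypothetical names for this example; edit and ref are assumed to be sites-by-samples count matrices with matching columns, and Time is assumed to be a two-level factor so that the last design column is the before/after effect. Note that the table in the question lists ref before edit, so the columns are interleaved here as edit, ref, which is the order I understand modelMatrixMeth to expect (methylated, then unmethylated).

```r
# Hypothetical wrapper; requires the Bioconductor package edgeR to run.
fit_editing <- function(edit, ref, Patient, Time) {
  stopifnot(requireNamespace("edgeR", quietly = TRUE),
            identical(dim(edit), dim(ref)),
            ncol(edit) == length(Patient),
            length(Patient) == length(Time))
  n <- ncol(edit)

  # Interleave columns as edit1, ref1, edit2, ref2, ...
  counts <- matrix(0, nrow(edit), 2 * n,
                   dimnames = list(rownames(edit), NULL))
  counts[, seq(1, 2 * n, by = 2)] <- edit
  counts[, seq(2, 2 * n, by = 2)] <- ref

  y <- edgeR::DGEList(counts)
  # Use total coverage (edit + ref) as the library size for both columns
  # of each sample, as the methylation workflow does.
  y$samples$lib.size <- rep(colSums(edit) + colSums(ref), each = 2)

  design <- edgeR::modelMatrixMeth(~ 0 + Patient + Time)
  y <- edgeR::estimateDisp(y, design, trend.method = "none")

  # Quasi-likelihood F-tests rather than glmFit/glmLRT.
  fit <- edgeR::glmQLFit(y, design)
  res <- edgeR::glmQLFTest(fit, coef = ncol(design))  # Time effect (last column)
  edgeR::topTags(res, n = Inf)
}
```

Calling fit_editing(edit, ref, Patient, Time) on the full dataset would return a table of sites ranked by evidence for a before/after change in editing. Filtering out sites with low coverage beforehand, as the workflow does for methylation data, is probably worthwhile but is left out of this sketch.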

I was going through this paper (https://rnajournal.cshlp.org/content/24/11/1481.short) and they use the following design:

design <- model.matrix(~0 + patient_id + treatment:allele)

to identify sites with condition-specific changes in the edited base counts, taking into account the unedited base counts for each sample.

I don't understand the nuances of the design matrix, but could you help me understand how this would differ from the design you have provided?

Many thanks for your guidance

ADD REPLY • link 21 months ago by Aiswarya ▴ 20

The paper that you cite is counting sequence reads, not bases. I assume that is what you want to do also, although I find your references to "base counts" confusing.

The design matrix that I outlined is designed to test for differences in the proportion of edited vs unedited reads. The design matrix that you quote from the paper is designed to find differences in the abundance of edited and unedited reads separately. They are quite different analyses.
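A small arithmetic illustration of the distinction, using made-up counts for a single site in one patient (these numbers are hypothetical and not taken from the table in the question). If both edited and unedited reads double after treatment, for instance because the gene is simply more highly expressed, the abundance of each read type changes but the edited proportion does not:

```r
# Hypothetical counts at one site, before and after treatment.
before <- c(edit = 50, ref = 50)
after  <- c(edit = 100, ref = 100)

# Proportion of edited reads: what the methylation-style design targets.
prop_before <- before[["edit"]] / sum(before)
prop_after  <- after[["edit"]] / sum(after)

# Fold change in abundance of each read type, considered separately.
fold_edit <- after[["edit"]] / before[["edit"]]
fold_ref  <- after[["ref"]] / before[["ref"]]

c(prop_before = prop_before, prop_after = prop_after,
  fold_edit = fold_edit, fold_ref = fold_ref)
```

An abundance-based analysis could flag such a site, depending on normalization, whereas a proportion-based analysis would see no change in editing.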

ADD REPLY • link 21 months ago by Gordon Smyth ★ 8.5k

The reason I used the term "base counts" is that the counts in my input matrix are the number of edited bases (edits: A->G, T->C, G->C, ...) out of the total number of bases aligned to a single site (coordinate), and ref is the number of unedited bases out of the total aligned bases at that site.

ADD REPLY • link 21 months ago by Aiswarya ▴ 20