Hi,
I have data from forty-five individuals sampled before and after treatment (paired samples) and would like to identify differentially edited sites between these conditions.
I intend to use a framework similar to the one used for finding differentially methylated sites and allele-specific expression (ASE), specifically edgeR.
My input count table looks like this:
                 ref1 edit1 ref2 edit2 ref3 edit3 ref4 edit4 ref5 edit5 ref6 edit6
Coordinate_1_A_G   10    90   11    54   19    65   16     2   18     0   12     2
Coordinate_2_T_C   20    91   65    94   55    79   62   602   58   224   64   575
Coordinate_3_T_C   16    65   18    77   15    82   16     5   18     7   17     6
Coordinate_4_A_G   16    15    3    15    5    13    1     6    8     0    9     1
Here ref1 is the number of unedited bases and edit1 is the number of edited bases at the respective coordinate for patient 1, and so on.
I would like to know the best way to model this.
Any thoughts??
I was going through this paper (https://rnajournal.cshlp.org/content/24/11/1481.short) and they use the below design:
design <- model.matrix(~0 + patient_id + treatment:allele)
to identify sites with condition-specific changes in the edited base counts, taking into account the unedited base counts for each sample.
I don't understand the nuances of the design matrix, but could you help me understand how this would differ from the design you have provided?
Many thanks for your guidance.
The paper that you cite is counting sequence reads, not bases. I assume that is what you want to do also, although I find your references to "base counts" confusing.
The design matrix that I outlined is designed to test for differences in the proportion of edited vs unedited reads. The design matrix that you quote from the paper is designed to find differences in the abundance of edited and unedited reads separately. They are quite different analyses.
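As a rough illustration of that distinction, here is a minimal base R sketch using only the four example rows from the question. A proportion-based analysis is about edit / (ref + edit) within each sample, whereas the other approach looks at the edited and unedited counts as two separate abundances. This only computes the raw quantities; it is not a statistical test and not the edgeR model itself. The object names (counts, ref, edit, prop) and the sample1..sample6 labels are made up for the example, and the question does not say which columns are before or after treatment, so no grouping is assumed.

```r
# Count table copied from the question: one ref/edit column pair per sample.
counts <- matrix(
  c(10, 90, 11, 54, 19, 65, 16,   2, 18,   0, 12,   2,
    20, 91, 65, 94, 55, 79, 62, 602, 58, 224, 64, 575,
    16, 65, 18, 77, 15, 82, 16,   5, 18,   7, 17,   6,
    16, 15,  3, 15,  5, 13,  1,   6,  8,   0,  9,   1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("Coordinate_", 1:4, "_", c("A_G", "T_C", "T_C", "A_G")),
    paste0(c("ref", "edit"), rep(1:6, each = 2))
  )
)

# Split the interleaved columns into unedited and edited counts.
ref  <- counts[, grep("^ref",  colnames(counts))]
edit <- counts[, grep("^edit", colnames(counts))]

# Quantity behind a proportion-based analysis:
# fraction of edited reads out of all reads covering the site.
prop <- edit / (ref + edit)
colnames(prop) <- paste0("sample", 1:6)  # hypothetical labels
print(round(prop, 2))

# Quantities behind a separate-abundance analysis:
# edited and unedited counts considered on their own.
print(edit)
print(ref)
```

Note that a site can change in edited abundance without changing in proportion (for example if coverage at the site rises overall), which is why the two analyses can give different answers.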
I used the term "base counts" because the counts in my input matrix are base-level: edit is the number of edited bases (edits: A->G, T->C, G->C, ...) among all bases aligned to that single site (coordinate), and ref is the number of unedited bases among all bases aligned to the same site.