I am trying to find A-to-I editing in my samples. I used a pipeline that runs GATK HaplotypeCaller to call variants and then filters indels and SNPs out of the output VCF, leaving (mostly) A-to-I edits.
Now I want to calculate the editing level for each site that the pipeline called as an edit, but I am not sure whether I can do that from this data alone. These are effectively de novo sites, i.e. they are not confirmed in any of the editing databases. Or should I compare them with validated sites from REDIportal and calculate editing levels only for the sites that match?

The reason for my confusion is this passage from a paper (Lo Giudice et al.):

"To quantify the global RNA editing in a sample, one can average the editing levels measured over the sites detected previously, or by de novo methods. This metric, referred to as the overall editing, is determined as the total number of reads with G at all known editing positions over the number of all reads covering the positions without imposing specific sequencing coverage criteria. The overall editing depends on the number of known editing sites included in the analysis that have to be the same for all samples analyzed. Using de novo editing events for this purpose is not recommended, as the number of detected sites is unevenly distributed across samples and strongly depends on the amount of raw reads input and the bioinformatics procedure. Even merging de novo candidates from all samples of interest does not remove the coverage bias altogether."
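As far as I understand the definition quoted above, the calculation would look something like the sketch below. The site coordinates and read counts are made-up placeholders, not my real data, and the function names are my own. I am assuming that for each site I already have the number of reads with G and the total number of reads covering it.

```python
# Hypothetical per-site counts: (chrom, pos) -> (reads with G, all reads covering the site).
# These numbers are placeholders for illustration only.
site_counts = {
    ("chr1", 1000): (5, 50),
    ("chr1", 2000): (0, 20),
    ("chr2", 3000): (15, 30),
}

def editing_level(g_reads, total_reads):
    """Per-site editing level: fraction of covering reads that carry G."""
    return g_reads / total_reads if total_reads else 0.0

def overall_editing(counts):
    """Overall editing: total G reads over total covering reads, summed across sites."""
    total_g = sum(g for g, _ in counts.values())
    total_cov = sum(n for _, n in counts.values())
    return total_g / total_cov if total_cov else 0.0

per_site = {site: editing_level(g, n) for site, (g, n) in site_counts.items()}
overall = overall_editing(site_counts)
```

Note that the overall value is a ratio of sums, so high-coverage sites weigh more than they would in a plain average of the per-site levels.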
I was thinking of merging my sites as described in the paper, and I believe my samples have more or less the same coverage (if I understand coverage correctly).
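To show what I mean by merging, here is a rough sketch of my plan. The sample names, sites and counts are hypothetical. It assumes that after taking the union of called sites I go back to each sample's alignments and re-count reads at every merged site, so that all samples are evaluated on the same site list.

```python
# Hypothetical de novo calls per sample (placeholder coordinates).
called_sites = {
    "sampleA": {("chr1", 1000), ("chr2", 3000)},
    "sampleB": {("chr1", 1000), ("chr1", 2000)},
}

# Union of all called sites, so every sample uses the same site list.
merged_sites = sorted(set().union(*called_sites.values()))

# Hypothetical counts re-extracted from each sample's reads at EVERY merged site:
# (chrom, pos) -> (reads with G, all reads covering the site).
recounted = {
    "sampleA": {("chr1", 1000): (5, 50), ("chr1", 2000): (1, 20), ("chr2", 3000): (15, 30)},
    "sampleB": {("chr1", 1000): (2, 40), ("chr1", 2000): (6, 60), ("chr2", 3000): (4, 100)},
}

def overall_on_sites(counts, sites):
    """Total G reads over total covering reads, restricted to the given site list."""
    total_g = sum(counts[s][0] for s in sites)
    total_cov = sum(counts[s][1] for s in sites)
    return total_g / total_cov if total_cov else 0.0

overall_by_sample = {
    name: overall_on_sites(counts, merged_sites) for name, counts in recounted.items()
}
```

Even with the same site list, the total number of covering reads differs between samples here, which I think is the coverage bias the paper warns about.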
Any help would be much appreciated,
Thanks in advance