Dear community
I'm looking for your advice on a somewhat puzzling issue in my RNA editing analysis.
Goal: To estimate baseline RNA editing events (e.g., A-to-I) in a mouse tissue using bulk RNA-seq data.
Main Tools used: Rsubread (v2.18.0) - Subjunc for alignment STAR (v2.7.11a) - Alternative aligner for comparison JACUSA2 (v2.0.4) - RNA editing caller IGV - July 7, 2025 snapshot, includes dark mode
What I tried: I first aligned my reads using subjunc() from Rsubread and then called editing events with JACUSA2. I examined MAPQ distributions using samtools, and compared results with alignments from STAR (after trimming with fqtrim). In R
subjunc(index = "/path/to/index/GRCm39_noAlt",
nthreads = 18,
readfile1 = fastq_files_R1_abs,
readfile2 = fastq_files_R2_abs,
sortReadsByCoordinates = TRUE,
reportAllJunctions = F,
isGTF = T,
useAnnotation = T,
annot.ext = "/path/to/annotation/gencode.vM36.annotation.gtf")
and get following MAPQ distribution using CLI:
samtools view file.BAM | awk '{print $5}' | sort -n | uniq -c | column -t
23919594 0
578935 6
1842340 8
3995604 10
8045340 13
24852224 20
62500449 40
For STAR, i do trimming prior alignment:
fqtrim -l 50 -q 20 -t 9 -B -p 10 -P 33 -o trim.fq.gz --outdir /path/to/R1.fastq.gz,/path/to/R2.fastq.gz
STAR \
--runThreadN 2 \
--genomeDir /path/to/index/star_149overhang \
--readFilesIn /path/to/R1.trim.fq.gz /path/to/R2.trim.fq.gz \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix ./bam_star/ \
--outTmpDir /home/chuddy/tmp_star_run
samtools view file_star.bam | awk '{print $5}' | sort -n | uniq -c | column -t
3138872 0
5052717 1
6602086 3
59416692 255
Finally, I used jacusa2, to get a bed file with base edits
JACUSA2 call-1 /path/to/file.bam \
-R /path/to/reference/GRCm39.noAlt.fa \
-r jacusa_output.bed \
-c 10 \
-p 14 \
-P RF-FIRSTSTRAND \
-a I,B,S,Y
thereafter, I only use some scripting in R to process and filter and clean up data to get a csv with reference base being Adenine and the edits and which ones overlap with gene bodies.
I originally used subjunc() for alignment and even identified a known human substitution in mouse as a positive control. However, when examining our genes of interest, I noticed that most RNA edits had the flag "B", indicating they occurred near read ends. In many cases, the "edit" was the second-to-last base before soft clipping (e.g., 129M3477N8M14S). When no mismatch was present, the read aligned further. This looked suspicious, so I compared it with STAR. Here's an example:
Using STAR gives me completely different results and likely a different biological interpretation. The "edits" near the end of reads that appear with subjunc() disappear with STAR, even though the base quality (QV) is 40 in both cases. Unfortunately, I also realized can’t harmonize the MAPQ scores between the two aligners, but it might not be necessary, since STAR won't show any of those edits.
Is there anything I can reliably trust here? It seems that the choice of aligner leads to such different biological conclusions.
Cheers
Thanks for your comment. Yes, for editing evaluation subread is bad i saw. For gene-level gene expression evaluation it seems still valid when i compared results between mine and other workflows. subjunc was rated better compared to subread when it comes to editing events.
I decided to use a 2pass star workflow and that's it for now.
why i did not trim for subread but for star? i was not sure how star deals with trimmed, vs non-trimmed reads. I just knew that subread won't need any extra trimming.
i hope for some good workflow in the future without the need to compare multiple aligners and take the overlap of them, since the worst aligner might destroy a fine result.