Discrepancy in RNA editing calls between STAR and Subjunc aligners using JACUSA2
1
0
Entering edit mode
3 months ago

Dear community

I'm looking for your advice on a somewhat puzzling issue in my RNA editing analysis.

Goal: To estimate baseline RNA editing events (e.g., A-to-I) in a mouse tissue using bulk RNA-seq data.

Main Tools used: Rsubread (v2.18.0) - Subjunc for alignment STAR (v2.7.11a) - Alternative aligner for comparison JACUSA2 (v2.0.4) - RNA editing caller IGV - July 7, 2025 snapshot, includes dark mode

What I tried: I first aligned my reads using subjunc() from Rsubread and then called editing events with JACUSA2. I examined MAPQ distributions using samtools, and compared results with alignments from STAR (after trimming with fqtrim). In R

subjunc(index = "/path/to/index/GRCm39_noAlt",  
      nthreads = 18, 
      readfile1 = fastq_files_R1_abs,
      readfile2 = fastq_files_R2_abs,
      sortReadsByCoordinates = TRUE, 
      reportAllJunctions = F,
      isGTF = T,
      useAnnotation = T, 
      annot.ext = "/path/to/annotation/gencode.vM36.annotation.gtf")

and get following MAPQ distribution using CLI:

samtools view file.BAM | awk '{print $5}' | sort -n | uniq -c | column -t
23919594  0
578935    6
1842340   8
3995604   10
8045340   13
24852224  20
62500449  40

For STAR, i do trimming prior alignment:

fqtrim -l 50 -q 20 -t 9 -B -p 10 -P 33 -o trim.fq.gz --outdir /path/to/R1.fastq.gz,/path/to/R2.fastq.gz

STAR \
  --runThreadN 2 \
  --genomeDir /path/to/index/star_149overhang \
  --readFilesIn /path/to/R1.trim.fq.gz /path/to/R2.trim.fq.gz \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix ./bam_star/ \
  --outTmpDir /home/chuddy/tmp_star_run

samtools view file_star.bam | awk '{print $5}' | sort -n | uniq -c | column -t
3138872   0
5052717   1
6602086   3
59416692  255

Finally, I used jacusa2, to get a bed file with base edits

JACUSA2 call-1 /path/to/file.bam \
  -R /path/to/reference/GRCm39.noAlt.fa \
  -r jacusa_output.bed \
  -c 10 \
  -p 14 \
  -P RF-FIRSTSTRAND \
  -a I,B,S,Y

thereafter, I only use some scripting in R to process and filter and clean up data to get a csv with reference base being Adenine and the edits and which ones overlap with gene bodies.

I originally used subjunc() for alignment and even identified a known human substitution in mouse as a positive control. However, when examining our genes of interest, I noticed that most RNA edits had the flag "B", indicating they occurred near read ends. In many cases, the "edit" was the second-to-last base before soft clipping (e.g., 129M3477N8M14S). When no mismatch was present, the read aligned further. This looked suspicious, so I compared it with STAR. Here's an example:

enter image description here

Using STAR gives me completely different results and likely a different biological interpretation. The "edits" near the end of reads that appear with subjunc() disappear with STAR, even though the base quality (QV) is 40 in both cases. Unfortunately, I also realized can’t harmonize the MAPQ scores between the two aligners, but it might not be necessary, since STAR won't show any of those edits.

Is there anything I can reliably trust here? It seems that the choice of aligner leads to such different biological conclusions.

Cheers

jacusa2 rnaseq subjunc star rsubread • 612 views
ADD COMMENT
0
Entering edit mode
3 months ago
Mark ★ 1.7k

Yes the choice of aligner will heavily influence the downstream analysis. From memory Subread is one of the worst performing aligner.

See this paper:

https://link.springer.com/article/10.1007/s00521-021-06188-z

I recommend trying a third aligner, such as HISAT2 and comparing.

Also why do you trim for STAR but not for subread? You need to preprocess the data exactly the same way for the downstream analysis to be comparable. But, read trimming might not even be required as most mappers and rnaseq tools factor in quality https://pubmed.ncbi.nlm.nih.gov/33575617/

ADD COMMENT
0
Entering edit mode

Thanks for your comment. Yes, for editing evaluation subread is bad i saw. For gene-level gene expression evaluation it seems still valid when i compared results between mine and other workflows. subjunc was rated better compared to subread when it comes to editing events.

I decided to use a 2pass star workflow and that's it for now.

why i did not trim for subread but for star? i was not sure how star deals with trimmed, vs non-trimmed reads. I just knew that subread won't need any extra trimming.

i hope for some good workflow in the future without the need to compare multiple aligners and take the overlap of them, since the worst aligner might destroy a fine result.

ADD REPLY

Login before adding your answer.

Traffic: 3893 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6