Tool: Modernized RNA-MuTect pipeline for tumor-only RNA-seq somatic variant calling

17 hours ago

darklings

Hello,

I am sharing a modified script and a set of processing steps that implement the workflow of the original RNA-MuTect pipeline (https://github.com/broadinstitute/RNA_MUTECT_1.0-1/tree/master) using a modern bioinformatics toolchain.

The original RNA-MuTect pipeline is a validated method for identifying somatic variants in RNA-seq data. However, its reliance on GATK3, MuTect1, and the hg19 reference makes it difficult to run in current analysis environments.

This script automates the full pipeline with GATK4 and HISAT2, from the input RNA-seq BAM to a final VCF re-called on HISAT2-re-aligned reads. It is designed to be a reproducible and user-friendly starting point for RNA-based somatic variant discovery.

GitHub Repository: https://github.com/seq2c/modern-rna-mutect

Key points

Modes: tumor-only or matched-normal.
Parallelized SplitNCigarReads, Mutect2, and Funcotator across contigs.
Faithful to the original logic: extract reads overlapping candidate sites -> re-align them with HISAT2 -> re-call with Mutect2 on those intervals.
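
To illustrate what per-contig parallelization of the Mutect2 step can look like, here is a minimal sketch, not the repository's actual code. It builds one `gatk Mutect2 -L <contig>` command per contig (adding the normal BAM and `-normal` sample name in matched-normal mode), then gathers the pieces with `MergeVcfs` and `MergeMutectStats`. The function names, the `{prefix}.{contig}.vcf.gz` naming scheme, and the thread count are hypothetical choices; `dry_run=True` only prints the commands so nothing runs by accident.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def mutect2_cmd(ref: str, tumor_bam: str, contig: str, out_prefix: str,
                normal_bam: Optional[str] = None,
                normal_sample: Optional[str] = None) -> List[str]:
    """Build one per-contig Mutect2 command (tumor-only unless a normal is given)."""
    cmd = ["gatk", "Mutect2", "-R", ref, "-I", tumor_bam,
           "-L", contig, "-O", f"{out_prefix}.{contig}.vcf.gz"]
    if normal_bam is not None:
        if normal_sample is None:
            raise ValueError("matched-normal mode needs the normal sample name")
        cmd += ["-I", normal_bam, "-normal", normal_sample]
    return cmd


def merge_cmds(contigs: List[str], out_prefix: str) -> List[List[str]]:
    """Gather per-contig VCFs and their stats files into single outputs.

    Mutect2 writes its stats next to the VCF as '<output>.stats'.
    """
    vcf = ["gatk", "MergeVcfs", "-O", f"{out_prefix}.vcf.gz"]
    stats = ["gatk", "MergeMutectStats", "-O", f"{out_prefix}.vcf.gz.stats"]
    for c in contigs:
        vcf += ["-I", f"{out_prefix}.{c}.vcf.gz"]
        stats += ["--stats", f"{out_prefix}.{c}.vcf.gz.stats"]
    return [vcf, stats]


def run_all(cmds: List[List[str]], threads: int = 4,
            dry_run: bool = True) -> List[str]:
    """Run commands concurrently; with dry_run=True only return shell strings."""
    def run(cmd: List[str]) -> str:
        line = shlex.join(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if any contig fails
        return line

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(run, cmds))  # map keeps input order


if __name__ == "__main__":
    contigs = ["chr1", "chr2"]
    jobs = [mutect2_cmd("ref.fa", "tumor.bam", c, "out") for c in contigs]
    for line in run_all(jobs) + run_all(merge_cmds(contigs, "out")):
        print(line)
```

Threads are enough here because each worker just waits on an external process; the merge step must run only after every per-contig job has finished.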
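
The first stage of the extract -> re-align -> re-call logic needs a set of intervals around the candidate sites from the initial Mutect2 pass. The sketch below shows one way to derive them: each VCF record becomes a padded window, and overlapping windows are merged into BED intervals (0-based, end-exclusive). Such a BED file could then be passed to `samtools view -L` to pull the overlapping reads and to Mutect2's `-L` for the re-call. The 100 bp padding and the function names are hypothetical, not necessarily what the script uses.

```python
from typing import Iterable, List, Tuple

# (contig, 0-based start, exclusive end), i.e. BED coordinates
Interval = Tuple[str, int, int]


def site_windows(vcf_lines: Iterable[str], pad: int = 100) -> List[Interval]:
    """Turn VCF records into padded, merged BED intervals around each site."""
    windows: List[Interval] = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = max(0, int(pos) - 1 - pad)      # VCF POS is 1-based
        end = int(pos) - 1 + len(ref) + pad     # cover the whole REF allele
        windows.append((chrom, start, end))

    windows.sort()  # by contig name (lexicographic), then start
    merged: List[Interval] = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(intervals: Iterable[Interval]) -> str:
    """Render intervals as tab-separated BED text."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)


if __name__ == "__main__":
    records = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT",
        "chr1\t1000\t.\tA\tG",
        "chr1\t1050\t.\tC\tT",
        "chr2\t10\t.\tG\tA",
    ]
    print(to_bed(site_windows(records)), end="")
```

Merging keeps nearby sites from producing duplicate reads when the same fragment overlaps two windows.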

The script outputs a VCF file and its associated Mutect2 stats file, which can serve as input to the downstream filtering steps described in the original RNA-MuTect paper. A rewrite of the original MATLAB filtering code is on the to-do list and will be shared once completed. Further improvements are also planned, such as supporting BAMs from other aligners (e.g., minimap2) and making the pipeline more adaptable to other reference genomes and variant callers.
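
Until the filtering code is ported, here is a hedged illustration of the kind of simple pre-filter that could be applied to the output VCF. It drops records with too few ALT-supporting reads (summed from the FORMAT `AD` field) or at positions in a user-supplied exclusion set, for example a list of known RNA editing sites. The `min_alt=3` threshold, the exclusion set, the sample column index, and the function names are hypothetical placeholders, not the published RNA-MuTect criteria. In matched-normal mode, check the `##tumor_sample` header line to pick the right sample column. The stats file is consumed separately by GATK's `FilterMutectCalls`.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Site = Tuple[str, int]  # (contig, 1-based VCF position)


def alt_depth(fmt: str, sample: str) -> Optional[int]:
    """Sum of ALT read counts from the FORMAT AD field, or None if unavailable."""
    keys = fmt.split(":")
    values = sample.split(":")
    if "AD" not in keys or keys.index("AD") >= len(values):
        return None
    ad = values[keys.index("AD")].split(",")
    if ad[0] == ".":
        return None
    return sum(int(x) for x in ad[1:] if x != ".")


def prefilter(vcf_lines: Iterable[str], exclude: Set[Site],
              min_alt: int = 3, sample_col: int = 9) -> Iterator[str]:
    """Yield header lines, plus records with >= min_alt ALT reads not in exclude."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if (cols[0], int(cols[1])) in exclude:
            continue
        depth = alt_depth(cols[8], cols[sample_col])
        if depth is not None and depth >= min_alt:
            yield line


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python prefilter.py < calls.vcf > prefiltered.vcf
    sys.stdout.writelines(prefilter(sys.stdin, exclude=set()))
```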

Feedback, feature suggestions, and bug reports are welcome via the GitHub repository's issue tracker. I hope this proves useful to the community!

somatic rna-seq
