Tool: Modernized RNA-MuTect pipeline for tumor-only RNA-seq somatic variant calling
17 hours ago
darklings ▴ 590

Hello,

I have adapted, and am sharing, a script and processing steps that implement the workflow of the original RNA-MuTect pipeline (https://github.com/broadinstitute/RNA_MUTECT_1.0-1/tree/master) using a modern bioinformatics toolchain.

The original RNA-MuTect pipeline is a validated method for identifying somatic variants in RNA-Seq data. However, its reliance on GATK3, MuTect1, and the hg19 reference makes it difficult to implement in today's analysis environments.

This script automates the full pipeline, from a raw RNA BAM to a final VCF re-called on re-aligned reads, using GATK4 and HISAT2. It is designed to be a reproducible and user-friendly starting point for RNA-based somatic variant discovery.

GitHub Repository: https://github.com/seq2c/modern-rna-mutect

Key points

  • Modes: tumor-only or matched-normal.
  • Parallelized SplitNCigarReads, Mutect2, and Funcotator across contigs.
  • Faithful logic: extract site-overlapping reads -> HISAT2 re-align -> Mutect2 re-call on intervals.
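To make the last two points concrete, here is a minimal sketch of how the per-contig scatter and the extract -> re-align -> re-call steps could be assembled as command lines. It is not the repository's actual code: the function names, file names, and output layout are hypothetical, and it only builds the commands rather than running them. It assumes paired-end reads, that `samtools`, `hisat2`, and `gatk` are on the PATH, and that read groups are added at the HISAT2 step because GATK requires them.

```python
import shlex
from typing import List, Optional

Cmd = List[str]

def mutect2_per_contig(ref: str, tumor_bam: str, contigs: List[str], out_dir: str,
                       normal_bam: Optional[str] = None,
                       normal_sample: Optional[str] = None) -> List[Cmd]:
    """One Mutect2 command per contig, so the calls can run in parallel."""
    cmds = []
    for contig in contigs:
        cmd = ["gatk", "Mutect2", "-R", ref, "-I", tumor_bam,
               "-L", contig, "-O", f"{out_dir}/{contig}.vcf.gz"]
        # Matched-normal mode: add the normal BAM and its sample name.
        if normal_bam and normal_sample:
            cmd += ["-I", normal_bam, "-normal", normal_sample]
        cmds.append(cmd)
    return cmds

def realign_and_recall(ref: str, hisat2_index: str, rna_bam: str,
                       sites_bed: str, prefix: str, sample: str) -> List[Cmd]:
    """Extract reads overlapping candidate sites, re-align them with HISAT2,
    then re-call with Mutect2 restricted to the same intervals."""
    subset = f"{prefix}.sites.bam"
    by_name = f"{prefix}.sites.namesorted.bam"
    r1, r2 = f"{prefix}_R1.fastq.gz", f"{prefix}_R2.fastq.gz"
    sam = f"{prefix}.hisat2.sam"
    realigned = f"{prefix}.hisat2.sorted.bam"
    return [
        # Keep only alignments overlapping the first-pass candidate sites.
        ["samtools", "view", "-b", "-L", sites_bed, "-o", subset, rna_bam],
        # Name-sort so mates are adjacent when converting back to FASTQ.
        ["samtools", "sort", "-n", "-o", by_name, subset],
        ["samtools", "fastq", "-1", r1, "-2", r2,
         "-0", "/dev/null", "-s", "/dev/null", by_name],
        # Re-align; read-group tags are set here (values are placeholders).
        ["hisat2", "-x", hisat2_index, "-1", r1, "-2", r2,
         "--rg-id", sample, "--rg", f"SM:{sample}", "--rg", "PL:ILLUMINA",
         "-S", sam],
        ["samtools", "sort", "-o", realigned, sam],
        ["samtools", "index", realigned],
        # Re-call only on the candidate intervals.
        ["gatk", "Mutect2", "-R", ref, "-I", realigned,
         "-L", sites_bed, "-O", f"{prefix}.recall.vcf.gz"],
    ]

if __name__ == "__main__":
    for c in mutect2_per_contig("ref.fa", "tumor.rna.bam", ["chr1", "chr2"], "scatter"):
        print(shlex.join(c))
    for c in realign_and_recall("ref.fa", "hisat2_idx/genome", "tumor.rna.bam",
                                "candidates.bed", "tumor", "TUMOR"):
        print(shlex.join(c))
```

In a real run, the per-contig VCFs and their stats files would still need to be merged (GATK provides `MergeVcfs` and `MergeMutectStats` for this), and reads whose mates fall outside the candidate regions would need separate handling.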

The output of the script is a VCF file and its associated stats file, which can be used as input for the further filtering steps outlined in the original RNA-MuTect paper. Rewriting the original MATLAB filtering code is on the to-do list, and the result will be shared once completed. Further improvements are also planned, such as supporting BAMs from other aligners (e.g. minimap2) and making the pipeline more adaptable to other reference genomes and callers.

Feedback, feature suggestions, and bug reports are welcome via the GitHub repository's issue tracker. I hope this proves useful to the community!

somatic rna-seq • 350 views