Hi,
I have RNA-Seq data from a "control" cell line and from the same cell line in which an oncogenic transformation was induced (3 samples of each).
I want to look for SNVs that are characteristic of the transformed cells. I have followed the GATK Best Practices for variant calling on RNA-Seq, which include alignment with STAR in 2-pass mode followed by GATK Split'N'Trim (SplitNCigarReads), which splits reads spanning introns into exon segments.
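The core idea behind the splitting step is that a spliced alignment's CIGAR string is cut at each `N` (skipped reference region, i.e. intron) operation, giving one segment per exon with its own reference start. The Python sketch below shows only that idea. The helper `split_on_n` is hypothetical and written for illustration. The real GATK tool also clips overhangs and reassigns mapping qualities, and neither is modelled here.

```python
import re

# One CIGAR operation: a length followed by an operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases (N is handled separately below).
REF_CONSUMING = set("MD=X")


def split_on_n(pos, cigar):
    """Split a spliced alignment at each N operation (hypothetical helper).

    pos   -- 1-based reference start of the alignment (SAM POS)
    cigar -- CIGAR string, e.g. "50M1000N50M"

    Returns a list of (segment_start, segment_cigar) tuples, one per exon segment.
    """
    segments = []
    current = []       # CIGAR ops of the segment being built
    seg_start = pos    # reference start of the current segment
    ref_pos = pos      # running reference position
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # Close the current segment; the next one starts after the skipped region.
            if current:
                segments.append((seg_start, "".join(current)))
            current = []
            ref_pos += n
            seg_start = ref_pos
            continue
        current.append(f"{n}{op}")
        if op in REF_CONSUMING:
            ref_pos += n
    if current:
        segments.append((seg_start, "".join(current)))
    return segments


print(split_on_n(100, "50M1000N50M"))  # → [(100, '50M'), (1150, '50M')]
```

A read aligned at position 100 with a 1000-base intron therefore becomes two independent 50-base segments, starting at 100 and 1150.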
I now want to perform the variant-calling step. I believe that GATK's HaplotypeCaller (as used in the workflow) is inappropriate for two reasons. First, I have a mixture of cells that may carry different somatic mutations. Second, it does not compare the control and transformed lines. Tools such as MuTect seem more appropriate. However, I don't know whether they can be used on RNA-Seq data. For example, read depth may vary greatly between the "control" and "tumor" samples, because gene expression differs substantially between the two.
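The depth problem can be made concrete with a simple binomial model. This model is an assumption: real RNA-seq counts are overdispersed, and allele-specific expression distorts allele fractions. Under it, the chance of seeing at least `k` variant-supporting reads at a site with depth `d` and variant allele fraction `vaf` is P(X ≥ k) with X ~ Binomial(d, vaf). The function name and the example numbers are illustrative only.

```python
from math import comb


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf).

    Chance of observing at least k variant-supporting reads at a site,
    assuming each read independently carries the variant with
    probability vaf (a simplification for illustration).
    """
    if k <= 0:
        return 1.0
    below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    return 1.0 - below


# Same hypothetical subclonal variant (vaf = 0.1) at a lowly vs highly expressed gene.
for depth in (5, 100):
    print(depth, prob_at_least(3, depth, 0.1))
```

With these numbers, requiring 3 supporting reads gives a detection probability under 1% at depth 5 and above 99% at depth 100. The same mutation could therefore look "tumor-specific" purely because its gene is more highly expressed in one condition.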
Does anyone know of tools that can take RNA-Seq data as input for calling variants in a tumor vs. normal setting?
Thank you,
Gil
I'm also facing the problem of calling somatic mutations from RNA-seq data, but without matched normal tissue. How did you filter the resulting VCF? Many of the variants will be germline SNPs. Also, how did you account for tumour heterogeneity? I ask because the real question here is "can we describe clonal composition from RNA-seq alone?"
Hi,
This is what I did to filter calls from RNA-seq. First, annotate the calls with read-depth information (the number of reads supporting the variant allele and the reference allele). If you then look at the read depth for the variant allele, you will see that most calls have only ~2 supporting reads. Caveat: this is what I see with the library size stated above, processed with STAR and then the GATK-recommended guidelines for RNA. As far as I can recall, the median variant-allele read depth is ~2 in most samples.
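The annotation step can be sketched in Python. The sketch assumes GATK-style VCF records in which the per-sample `AD` FORMAT field holds comma-separated allelic depths, reference first, then the alternate allele. The helper name `allele_depths` and the toy records are hypothetical, not taken from the original analysis. Only biallelic sites are handled.

```python
from statistics import median


def allele_depths(vcf_line, sample_index=0):
    """Return (ref_reads, alt_reads) for one biallelic VCF data line.

    Reads the AD FORMAT field of the chosen sample (0 = first sample column).
    Assumes a single ALT allele and that AD is present (hypothetical helper).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")
    sample_vals = cols[9 + sample_index].split(":")
    ad = sample_vals[fmt_keys.index("AD")]
    ref_reads, alt_reads = (int(x) for x in ad.split(",")[:2])
    return ref_reads, alt_reads


# Toy records (hypothetical): CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
records = [
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:8,2:10",
    "chr1\t2000\t.\tC\tT\t80\tPASS\t.\tGT:AD:DP\t0/1:30,15:45",
    "chr2\t3000\t.\tG\tA\t40\tPASS\t.\tGT:AD:DP\t0/1:5,3:8",
]

# Variant-allele read depth per call, then its median across calls.
alt_depths = [allele_depths(r)[1] for r in records]
print(alt_depths, median(alt_depths))
```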
So I keep only SNVs with more than 10 reads supporting the variant allele. Only 3-5% of SNVs are retained, but I find this a reliable set for 'discovery'. I can always check whether any specific variant is detected in the full set.
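The depth filter amounts to a simple threshold. The minimal sketch below keeps calls with strictly more than 10 variant-supporting reads and reports the fraction retained. The threshold comes from the post. The function name `filter_by_alt_depth` and the toy calls are hypothetical, and each call is assumed to be already annotated with (reference, variant) read counts.

```python
def filter_by_alt_depth(calls, min_alt_reads=10):
    """Keep calls with strictly more than min_alt_reads variant reads.

    calls -- list of (variant_id, ref_reads, alt_reads) tuples
    Returns (kept_calls, fraction_retained).
    """
    kept = [c for c in calls if c[2] > min_alt_reads]
    fraction = len(kept) / len(calls) if calls else 0.0
    return kept, fraction


# Toy annotated calls (hypothetical): (id, ref reads, alt reads)
calls = [
    ("chr1:1000A>G", 8, 2),
    ("chr1:2000C>T", 30, 15),
    ("chr2:3000G>A", 5, 3),
    ("chr3:4000T>C", 40, 11),
]
kept, frac = filter_by_alt_depth(calls)
print([c[0] for c in kept], f"{frac:.0%}")
```

The unfiltered list is kept separately, so a specific variant of interest can still be looked up in the full call set.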
I haven't addressed heterogeneity using variant calls from RNA. As I noted in my previous post, which variants one calls from RNA-seq depends on multiple confounders (the biggest being expression level), so this is a very grey area. I think the same holds for inferring clonal composition from RNA, at least for RNA-seq library preps that involve a PCR amplification step (like Illumina).