Question: Looking For Frequency Of A Very Specific RNA Editing Event In Publicly Available RNA-Seq Data
7.2 years ago by
alpha2zee120 wrote:

I am working on a ubiquitously and well expressed gene whose ~1.5 kb-sized transcripts get mutated at one and only one site. The mutation is a nonsense one that changes an ORF codon of the mRNA to a stop codon. The underlying biology and whether this mutation occurs in all or specific cells is unknown. The mutation frequency, disregarding the kinetics of transcript degradation, is estimated to vary between 0% and 5% in the only cell-type that it has been studied in.

I am interested in examining publicly available raw RNA sequencing data to get an idea of the types of tissues (e.g., a specific cancer tissue) or cells that this mutation occurs in (as well as the mutation frequency).

Can anyone suggest how I should go about it?

I have looked for, but cannot find, a site where I could simply run a similarity search against raw RNA sequencing data. It seems I will have to download the raw data and run the search myself.

I am still looking for a way to get raw RNA sequencing data for The Cancer Genome Atlas (TCGA) project (perhaps they are not released to the public?), but I can get data for the Human Body Map and an ENCODE project as .sra files. These projects have examined a wide variety of cells/tissues.

Once I download such raw data, how do I actually program my search? I am fairly familiar with using R (and shell scripts). Note that I don't care if the variation I am looking for in the mRNA might be a genomic one (at the DNA level).


7.2 years ago by
Chris Miller21k (Washington University in St. Louis, MO) wrote:

You're right that you won't find a pre-loaded web tool that you can just query. You'll almost certainly have to download the BAM files and look for them yourself. TCGA BAMs are available through CGHub.

Once you have the BAMs you're interested in, you can do it the manual way and just open them in a genome browser (like IGV) to look for evidence of your event. Alternatively, you could write a little script that runs samtools mpileup on that small region of each BAM file and grabs read counts from there.
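As a rough sketch of the scripted route, something like the following could tally base calls at the site from `samtools mpileup` output. It assumes the BAM is coordinate-sorted and indexed and that you pass the same reference FASTA the reads were aligned to. The function names, file names, and the coordinate and bases in the usage note are placeholders, not anything from the question.

```python
import re
import subprocess
from collections import Counter


def count_bases(bases: str, ref: str) -> Counter:
    """Tally A/C/G/T/N calls from the read-bases column of one mpileup line."""
    counts = Counter()
    ref = ref.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Start of a read: '^' is followed by one mapping-quality character.
            i += 2
            continue
        if c in "+-":
            # Indel: '+' or '-' followed by a length and that many bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m:
                i += 1 + len(m.group()) + int(m.group())
                continue
        if c in ".,":
            # '.' and ',' mean "matches the reference" (forward/reverse strand).
            counts[ref] += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1
        # '$' (read end), '*' (deleted base) and '<' / '>' (spliced-over
        # position, common in RNA-seq) are skipped.
        i += 1
    return counts


def edit_fraction(pileup_line: str, alt: str):
    """Return (alt_count, acgt_depth, fraction) for one mpileup line."""
    fields = pileup_line.rstrip("\n").split("\t")
    ref, bases = fields[2], fields[4]
    counts = count_bases(bases, ref)
    depth = sum(counts[b] for b in "ACGT")
    alt_count = counts[alt.upper()]
    return alt_count, depth, (alt_count / depth if depth else float("nan"))


def pileup_at(bam: str, ref_fasta: str, region: str) -> str:
    """Run samtools mpileup on one small region and return its text output."""
    cmd = ["samtools", "mpileup", "-f", ref_fasta, "-r", region, bam]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For a hypothetical C-to-T event at chr1:12345, you would call `pileup_at("sample.bam", "genome.fa", "chr1:12345-12345")` and pass each line of the result to `edit_fraction(line, "T")`. With an expected frequency of at most a few percent, the fraction only means much at positions covered by many reads, and it may be worth checking mpileup's depth cap (`-d`) and base-quality filter (`-Q`) for a highly expressed gene.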

7.2 years ago by
alpha2zee120 wrote:

Thank you for the suggestions. I was able to use samtools mpileup and a Python script for my analyses.
