I am working on a ubiquitously and well-expressed gene whose ~1.5 kb transcripts get mutated at one, and only one, site. It is a nonsense mutation that changes a codon within the ORF of the mRNA to a stop codon. The underlying biology is unknown, as is whether the mutation occurs in all cells or only in specific ones. The mutation frequency, disregarding the kinetics of transcript degradation, is estimated to vary between 0% and 5% in the only cell type in which it has been studied.
I am interested in examining publicly available raw RNA sequencing data to get an idea of which tissues (e.g., a specific cancer tissue) or cell types this mutation occurs in, and at what frequency.
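For the frequency part, what I have in mind is simply the fraction of reads covering the site that carry the stop codon. Since the expected frequency is at most about 5%, the uncertainty at modest coverage matters, so here is a minimal Python sketch of the estimate with a Wilson score interval. The choice of interval, the function name, and the counts in the example are my own assumptions, and the calculation ignores any difference in stability between mutant and wild-type transcripts.

```python
import math


def mutation_frequency(mutant_reads, wild_type_reads, z=1.96):
    """Return (estimate, lower, upper) for the mutant read fraction.

    Uses a Wilson score interval; z=1.96 corresponds to roughly 95%.
    Assumes each read covering the site is an independent draw.
    """
    n = mutant_reads + wild_type_reads
    if n == 0:
        raise ValueError("no reads cover the site")
    p = mutant_reads / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Made-up counts: 5 mutant and 95 wild-type reads at the site.
    estimate, lower, upper = mutation_frequency(5, 95)
    print(f"frequency {estimate:.3f} (interval {lower:.3f} to {upper:.3f})")
```

With zero mutant reads out of 100, the upper bound of this interval is still close to 4%, so a sample with low coverage at the site cannot really be called negative.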
Can anyone suggest how I should go about it?
I have looked for, but cannot find, a site where I could simply run a similarity search against raw RNA sequencing data. It seems I will have to download the raw data and run the search myself.
I am still looking for a way to get raw RNA sequencing data for the Cancer Genome Atlas (TCGA) project (perhaps they are not released to the public?), but I can get data for the Human Body Map and an ENCODE project as .sra files; see http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30611 and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26284. These projects have examined a wide variety of cells/tissues.
Once I download such raw data, how do I actually program my search? I am fairly familiar with R and with shell scripts. Note that it does not matter to me whether the variation I find in the mRNA originates at the genomic (DNA) level.
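To make the kind of search I have in mind concrete, here is a minimal Python sketch (I could port it to R). It assumes the .sra files have first been converted to FASTQ, for example with fastq-dump from the SRA Toolkit, and that exact string matching of a short window around the site is acceptable, which means it misses reads with sequencing errors in the window and reads that only partially cover it. The flanking sequences and codons are made-up placeholders, not my gene, and all function names are my own.

```python
import gzip

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_probes(left_flank, ref_codon, alt_codon, right_flank):
    """Build wild-type and mutant probes, each on both strands."""
    ref = (left_flank + ref_codon + right_flank).upper()
    alt = (left_flank + alt_codon + right_flank).upper()
    return {"wild_type": (ref, revcomp(ref)), "mutant": (alt, revcomp(alt))}


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def iter_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def count_alleles(sequences, probes):
    """Count reads containing each probe in either orientation."""
    counts = {label: 0 for label in probes}
    for seq in sequences:
        for label, (fwd, rev) in probes.items():
            if fwd in seq or rev in seq:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Placeholder window: 13 nt flank, codon CAG (wild type) or TAG (stop), 12 nt flank.
    probes = make_probes("ACGTTGCAAGGCT", "CAG", "TAG", "GATTCCGATAGC")
    wt_read = "GGG" + probes["wild_type"][0] + "TTT"
    mut_read = "CC" + probes["mutant"][0] + "AA"
    demo_reads = [wt_read, mut_read, revcomp(mut_read), "T" * 36]
    print(count_alleles(demo_reads, probes))
    # For a real file: count_alleles(iter_fastq_sequences(open_fastq(path)), probes)
```

The window has to be short enough to fit inside a read but long enough to be unique to the transcript. Is this naive approach reasonable, or should I rather align the reads to the transcript and count bases at the site?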