Question: ALEXA-Seq: gene expression above noise level
6
5.1 years ago by
elmbeech70
United States
elmbeech70 wrote:

I have recently been working with RNA-seq data from a breast cancer cell line panel that was generated with the ALEXA-seq pipeline.

I was fascinated by the expressed 0/1 information available for every gene, so I had a look at the 'Alternative expression analysis by RNA sequencing' paper and its supplementary information (Figures 5 and 6). As far as I understand, the method for deciding whether a feature is expressed below or above intergenic and locus-specific (intragenic) noise is based on the measured expression levels of exon regions, silent intron regions, and silent intergenic regions.

I wonder if it is possible to adapt this method so that it can be used generically with any RNA-seq pipeline.
The key question is whether a downloadable annotation of a reference genome (e.g. the Homo_sapiens.GRCh37.75.gtf.gz file at ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/) contains all of the genomic region types mentioned (exon, silent intron, and silent intergenic). And further, how can one distinguish between these genomic regions?

Any insight is welcome!  Thank you,

Elmar

rna-seq alexa-seq gene genome • 1.7k views
modified 5.1 years ago by Malachi Griffith17k • written 5.1 years ago by elmbeech70
3
5.1 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

In ALEXA-seq, a work that is now arguably superseded by newer tools, we tried to classify features as 'expressed above background noise levels' as follows (refer to the ALEXA-seq manuscript and supplementary materials for more details):

  • We identified thousands of negative control intergenic regions of varying size throughout the genome.  These regions were defined by subtracting out known or predicted genes as well as regions with any evidence of expression from mRNAs in GenBank or ESTs in dbEST.
  • From the set of candidate negative controls, we chose a subset that is most representative of real genes with respect to size and GC content.
  • Using these as negative controls, we chose the 95th percentile of their expression values as an estimate of the background noise that you might see from any region, regardless of whether it was really expressed.  In other words, the cutoff has a rationale behind it.
  • For splicing analysis, the problem is more complex.  Say you have some evidence for expression of an intron or novel exon within a known gene.  This region may have the same level of noise as any region in the genome.  However, it will also have additional noise from expression actually occurring at that locus.  You will have unprocessed RNA in your sample that will increase noise in all introns.  You will also have stochastic splicing errors.  These sources of noise will be correlated with expression level.  The more actively transcribed the region, the higher the noise levels.  Thus a single cutoff for all loci is inadvisable. For that reason we again chose negative control features, within genes this time, that again have no prior evidence of being expressed in known databases.  We then plotted the expression of these controls against expression of the gene they reside within (see Supplementary Figure 5 for an example). We then fit a linear model to that data and used it to derive a sliding background noise cutoff on a gene-by-gene basis.  That way a novel exon within a highly expressed locus has to pass a higher bar to be considered real than one in a lowly expressed locus.
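
To make the two cutoffs above more concrete, here is a minimal Python sketch. It is not the ALEXA-seq code. The function names, the toy expression values, and the choice to never let the gene-specific cutoff drop below the intergenic one are all assumptions made for illustration. The manuscript and supplementary materials describe how the real cutoffs were derived.

```python
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intragenic_cutoff(gene_expr: float, slope: float, intercept: float,
                      floor: float) -> float:
    """Sliding cutoff that rises with the expression of the host gene.

    Assumption of this sketch: the cutoff is never lower than the
    global intergenic cutoff (floor).
    """
    return max(floor, slope * gene_expr + intercept)


# Hypothetical expression values of silent intergenic control regions.
intergenic_controls = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
intergenic_cut = percentile(intergenic_controls, 95)

# Hypothetical silent intragenic controls: expression of each control
# paired with the expression of the gene it resides within.
host_gene_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
control_expr = [0.5, 1.0, 1.5, 2.0, 2.5]
slope, intercept = fit_line(host_gene_expr, control_expr)

# A novel exon in a highly expressed gene must clear a higher bar
# than one in a lowly expressed gene.
cut_high = intragenic_cutoff(10.0, slope, intercept, intergenic_cut)
cut_low = intragenic_cutoff(1.0, slope, intercept, intergenic_cut)
```

With these toy numbers, a feature in the highly expressed gene has to exceed a larger value than a feature in the lowly expressed gene, which falls back to the intergenic cutoff.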

If you want to dig into some of the code that implemented these concepts including the code to generate Supplementary Figure 5, you can look here: summarizeExpressionValues, alternativeExpressionDatabase

Related manuscript: Pubmed | Full text | PDF | Supplementary Information | GEO (GSE23776) | News and Views

For a review of tools related to RNA-seq expression and splicing analyses, you might refer to these posts:

Recommended Tools For Alternative Splicing Detection From Rna-Seq Data

Best Approach To Predict Novel And Alternative Splicing Events From Rna-Seq Data

Is There A "Gold Standard Rnaseq Data" To Compare With Various Rnaseq Tools For Differential Expression Analysis

 

modified 5.1 years ago • written 5.1 years ago by Malachi Griffith17k
1

Malachi, thank you for this detailed answer. It helps me quite a bit.
In particular, it is now much clearer to me how you defined the 'silent' negative controls for intron and intergenic regions.

written 5.1 years ago by elmbeech70