How to normalise RNA-sequencing data that covers only around 250 genes?
4.6 years ago
dwooi7417 • 0

Hi, I am new to Biostars and have run into a problem that I do not have the expertise to solve.

I performed RNA capture sequencing on a set of cancer samples, which means only a selected panel of genes was sequenced. I would like to apply common between-sample normalisation methods such as RLE (DESeq) and TMM. However, because this dataset covers only 250 genes, chosen for their involvement in cancer, I worry that it may not satisfy the assumptions underlying those methods (chiefly, that the majority of genes in a sample are not differentially expressed).

I do have ERCCs spiked into the samples. Although the intention was to spike ERCCs in at roughly similar proportions, some samples ended up with a larger proportion of reads mapping to the ERCCs. Can I still use these ERCCs for normalisation through RUV? Are there any other normalisation methods? Without a control or any biological replicates, how can I check whether the normalisation has worked?

Tags: RNA-Seq, sequencing, R, normalisation
4.6 years ago

You don't need RUV; just estimate the size factors from the ERCC spike-ins and apply them to the counts from the cancer panel. In fact, the estimateSizeFactors() function in DESeq2 has a controlGenes argument meant for exactly this, the idea being that you remove the spike-ins after size factor estimation.
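A minimal sketch of what that controlGenes approach computes, written in Python rather than R and assuming a genes × samples matrix of raw counts plus a boolean mask marking the ERCC rows (both hypothetical inputs): DESeq-style median-of-ratios size factors are estimated from the spike-in rows only and then used to scale every row. It illustrates the idea, not DESeq2's exact implementation.

```python
import numpy as np

def median_of_ratios_size_factors(counts, control_mask):
    """DESeq-style size factors estimated from control rows only.

    counts: genes x samples array of raw counts (hypothetical input).
    control_mask: boolean array, True for spike-in (e.g. ERCC) rows.
    """
    ctrl = np.asarray(counts, dtype=float)[control_mask]
    # Control rows with a zero in any sample have a geometric mean of 0,
    # so they are dropped before taking ratios.
    ctrl = ctrl[(ctrl > 0).all(axis=1)]
    if ctrl.shape[0] == 0:
        raise ValueError("no control rows with non-zero counts in all samples")
    log_ctrl = np.log(ctrl)
    # Per-row geometric mean across samples, on the log scale.
    log_geo_means = log_ctrl.mean(axis=1, keepdims=True)
    # Size factor per sample = median over control rows of count / geometric mean.
    return np.exp(np.median(log_ctrl - log_geo_means, axis=0))

def normalise_counts(counts, size_factors):
    """Divide each sample's column (panel genes and spike-ins alike) by its size factor."""
    return np.asarray(counts, dtype=float) / size_factors
```

After normalising, you would drop the ERCC rows and keep only the panel genes. Note that this scaling assumes the same amount of ERCC mix was added to every sample.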

4.6 years ago

This sounds somewhat like a NanoString nCounter experiment.

I think this may mean you need to do some trial and error with your own data set: while some nCounter-specific methods are provided, re-analysis with more general methods seemed to be important for the data sets that I have seen.

So, looking into that literature may give you some ideas. While you don't exactly have positive and negative control sequences, you could try normalising with ERCC counts and/or counts from highly expressed housekeeping genes (or total aligned counts).
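One way to try those alternatives side by side, sketched in Python under the assumption that you can build boolean masks for the ERCC rows and for a hypothetical list of housekeeping genes: compute a scaling factor per sample from each reference set (here simply the summed counts of that set, rescaled to a geometric mean of 1) and check how well the strategies agree. Strongly disagreeing factors are a warning sign, not proof that either choice is wrong.

```python
import numpy as np

def total_count_factors(counts):
    """Library-size scaling factors (column sums), rescaled to a geometric mean of 1."""
    totals = np.asarray(counts, dtype=float).sum(axis=0)
    return totals / np.exp(np.log(totals).mean())

def reference_set_factors(counts, mask):
    """Scaling factors from the summed counts of a chosen reference set of rows,
    e.g. ERCCs or (hypothetical) housekeeping genes, rescaled to a geometric mean of 1."""
    return total_count_factors(np.asarray(counts, dtype=float)[mask])

def compare_factors(factor_sets):
    """Pairwise Pearson correlations of log scaling factors between strategies.

    factor_sets: dict mapping a strategy name to its per-sample factors.
    Needs at least three samples for the correlation to be informative.
    """
    names = list(factor_sets)
    logs = {n: np.log(factor_sets[n]) for n in names}
    return {
        (a, b): float(np.corrcoef(logs[a], logs[b])[0, 1])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

The total_count_factors output also gives a quick view of how much the total reads per sample vary.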

If the total number of reads per sample varies a lot, then you may have additional issues. Anyway, those are my thoughts.
