Question: Batch correction for RNA-seq data
krushnach80 wrote (4 months ago):

I am running the TopHat protocol and have Cufflinks output files. When I plot the normalised values as boxplots, I see quite a lot of variation among the samples. The data come from three different sources. What kind of batch correction can I use to reduce that variation?

Any help or suggestions would be highly appreciated.

Gjain (Göttingen, Germany) wrote (4 months ago):

Hi,

Have a look at RUVSeq: Remove Unwanted Variation from RNA-Seq Data.

Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.


krushnach80 commented (4 months ago):

But to use RUVSeq there has to be an ERCC spike-in mix. The samples I'm using don't have ERCC spike-ins, so I cannot use RUVSeq.
Gjain commented (4 months ago):

Please read the documentation. There are 4 different types of normalization, and ERCC is just one of them.

krushnach80 commented (4 months ago):

So can I use the Cufflinks data for the normalisation?
geek_y commented (4 months ago):

You need to use the raw counts. As you don't have spike-in controls, RUVSeq can estimate the systematic effects from the least differentially expressed genes (the vignette example excludes the top 5,000 differentially expressed genes and treats the rest as empirical controls) and use them to normalize the data. The documentation is pretty clear.
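The empirical-control idea in the comment above can be sketched as follows. This is a minimal NumPy illustration in the spirit of RUVSeq's control-gene approach, not RUVSeq itself: the function name `ruv_like_correction`, the simulated data, and the choice of `k=1` are all assumptions made for the example. It works on a log-scale genes x samples matrix, estimates `k` unwanted factors from the control genes by SVD, and regresses those factors out of every gene.

```python
import numpy as np

def ruv_like_correction(log_expr, control_idx, k=1):
    """Regress out k unwanted factors estimated from control genes.

    log_expr    : genes x samples matrix on the log scale (hypothetical input)
    control_idx : indices of genes assumed NOT to be differentially expressed
    """
    y = np.asarray(log_expr, dtype=float).T              # samples x genes
    centered = y - y.mean(axis=0, keepdims=True)          # center each gene
    # Left singular vectors of the control genes capture unwanted variation.
    u, _, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    w = u[:, :k]                                          # samples x k factors
    alpha, *_ = np.linalg.lstsq(w, y, rcond=None)         # k x genes loadings
    return (y - w @ alpha).T, w

# Simulated example (assumption): 200 genes, 6 samples from two batches.
rng = np.random.default_rng(0)
n_genes = 200
batch = np.array([0, 0, 0, 1, 1, 1])
baseline = rng.normal(5.0, 1.0, size=(n_genes, 1))
batch_shift = rng.normal(2.0, 0.3, size=(n_genes, 1)) * batch
noise = rng.normal(0.0, 0.1, size=(n_genes, batch.size))
log_expr = baseline + batch_shift + noise

# Pretend genes 50..199 are the "least differentially expressed" controls.
control_idx = np.arange(50, n_genes)
corrected, w = ruv_like_correction(log_expr, control_idx, k=1)

def batch_gap(m):
    """Mean absolute difference between the two batch means, per gene."""
    return np.abs(m[:, batch == 1].mean(axis=1) - m[:, batch == 0].mean(axis=1)).mean()

print("batch gap before:", batch_gap(log_expr))
print("batch gap after: ", batch_gap(corrected))
```

Note that if batch and biological condition are confounded, removing the factor also removes biology, which is why the choice of control genes and of `k` matters.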

krushnach80 commented (4 months ago):

Yes, that's the other way to do it, and I have done it with one of the samples that has spike-in controls. So I cannot use the Cufflinks data for batch correction?
geek_y commented (4 months ago):

Instead of providing the Cufflinks-calculated FPKM values, you could calculate FPKM/RPKM from the raw counts using the edgeR rpkm() function, and then correct for batches. If you know the batches, you can use removeBatchEffect() (a limma function, which edgeR depends on); otherwise, try RUVSeq. I am not sure whether there is any package that accepts already-normalized values and corrects for batches. You will have to tune the settings to make sure you are not over-normalizing the data.
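For readers who want to see what the two steps in the comment above amount to, here is a rough NumPy sketch. It is not the edgeR or limma code: `rpkm` below is the plain textbook formula (no normalization factors or prior counts), and `remove_known_batch` is a hypothetical helper that fits a per-gene linear model with sum-to-zero batch terms and subtracts the batch part. The simulated counts, gene lengths, and batch labels are assumptions for illustration only.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase per million mapped reads (genes x samples)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    lib_sizes_millions = counts.sum(axis=0, keepdims=True) / 1e6
    return counts / lib_sizes_millions / lengths_kb

def remove_known_batch(log_expr, batch):
    """Subtract the fitted batch effect from a log-scale genes x samples matrix."""
    log_expr = np.asarray(log_expr, dtype=float)
    levels, codes = np.unique(np.asarray(batch), return_inverse=True)
    n_samples, n_levels = len(codes), len(levels)
    # Sum-to-zero coding: one column per batch except the last, which gets -1.
    batch_cols = np.zeros((n_samples, n_levels - 1))
    for j in range(n_levels - 1):
        batch_cols[codes == j, j] = 1.0
        batch_cols[codes == n_levels - 1, j] = -1.0
    design = np.column_stack([np.ones(n_samples), batch_cols])
    coef, *_ = np.linalg.lstsq(design, log_expr.T, rcond=None)
    batch_part = batch_cols @ coef[1:]                    # samples x genes
    return log_expr - batch_part.T

# Simulated example (assumption): 100 genes, 6 samples from three sources.
rng = np.random.default_rng(1)
batch = np.array(["A", "A", "B", "B", "C", "C"])
batch_codes = np.array([0, 0, 1, 1, 2, 2])
gene_lengths_bp = rng.integers(500, 5000, size=100)
base_rate = rng.gamma(shape=2.0, scale=50.0, size=(100, 1))
gene_batch_factor = rng.lognormal(0.0, 0.5, size=(100, 3))[:, batch_codes]
counts = rng.poisson(base_rate * gene_batch_factor)

log_rpkm = np.log2(rpkm(counts, gene_lengths_bp) + 1.0)  # +1 avoids log2(0)
corrected = remove_known_batch(log_rpkm, batch)
```

This sketch has no term for the biological condition. limma's removeBatchEffect() takes a `design` argument to protect the conditions of interest, and leaving them out of the model is one way to end up over-correcting.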