How to remove batch effects when I have different organs?
5.6 years ago

Hello,

I am studying three subspecies of bat. Individuals of the three subspecies were sequenced on different platforms and are not all the same sex. I have an RNA-seq count matrix containing counts for three organs per individual. When I run PCA on each of the three organs separately, the samples cluster by sequencing platform.

To remove the batch effects, I plan to use 'sva' in R. However, I am unsure whether I should give 'sva' the counts of all organs together or split the counts by organ first.
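If the per-organ route is chosen, the combined matrix first has to be split by organ. Below is a minimal Python sketch of that bookkeeping step only. The nested-dict layout, the function name `split_by_organ`, and the toy sample names and counts are all hypothetical. It does not run 'sva' and does not settle which of the two options is better.

```python
from collections import defaultdict


def split_by_organ(counts, sample_organ):
    """Split a combined count table into one table per organ.

    counts:       {gene: {sample: count}}   (assumed layout, hypothetical)
    sample_organ: {sample: organ}
    returns:      {organ: {gene: {sample: count}}}
    """
    per_organ = defaultdict(dict)
    for gene, row in counts.items():
        for sample, value in row.items():
            organ = sample_organ[sample]
            per_organ[organ].setdefault(gene, {})[sample] = value
    return dict(per_organ)


# Hypothetical toy data, not taken from the post.
counts = {
    "geneX": {"A1_brain": 10, "A1_liver": 200, "C1_brain": 12, "C1_liver": 180},
    "geneY": {"A1_brain": 0, "A1_liver": 35, "C1_brain": 3, "C1_liver": 40},
}
sample_organ = {
    "A1_brain": "brain",
    "A1_liver": "liver",
    "C1_brain": "brain",
    "C1_liver": "liver",
}

by_organ = split_by_organ(counts, sample_organ)
print(sorted(by_organ))            # ['brain', 'liver']
print(by_organ["brain"]["geneX"])  # {'A1_brain': 10, 'C1_brain': 12}
```

Each per-organ table can then be inspected or corrected on its own, with the platform and sex annotations restricted to the samples of that organ.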

RNA-Seq • 1.2k views

Can you give some more details on the library prep and sequencing platform? It is quite unusual for the platform to be the dominating factor, given that other covariates like gender and organ are present. Please also show the PCA plot.


It isn't that uncommon for batch effects to dominate other experimental design factors. In fact, there is a famous case:

A reanalysis of mouse ENCODE comparative gene expression data

But I fully agree: jyshiningsoul should provide more details, for example by explaining the experimental design and how batch is confounded with it. In the worst case, the two can't be untangled.
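To make the "can't be untangled" case concrete, here is a minimal Python sketch that checks whether a batch term can be estimated alongside a biological factor. It builds a dummy-coded design matrix (intercept + factor + batch) and tests whether it has full column rank. The function names and the sample annotations are hypothetical; the real design is only given in the linked image.

```python
from fractions import Fraction


def dummy_columns(values):
    """One 0/1 column per level, dropping the first (reference) level."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for v in values] for level in levels[1:]]


def matrix_rank(rows):
    """Rank via Gaussian elimination, using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def batch_is_separable(factor, batch):
    """True if intercept + factor + batch gives a full-rank design matrix."""
    columns = [[1] * len(factor)] + dummy_columns(factor) + dummy_columns(batch)
    rows = [list(r) for r in zip(*columns)]
    return matrix_rank(rows) == len(columns)


# Hypothetical annotations for six samples.
subspecies = ["A", "A", "B", "B", "C", "C"]
nested_platform = ["p1", "p1", "p2", "p2", "p3", "p3"]   # one platform per subspecies
crossed_platform = ["p1", "p2", "p1", "p2", "p1", "p2"]  # each subspecies on both

print(batch_is_separable(subspecies, nested_platform))   # False
print(batch_is_separable(subspecies, crossed_platform))  # True
```

When the check returns `False`, at least part of the batch effect is indistinguishable from the biological factor, and no correction method can separate the two from the data alone.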


Thank you for your reply. In my experimental design, I want to test whether phylogenetic trees constructed from the gene expression of each organ across subspecies agree with the real phylogenetic tree. However, the different platforms and the inconsistent sex of the individuals may have an effect. For example, in the PCA plot for brain, A1 and C1 cluster together, and these two individuals (A1, C1) were sequenced by the same company. I want to remove the batch effects so that the organ PCA plots show the true grouping, but I don't know how to remove the batch effects in my project. (You can see the PCA plot and the information on the individuals in the post.)
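The thread does not say how the expression-based trees would be built. One possible starting point, sketched below in Python, is a sample-by-sample distance matrix for a single organ. The choice of 1 minus the Pearson correlation of log2(count + 1) values is an assumption made here for illustration, and the sample names and counts are hypothetical toy values, not data from the post.

```python
import math


def log_profile(counts):
    """log2(count + 1) transform of one sample's counts."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def distance_matrix(samples):
    """samples: {name: [count per gene]} -> {(a, b): 1 - Pearson r}."""
    logs = {name: log_profile(c) for name, c in samples.items()}
    return {(a, b): 1.0 - pearson(logs[a], logs[b]) for a in logs for b in logs}


# Hypothetical brain counts for three individuals over four genes.
brain = {
    "A1": [10, 200, 0, 35],
    "B1": [12, 180, 3, 40],
    "C1": [50, 20, 7, 300],
}

dist = distance_matrix(brain)
for pair in [("A1", "B1"), ("A1", "C1"), ("B1", "C1")]:
    print(pair, round(dist[pair], 3))
```

If two samples from the same company are consistently closer to each other than to samples of their own subspecies, that pattern is what a batch effect would look like in such a matrix.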


Thank you for your reply, and I am sorry for replying so late. Library construction (average insert size = 300 bp) and sequencing on an Illumina HiSeq 4000 sequencer (150 bp paired-end) were carried out by different companies. However, the platform did not dominate the distribution of the points; it only affected several of them. Here are the PCA plot (https://ibb.co/nGQ680) and the information on the individuals (https://ibb.co/jauVJ0).

Thank you a lot.
