Question: GDC & GTEx RNA sequencing normalization problem
3.5 years ago by
United States
Ted0 wrote:

Dear all,

I've run into a problem with the normalization of RNA sequencing count files and would like to ask for your advice.

Our goal is to study differentially expressed genes in pancreatic cancer. Here is our procedure for data preprocessing:

  • Downloaded the pancreatic cancer HTSeq raw count data (177 cancers and 4 normals) from the GDC Data Portal.

  • Used the GDC RNA sequencing pipeline to process all the GTEx SRA data (fastq-dump -> STAR 2-pass -> fixmate -> HTSeq).

  • Applied TMM normalization to the GDC cancer, GDC normal, and GTEx normal count data.

  • Applied the voom transformation to the normalized counts.
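For readers unfamiliar with the last two steps, here is a minimal sketch of the idea behind them. It is not the edgeR or limma implementation: real TMM also trims on average abundance and uses precision weights, and voom additionally estimates observation weights. The function names `trimmed_mean_of_m` and `log_cpm` are hypothetical, chosen for this illustration only.

```python
import math

def trimmed_mean_of_m(sample, reference, trim=0.3):
    """Simplified, unweighted TMM-style scaling factor for `sample` vs `reference`.

    Both arguments are lists of raw counts in the same gene order.
    Assumption: trimming is done on the log-ratios (M-values) only.
    """
    n_s, n_r = sum(sample), sum(reference)
    # Log2 ratios of library-size-scaled counts; genes with a zero count are skipped.
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    if not m_values:
        raise ValueError("no genes with non-zero counts in both samples")
    k = int(len(m_values) * trim)          # number of genes trimmed from each tail
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))

def log_cpm(counts, norm_factor):
    """log2 counts-per-million using the effective library size, with the same
    offsets (0.5 and 1) that voom uses before fitting its mean-variance trend."""
    effective_size = sum(counts) * norm_factor
    return [math.log2((c + 0.5) / (effective_size + 1.0) * 1e6) for c in counts]

# Toy data: nine unchanged genes plus one gene that dominates the second library.
reference = [100] * 10
sample = [100] * 9 + [1100]
factor = trimmed_mean_of_m(sample, reference)
normalized = log_cpm(sample, factor)
```

In the toy data the single dominant gene doubles the raw library size, and the trimmed factor (0.5 here) scales the effective library size back so that the nine unchanged genes line up with the reference. This is the assumption TMM relies on: most genes are not differentially expressed between the samples being compared.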

To see how well the data are normalized, we ran PCA on the transformed data. We also plotted the density of per-gene mean/median expression across samples for the GDC cases and the GTEx normals, as well as the distributions of the per-gene mean/median ratios between them.

[Figure: PCA plot]

As you can see, the GDC cancer and normal samples are mixed together, while the GTEx normals sit apart from both. The first peaks of the mean/median density plots for GDC and GTEx are slightly mismatched, and the ratio is also shifted away from 1.
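The ratio check described above can be sketched as follows. This is only an illustration under stated assumptions: the input is log2-scale expression (so a ratio of 1 corresponds to a difference of 0), both groups list genes in the same order, and the function name `gene_mean_log_ratios` is hypothetical.

```python
from statistics import mean, median

def gene_mean_log_ratios(group_a, group_b):
    """Per-gene difference of mean log2 expression between two groups.

    Each group is a list of samples; each sample is a list of log2 values
    in the same gene order. On the log scale a ratio of 1 is a difference of 0.
    """
    means_a = [mean(values) for values in zip(*group_a)]
    means_b = [mean(values) for values in zip(*group_b)]
    return [a - b for a, b in zip(means_a, means_b)]

# Toy example: every gene in group_a is one log2 unit higher than in group_b.
group_a = [[5.0, 7.0, 9.0], [7.0, 9.0, 11.0]]
group_b = [[5.0, 7.0, 9.0], [5.0, 7.0, 9.0]]
shift = median(gene_mean_log_ratios(group_a, group_b))
```

A median difference far from 0 across many genes points to a global shift between the two groups rather than to a handful of differentially expressed genes.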

My question is: do these observations indicate that TMM normalization is not suitable in this case, and that a large portion of genes will be identified as differentially expressed if we go on to do the DE analysis?

Thank you very much for your help!

rna-seq R • 1.8k views

Hi Ted,

Clearly, the GTEx normals form a separate cluster, which looks to me like a batch effect. I don't think TMM does batch correction. I think you should do batch correction before making any comparison.

For differential expression analysis (from counts), check this post for further links:

A: RNA sequencing data batch effect removal

For FPKM-based analysis, you can use ComBat from the "sva" Bioconductor package, just to run PCA for initial results.
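To make the idea of removing a batch effect before PCA concrete, here is a minimal sketch that centers each gene within each batch. It is a much simpler stand-in, not ComBat, which additionally adjusts scale and shrinks the batch estimates with an empirical Bayes model. The function name `center_by_batch` is hypothetical, and the sketch assumes log-scale input and batches with comparable biological composition.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(expr, batches):
    """Shift each gene so that every batch has the same per-gene mean.

    expr: list of samples, each a list of log-scale values in the same gene order.
    batches: one batch label per sample, in the same order as expr.
    """
    by_batch = defaultdict(list)
    for sample, label in zip(expr, batches):
        by_batch[label].append(sample)
    # Per-gene mean within each batch, and across all samples.
    batch_means = {
        label: [mean(values) for values in zip(*samples)]
        for label, samples in by_batch.items()
    }
    grand_mean = [mean(values) for values in zip(*expr)]
    return [
        [v - bm + gm for v, bm, gm in zip(sample, batch_means[label], grand_mean)]
        for sample, label in zip(expr, batches)
    ]

# Toy example: batch "b" is offset by 10 units on both genes.
expr = [[1.0, 2.0], [3.0, 4.0], [11.0, 12.0], [13.0, 14.0]]
batches = ["a", "a", "b", "b"]
corrected = center_by_batch(expr, batches)
```

Note that this kind of centering removes any difference between batch means, including real biology, so it is only meaningful for exploratory plots when batch and condition are not the same split of the samples.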

written 3.5 years ago by Ron990