How can I normalise GDC TCGA HTSeq Count data (tumour only)
4.4 years ago
joker33 ▴ 110

Hi!

I downloaded GDC TCGA HTSeq Count data from 32 different cancer types and want to process it. My aim is to create density plots of each cancer and compare them. I want to see if the density plots look similar enough so that I can compare the expression levels of a certain gene directly between cancers.

For this, I need to process and normalise the TCGA HTSeq counts of the tumour samples (not healthy controls). I am relatively new to bioinformatics and don't know how to approach this. There are packages like DESeq2 and edgeR, but they are focused on differential gene expression. I have only one condition (tumour) and am not interested in differential expression, only in normalisation for gene expression quantification. Could you let me know what steps to perform?

I would be very grateful for any help! Thanks in advance.

RNA-Seq Normalisation HTSeq Counts TCGA • 2.7k views
4.4 years ago
dsull ★ 6.0k

You can use DESeq2 for this. DESeq2 has a very useful vst function (variance stabilizing transformation) that normalizes your HTSeq counts for library size, so that you can compare expression between different samples.
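To make the idea behind the normalization concrete, here is a minimal sketch of median-of-ratios size factors, the library-size step that DESeq2 applies before its transformations. It is plain Python for illustration only, not DESeq2 itself, and the log2(x + 1) transform at the end is a simple stand-in, not the vst. The function names and the toy matrix are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts (one value per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    kept = [row for row in counts if all(c > 0 for c in row)]
    if not kept:
        raise ValueError("every gene has at least one zero count")

    # Log of the geometric mean of each kept gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in kept]

    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the gene's geometric mean, on the log scale.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(kept, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalized(counts, factors, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(x + pseudocount).

    This is a simple log transform, not DESeq2's variance stabilizing transformation.
    """
    return [
        [math.log2(c / f + pseudocount) for c, f in zip(row, factors)]
        for row in counts
    ]


# Hypothetical toy matrix: 4 genes x 2 samples, sample 2 sequenced twice as deep.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 4],
]
toy_factors = size_factors(toy_counts)
toy_log2 = log2_normalized(toy_counts, toy_factors)
```

In the toy matrix the second sample has exactly twice the depth of the first, so dividing by the size factors brings the first three genes to the same value in both samples. For real TCGA data you would run DESeq2 in R rather than this sketch.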

If you just want to have a quick look at expression levels for a gene of interest across different cancer types, you can also take a look at firebrowse.org and cBioPortal.

Edit: You might also want to take a look at the TCGA pan-cancer data matrices: https://gdc.cancer.gov/about-data/publications/pancanatlas

This contains processed TCGA RNA-seq data if you don't want to start from scratch. It also includes batch effect correction (which is important in pan-cancer analyses).

4.4 years ago

Normalization alone is rarely enough to compare gene expression between two different studies. If this is what you are looking for, then, as mentioned in @dsull's answer, you can use DESeq2 to perform the normalization, or start from the processed/corrected data. Alternatively, check out GEPIA. There you can compare expression in different cancers with the normal tissues. The web interface is very accessible. You can even download their processed data and perform any number of analyses on it.
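If you do want a rough numerical check of whether two cohorts have similar distributions of normalized values (the density-plot comparison from the question), one option is the largest gap between their empirical cumulative distributions, which is the two-sample Kolmogorov-Smirnov statistic. The sketch below is only an illustration: the function name and toy values are made up, and a small gap does not by itself show that batch effects are absent.

```python
from bisect import bisect_right


def ecdf_distance(a, b):
    """Largest absolute gap between the empirical CDFs of a and b.

    Returns a value in [0, 1]: 0 means identical empirical distributions,
    1 means the two sets of values do not overlap at all.
    """
    if not a or not b:
        raise ValueError("both inputs must contain at least one value")
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical log-scale expression values for three cohorts.
cohort_a = [1.0, 2.0, 3.0, 4.0]
cohort_b = [1.5, 2.5, 3.5, 4.5]
cohort_shifted = [11.0, 12.0, 13.0, 14.0]

gap_similar = ecdf_distance(cohort_a, cohort_b)
gap_shifted = ecdf_distance(cohort_a, cohort_shifted)
```

Here cohort_b overlaps heavily with cohort_a and gives a small gap, while cohort_shifted does not overlap at all and gives the maximum gap of 1.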
