How can I normalise GDC TCGA HTSeq Count data (tumour only)
22 months ago
joker33 ▴ 60

Hi!

I downloaded GDC TCGA HTSeq Count data from 32 different cancer types and want to process it. My aim is to create density plots of each cancer and compare them. I want to see if the density plots look similar enough so that I can compare the expression levels of a certain gene directly between cancers.

For this, I need to process and normalise the TCGA HTSeq counts of the tumour samples (not healthy controls). I am relatively new to bioinformatics and don't know how to approach this. There are packages like DESeq2 and edgeR, but they are focussed on differential gene expression. I have only one condition (tumour) and am not interested in differential expression, only in normalising the counts so that I can quantify gene expression. Could you let me know what steps to perform?

I would be very grateful for any help! Thanks in advance.

RNA-Seq Normalisation HTSeq Counts TCGA • 1.4k views
22 months ago
dsull ★ 2.3k

You can use DESeq2 for this. Its vst function (variance stabilizing transformation) normalizes your HTSeq counts so that expression values are comparable between samples, and it does not require you to test for differential expression.
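
To make the idea concrete, here is a minimal Python sketch of the normalisation step. It is not DESeq2's vst: it only reimplements the median-of-ratios size factor idea that DESeq2 uses for library-size correction, and then applies a plain log2(x + 1) transform as a simpler stand-in for variance stabilisation. The function names (size_factors, log_normalise, density) and the toy count matrix are made up for illustration. For real TCGA data you would probably still want DESeq2 itself.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample define the reference
    expressed = np.all(counts > 0, axis=1)
    if not expressed.any():
        raise ValueError("no gene has non-zero counts in all samples")
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference)
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = per-sample median ratio to the pseudo-reference
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def log_normalise(counts):
    """Divide by size factors, then log2(x + 1).
    Assumption: a simple stand-in, NOT DESeq2's vst."""
    counts = np.asarray(counts, dtype=float)
    return np.log2(counts / size_factors(counts) + 1.0)

def density(values, bins=50, value_range=(0.0, 20.0)):
    """Histogram-based density estimate; returns bin centres and densities."""
    hist, edges = np.histogram(values, bins=bins, range=value_range, density=True)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, hist

# Toy example (made-up numbers): sample 2 was sequenced about twice as deep
toy_counts = np.array([[10, 20],
                       [100, 200],
                       [5, 10],
                       [0, 3]])
toy_sf = size_factors(toy_counts)
toy_norm = log_normalise(toy_counts)
# One density curve per sample; overlay these curves to compare samples
toy_curves = [density(toy_norm[:, j]) for j in range(toy_norm.shape[1])]
```

After this step, the three genes expressed in both toy samples get the same normalised value in each, which is the behaviour you want before overlaying density curves from different samples or cancer types. Depth normalisation alone does not remove batch effects between cohorts.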

If you just want to have a quick look at expression levels for a gene of interest across different cancer types, you can also take a look at firebrowse.org and cBioPortal.

Edit: You might also want to take a look at the TCGA pan-cancer data matrices (https://gdc.cancer.gov/about-data/publications/pancanatlas). These contain processed TCGA RNA-seq data, if you don't want to start from scratch, and include batch effect correction (which is important in pan-cancer analyses).

22 months ago

Normalization alone is rarely enough to compare gene expression between two different studies. If that is what you are looking for, then as mentioned in @dsull's answer, you can use DESeq2 to perform the normalization, or start from the processed/corrected data. Alternatively, check out GEPIA. There you can compare expression in different cancers with the normal tissues. The web interface is very accessible. You can even download their processed data and perform any number of analyses on it.