Question

Clusters in Differential Gene Expression Data

5 months ago

Scofield • 0

Hi All,

I am new to bioinformatics and RNA-seq. I downloaded differential gene expression data for my study cohort from the GDC database. The data has a column named "cluster", and gene IDs occur multiple times, under different cluster numbers, with different p-values and log2FC values. I am having a hard time understanding what this "cluster" column represents in my dataset, and how I should take it into account before filtering the data with thresholds on adjusted p-value and log2FC. I look forward to guidance from the experts in the field.

Thanks in advance.

gdc gene-expression seurat • 599 views

updated 5 months ago by Zhenyu Zhang ★ 1.2k • written 5 months ago by Scofield • 0

Can you show the name of the file and its first 10 lines?

5 months ago by Ming Tommy Tang ★ 4.5k

Biostars refused to let me post my answer, reporting "not supported language".

5 months ago by Zhenyu Zhang ★ 1.2k

score 0 · Answer 1 · 2024-06-09

I'm not familiar with the database, but maybe you figured it out already.

Generally, though, it depends on what the database defined as clusters. They should have some guidance in their documentation or literature. Might take some digging.

It sounds like groups of subjects may have been clustered, with gene expression then reported for each group. That would explain why gene IDs are repeated with different statistics: e.g., if cluster 1 is skin cancer and cluster 2 is blood cancer, you can see Myc expression in each condition. But really, I don't know.
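If that guess is right, each row is one gene tested within one cluster, so the thresholds would be applied row by row and the results read per cluster rather than per gene. Here is a minimal sketch of that idea in plain Python. The column names (`gene`, `cluster`, `avg_log2FC`, `p_val_adj`) are assumptions borrowed from the layout of Seurat marker tables, since the question is tagged seurat; the toy values are made up purely for illustration, and the cutoffs are placeholders. Check the real file's header and documentation before relying on any of this.

```python
import csv
import io
from collections import defaultdict

# Toy table with MADE-UP values, only to mimic the layout described in the question.
# Column names are assumptions; replace them with the names in your file's header.
TOY_TSV = """gene\tcluster\tavg_log2FC\tp_val_adj
MYC\t1\t2.1\t0.001
MYC\t2\t0.2\t0.8
TP53\t1\t-1.5\t0.01
TP53\t2\t1.2\t0.2
"""

def significant_genes_by_cluster(handle, padj_max=0.05, abs_lfc_min=1.0):
    """Return {cluster: [(gene, log2FC, padj), ...]} for rows passing both cutoffs."""
    hits = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        padj = float(row["p_val_adj"])
        lfc = float(row["avg_log2FC"])
        # Each row is judged on its own: the same gene can pass in one
        # cluster and fail in another.
        if padj < padj_max and abs(lfc) >= abs_lfc_min:
            hits[row["cluster"]].append((row["gene"], lfc, padj))
    return dict(hits)

result = significant_genes_by_cluster(io.StringIO(TOY_TSV))
for cluster, genes in sorted(result.items()):
    print(cluster, genes)
```

For a real file, pass `open("your_file.tsv", newline="")` (a hypothetical file name) instead of the `io.StringIO` wrapper. Keeping the cluster as the grouping key avoids collapsing the repeated gene IDs, which would throw away the per-cluster comparison the table seems to encode.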