Finding important genes in a big dataset (top 20 to top 30)
5 months ago

I have a big dataset (7000) with a categorical variable (gene names) and numerical features. I want to find the most important genes based on two of those features, age (numeric) and time of exposure to harmful substances (numeric): the higher the age and the exposure time, the higher the rank of the gene. I also have up/down-regulation data for each gene.

Could anyone suggest some methods (other than RRA) that would be suitable for this purpose?

bigdata R randomforest python Rankgenes • 485 views

Important how? In the sense that they separate samples? It's not clear how you're defining "important" here or what sort of data you actually have.


Based on age (numeric) and time of exposure to harmful substances (numeric): the higher the age and the exposure time, the higher the rank of the gene. I also have up/down-regulation data for each gene.


I do not believe this question can be meaningfully answered given the description provided. You mention expression only in the final sentence; otherwise we wouldn't even know that.

Be clear. For example: "I have (RNA microarrays; bulk RNA-seq; scRNA-seq) on 7000 (people; mice) for a (complex; Mendelian; somatic) disease. I have 4 (numerical) covariates and ...."

5 months ago

Maybe you can just sort the table by age and exposure, and then keep either the UP or the DOWN genes. For example, you can do that in R with dplyr (available from CRAN: install.packages("dplyr")). Something like:

library(dplyr)

# 'table' is your data frame; age, exposure and gene_expression are the
# names of your columns (written without quotation marks)
new_table <- table %>%
  # sort in descending order by these two variables
  arrange(desc(age), desc(exposure)) %>%
  # keep only the rows whose gene_expression value is "UP"
  filter(gene_expression == "UP")

# select only the top 30 gene IDs; nameColumnGeneID stands for the name of
# your gene ID column (head() returns fewer if fewer than 30 rows remain)
top_30_genes <- head(new_table$nameColumnGeneID, 30)
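
One caveat with sorting by age and then by exposure is that exposure only breaks ties in age, so an old sample with almost no exposure still outranks a slightly younger one with heavy exposure. Since the question is also tagged python, here is a minimal sketch of one possible alternative: rank each variable separately and sort by the sum of the two ranks (a simple Borda-style rank sum, not RRA). The toy rows, the `gene` key and the helper names `average_ranks` and `top_genes` are made up for illustration; the other column names mirror the R snippet above. Whether equal weighting of age and exposure is appropriate depends on your data.

```python
# Hypothetical toy rows; replace with your own data.
rows = [
    {"gene": "G1", "age": 70, "exposure": 2, "gene_expression": "UP"},
    {"gene": "G2", "age": 65, "exposure": 30, "gene_expression": "UP"},
    {"gene": "G3", "age": 40, "exposure": 5, "gene_expression": "DOWN"},
    {"gene": "G4", "age": 55, "exposure": 25, "gene_expression": "UP"},
]


def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values equal to the one at 'start'
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def top_genes(rows, direction="UP", n=30):
    """Return up to n gene names, highest combined age/exposure rank first."""
    kept = [r for r in rows if r["gene_expression"] == direction]
    age_rank = average_ranks([r["age"] for r in kept])
    exp_rank = average_ranks([r["exposure"] for r in kept])
    scored = [(a + e, r["gene"]) for a, e, r in zip(age_rank, exp_rank, kept)]
    # Sort by score descending; the gene name is a deterministic tie-breaker
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [gene for _, gene in scored[:n]]


print(top_genes(rows, n=2))  # prints ['G2', 'G1']
```

On these toy rows the rank sum puts G2 ahead of G1, whereas sorting by age first would put G1 on top despite its low exposure.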