I have a dataset of normalized gene expression values. There are 18703 genes in total, measured under 5 conditions (WT, A, B, C, D), and each condition has 10 biological replicates (WT1-10, A1-10, B1-10, C1-10, D1-10). My dataframe is 18704x61. For the first two genes it looks like this (showing only the WT replicates and the first replicate of condition A):
Gene  WT1       WT2       WT3       WT4       WT5       WT6       WT7       WT8       WT9       WT10      A1        ...
1     5.727474  6.028452  6.051629  7.797896  7.716835  7.741452  7.908992  8.078486  8.291032  8.513065  5.548691
2     5.681870  6.808582  5.613130  6.022544  5.462886  6.606438  6.029086  6.982468  6.542026  6.455662  4.940125
I want to extract the subset of these genes that make up the top 5% by expression level. Any ideas?
What is your criterion for "most expressed"?
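The answer depends on that criterion, so here is only a minimal sketch of one option. It assumes the data sits in a pandas dataframe (the post does not say which tool is used), that "most expressed" means the highest mean across all replicates, and that the columns are named as in the example above. The toy data, `n_genes`, and the variable names are made up for illustration.

```python
import random
import pandas as pd

# Toy stand-in for the real table: random values in a hypothetical layout
# (a "Gene" column followed by one column per replicate).
random.seed(0)
conditions = ["WT", "A", "B", "C", "D"]
sample_cols = [f"{c}{i}" for c in conditions for i in range(1, 11)]
n_genes = 200
data = {"Gene": list(range(1, n_genes + 1))}
for col in sample_cols:
    data[col] = [random.uniform(4.0, 9.0) for _ in range(n_genes)]
df = pd.DataFrame(data)

# Assumed criterion: a gene's expression level is its mean over all replicates.
expr = df.set_index("Gene")
mean_expr = expr.mean(axis=1)

# Keep genes whose mean is at or above the 95th percentile, i.e. the top 5%.
cutoff = mean_expr.quantile(0.95)
top_genes = expr[mean_expr >= cutoff]

# Alternative criterion: rank within one condition by averaging only its columns.
wt_cols = [c for c in expr.columns if c.startswith("WT")]
wt_mean = expr[wt_cols].mean(axis=1)
top_wt = expr[wt_mean >= wt_mean.quantile(0.95)]
```

If the criterion should be something else (for example the median, or the maximum over conditions), only the line that computes `mean_expr` would need to change; the percentile cutoff stays the same.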