Filter a DGEList object: keep only genes expressed in the same samples as a particular gene of interest
Ondina, 2.6 years ago

Hello, I need some help here, as I'm stuck with the edgeR DGEList format.

I have a DGEList named x with the following dimensions:

> dim(x)
[1] 11301    52

It contains an RNA-seq count matrix with gene IDs as rows and sample names as columns (as in a usual count matrix). I have already run the filtering and normalization steps and a differential expression analysis.
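For context, a DGEList is not just the count matrix: as far as I know it is a list-like object holding the counts in $counts and one row of per-sample information (group, lib.size, norm.factors) in $samples, and dim() reports genes x samples. A minimal sketch on hypothetical toy data (3 genes, 4 samples, not the data from this post):

```r
library(edgeR)

# Hypothetical toy counts: 3 genes x 4 samples (filled column by column)
counts <- matrix(c(44, 0, 5,
                   68, 2, 0,
                   91, 3, 7,
                   63, 0, 1),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))

y <- DGEList(counts = counts)

dim(y)      # 3 4: genes x samples, taken from the count matrix
names(y)    # "counts" "samples"
y$samples   # one row per sample: group, lib.size, norm.factors
```

The rows of y$samples line up with the columns of y$counts, which is what any column filter has to preserve.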

Now, based on the results of the DE analysis, I want to filter the DGEList with the following rule:

I want to keep only the genes that are expressed (count > 0) in the same samples as a particular gene, identified by its gene ID (by the way, there are no replicates in my data).

Maybe there is already an edgeR function that does this (I don't know whether filterByExpr can be used for it). Maybe with grep?

Any ideas?
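On the filterByExpr question: as far as I know it is a gene-level (row) filter that looks at each gene's own counts and the experimental groups, so it returns one TRUE/FALSE per gene and has no notion of "the samples where gene X is expressed". A quick check on a hypothetical toy DGEList (the group labels A/B are made up):

```r
library(edgeR)

# Hypothetical toy DGEList: 3 genes x 4 samples in two made-up groups
counts <- matrix(c(44, 0, 5,
                   68, 2, 0,
                   91, 3, 7,
                   63, 0, 1),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))
y <- DGEList(counts = counts, group = c("A", "A", "B", "B"))

# One logical value per gene (row); it never selects samples (columns)
keep_genes <- filterByExpr(y)

length(keep_genes) == nrow(y)   # TRUE
```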

Tags: edger, RNA-seq, dgelist, R

An example would help. What is "time" here?


"time" here is the sample. I should have added more information to my question.

I have 52 samples covering several conditions: time 0, +5 hours, +1 day and +8 days. The conditions also differ in whether a marker is present or not. Here's an example:

> x$counts
                   DP10_1.0 F3.10_1.0 F3.20_1.0 F3.10_1.0.1 F3.10_1.5h F3.10_1.5h.1 F3.10_1.24h F3.10_1.24h.1 DP_1.d8 F3._1.d8 F3._1.d8.1 DP10_2.0 F3.10_2.0 F3.20_2.0
ENSMUSG00000000001       44        68        91          63         60           34          71            14     343      370        369       75        43        99
ENSMUSG00000000028        0         2         3           4          0            5          17             3      31       58         38       10         7        14

So you want genes that only have values > 0?

x[rowSums(x$counts == 0) == 0, ]
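For reference, this row-wise filter can be applied directly to a DGEList by testing its count matrix. A minimal sketch on a hypothetical toy DGEList; keep.lib.sizes = FALSE is the usual edgeR idiom when dropping genes, so that library sizes are recomputed from the genes that remain:

```r
library(edgeR)

# Hypothetical toy DGEList: 3 genes x 4 samples (not the poster's data)
counts <- matrix(c(44, 0, 5,
                   68, 2, 0,
                   91, 3, 7,
                   63, 0, 1),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))
y <- DGEList(counts = counts)

# A gene is kept only if none of its counts is zero
no_zero <- rowSums(y$counts == 0) == 0

# Row subsetting; library sizes are recomputed from the remaining genes
y_rows <- y[no_zero, , keep.lib.sizes = FALSE]

rownames(y_rows$counts)   # "geneA"
```

As the next reply points out, though, this filters genes, not samples.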

Not really: I want to filter my columns to keep only the ones where my gene of interest is > 0. I also don't want to lose the information stored in the DGEList object, so I don't want to convert it into a simple matrix or data.frame.


If I understand correctly, you want to keep genes with raw counts > 0 while making sure your gene of interest is retained, right? Could you share a snippet of your code, please?


I'm changing my question a bit now that I've thought about it more. I just want to keep the sample columns where the count of this gene is > 0, i.e. where the condition below is TRUE.

So far I have:

> x$counts["ENSMUSG00000028369", ] > 0

 DP10_1.0     F3.10_1.0     F3.20_1.0   F3.10_1.0.1    F3.10_1.5h  F3.10_1.5h.1   F3.10_1.24h F3.10_1.24h.1       DP_1.d8      F3._1.d8    F3._1.d8.1      DP10_2.0 
    FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE          TRUE          TRUE         FALSE 
F3.10_2.0     F3.20_2.0   F3.10_2.0.1    F3.10_2.5h    F3.20_2.5h  F3.10_2.5h.1   F3.10_2.24h   F3.20_2.24h F3.10_2.24h.1       DP_2.d8      F3._2.d8    F3._2.d8.1 
    FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE          TRUE 
 DP10_3.0      DP11_3.0     F3.10_3.0     F3.20_3.0   F3.10_3.0.1    F3.10_3.5h    F3.20_3.5h  F3.10_3.5h.1   F3.10_3.24h   F3.20_3.24h F3.10_3.24h.1       DP_3.d8 
    FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE          TRUE 
 F3._3.d8    F3._3.d8.1      DP10_4.0      DP11_4.0     F3.10_4.0     F3.20_4.0   F3.10_4.0.1    F3.10_4.5h    F3.20_4.5h  F3.10_4.5h.1   F3.10_4.24h   F3.20_4.24h 
     TRUE          TRUE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE 

I only want to keep the TRUE ones. I can't use either subset() or which() because it's a DGEList object and not a simple matrix.


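Okay, I think I found how to do it:

# TRUE for the samples where the gene of interest has a nonzero count
columns_kept <- x$counts["ENSMUSG00000028369", ] > 0

# Subsetting the DGEList by column keeps $counts and $samples in sync
x_filtered <- x[, columns_kept]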

It was not that difficult; sometimes you're just tired and don't think of the easy solutions. I hope it works correctly with the DGEList object and didn't break the link between the counts and the sample information stored inside it.
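On the worry about the DGEList staying consistent: as far as I know, x[, j] on a DGEList applies the same column index to $counts and to the rows of $samples, so library sizes and group labels stay attached to the right samples. A minimal check on a hypothetical toy DGEList, where geneB stands in for ENSMUSG00000028369 and the groups A/B are made up:

```r
library(edgeR)

# Hypothetical toy DGEList: 3 genes x 4 samples, two made-up groups
counts <- matrix(c(44, 0, 5,
                   68, 2, 0,
                   91, 3, 7,
                   63, 0, 1),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))
y <- DGEList(counts = counts, group = c("A", "A", "B", "B"))

# TRUE for samples where the gene of interest has a nonzero count
columns_kept <- y$counts["geneB", ] > 0

# Same column subsetting as in the post above
y_filtered <- y[, columns_kept]

colnames(y_filtered$counts)    # "s2" "s3"
rownames(y_filtered$samples)   # "s2" "s3"
y_filtered$samples             # lib.size and group carried over per sample
```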


Yes, that is the best way. By the way, I suggest normalizing your raw counts with the cpm function and filtering your columns based on the abundance of your gene. It would be something like this:

# Samples passing the filter, based on the CPM of the gene of interest
keep <- cpm(x)["ENSMUSG00000028369", ] > 0

table(keep)  # how many samples passed the filter

x <- x[, keep]

As I understand it, this step would be run before creating the DGEList.

Best regards!
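A note on the CPM suggestion: cpm() on a DGEList divides by the effective library sizes (lib.size times norm.factors), and since those are positive, a CPM > 0 cutoff selects exactly the same samples as counts > 0. The CPM version only changes the result with a positive cutoff. A minimal sketch on a hypothetical toy DGEList, with geneB standing in for the gene of interest and min_cpm = 1 as an arbitrary, illustrative threshold:

```r
library(edgeR)

# Hypothetical toy DGEList: 3 genes x 4 samples (not the poster's data)
counts <- matrix(c(44, 0, 5,
                   68, 2, 0,
                   91, 3, 7,
                   63, 0, 1),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))
y <- DGEList(counts = counts)

# cpm() on a DGEList uses lib.size * norm.factors from y$samples
gene_cpm <- cpm(y)["geneB", ]

# With a cutoff of 0, the CPM filter and the raw-count filter agree
same_as_counts <- all((gene_cpm > 0) == (y$counts["geneB", ] > 0))

# Arbitrary positive cutoff, for illustration only
min_cpm <- 1
keep <- gene_cpm > min_cpm
table(keep)

y_cpm <- y[, keep]
```

On real data with library sizes in the millions, a cutoff like this can drop samples where the gene has only a handful of reads, which a plain > 0 filter would keep.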


Okay then, thank you for the suggestion! I'll keep it in mind :)
