Question: Extracting differentially expressed genes in R
F (3.1k), Iran, wrote 11 months ago:

Hi,

I know this is a naive question, but when I tried to extract genes with fold-change > 2.0 or < 0.5 I ran into a problem.

I tried this:

results <- t[which(abs(t$logFC) > 1 & t$adj.P.Val < 0.05),]

This seems to give me only fold-change > 2.0. How can I add fold-change < 0.5 to this code, please?

Thank you

gene rna-seq R software error • 1.4k views
modified 11 months ago by Kevin Blighe (33k) • written 11 months ago by F (3.1k)

See if the following works:

subset(t, adj.P.Val<0.05 & (abs(logFC)>2.0 | abs(logFC)<0.5))

I am not sure if it is good to filter the genes by statistical significance and fc at the same time.
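A note on combining cut-offs with & and | in R: & is evaluated before |, so a condition of the form a & b | c is read as (a & b) | c, and the last branch is not restricted by the first test. The sketch below shows the effect on a toy table; `tt` and its values are invented for illustration and are not the poster's data.

```r
# Toy limma-style table (hypothetical values, for illustration only)
tt <- data.frame(
  gene      = c("g1", "g2", "g3", "g4"),
  logFC     = c(2.5, -2.5, 0.1, 0.1),
  adj.P.Val = c(0.01, 0.20, 0.01, 0.90),
  stringsAsFactors = FALSE
)

# Without parentheses: parsed as (p-value & large change) | small change
loose <- subset(tt, adj.P.Val < 0.05 & abs(logFC) > 2.0 | abs(logFC) < 0.5)

# With parentheses: the p-value cut-off applies to both branches
strict <- subset(tt, adj.P.Val < 0.05 & (abs(logFC) > 2.0 | abs(logFC) < 0.5))

loose$gene   # includes g4, although its adj.P.Val is 0.90
strict$gene  # g4 is excluded
```

Grouping the fold-change branches in parentheses keeps the significance cut-off attached to every branch.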

modified 11 months ago • written 11 months ago by cpad0112 (10k)

Sorry, do you mean I should first filter for statistical significance and then FC, or vice versa, instead of performing both at the same time?

written 11 months ago by F (3.1k)

I think that it's quite standard to filter based on both fold-change and FDR-adjusted P value. Filtering on just one of these could be problematic.

written 11 months ago by Kevin Blighe (33k)

Thank you, you have saved me from redoing the analysis, because this morning I finished filtering based on both.

written 11 months ago by F (3.1k)

I think that it's generally accepted that fold-changes in RNA-seq can be exaggerated (particularly when based on FPKM-normalised counts), but I think that you would face more criticism by not using a combination of fold-change and FDR-adjusted P value.

written 11 months ago by Kevin Blighe (33k)

IMO, first filter by statistical significance and then by fc. Double filtering (using both p-value and fc at the same time) in microarray analysis has been contested multiple times, and in fact the documentation of limma's topTable function (for microarray analysis) advises against filtering by fc and p-value at the same time (https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/toptable). The double-filtering issue (filtering by fc and by the p-value from a statistical test simultaneously) was discussed clearly some time ago (PMID:19995439, PMCID:PMC2801685). Here is a long discussion of this on ResearchGate: https://www.researchgate.net/post/FDR_or_log_fold_change_which_one_is_the_priority_for_selecting_the_DEGs. Here (https://support.bioconductor.org/p/62286/) and here (https://support.bioconductor.org/p/64787/), Gordon Smyth (of edgeR) discourages filtering by FC and P-value simultaneously (unless I misunderstood the posts) for RNA-seq and microarray data. Filtering by fc should be done within a statistical framework.

In addition, in most recent RNA-seq, exon array and microarray data analyses, I have seen filtering/sorting by p-values first, followed by fc. Please note that this doesn't mean that what you are doing is incorrect or that you should stop doing it. But there is enough literature to support filtering the results by (adj) p-values first.

modified 11 months ago • written 11 months ago by cpad0112 (10k)

So you were just implying not to filter using both at the exact same time, but instead one after the other in a sequential process.

Edit: indeed, I fail to see how this is different from filtering at the same time, provided that the actual function used for filtering is functioning as expected. Also, the end user has to ensure that they know what they're doing. I have seen frequent situations in which a highly statistically significant adjusted P value can be obtained with extremely low fold changes due to a reasonably high level of variation in the data being tested. Equally, one can obtain extremely large fold-changes on other types of data where the adjusted P value approaches 1.
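To illustrate the point about order, here is a minimal sketch on simulated data showing that applying the same two cut-offs in one step or one after the other selects the same rows. The table `tt`, its values and the cut-offs (adj.P.Val < 0.05, |logFC| > 1) are assumptions for the example only.

```r
set.seed(1)  # simulated values, for illustration only
tt <- data.frame(
  logFC     = rnorm(1000, sd = 1.5),
  adj.P.Val = runif(1000)
)

# Both cut-offs applied in a single step
together <- tt[which(tt$adj.P.Val < 0.05 & abs(tt$logFC) > 1), ]

# Significance first, then fold change
step1 <- subset(tt, adj.P.Val < 0.05)
step2 <- subset(step1, abs(logFC) > 1)

# Same rows either way, because a logical AND does not depend on order
identical(rownames(together), rownames(step2))
```

Any difference in gene counts between the two approaches therefore points to the conditions themselves not being identical.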

modified 11 months ago • written 11 months ago by Kevin Blighe (33k)

Thank you. I am working with seven independent microarray GSE series and have the limma top tables in hand; now I am going to filter first by adj.P.Val and then by fold change.

modified 11 months ago • written 11 months ago by F (3.1k)

Okay. For the record, as per cpad, I never actually use the lfc parameter of limma's topTable() when filtering (lfc allows filtering on log2 fold change). I just filter based on adjusted P value and work from there. I rely on other QC and filtering to ensure that the data is properly normalised and that variance will not be an issue in biasing statistics. Thinking back, I have only used fold-change cut-offs when asked by a superior to apply them.

All that said, again, it's not wrong to filter by both at the same time provided you understand what's happening, and the potential pitfalls.

written 11 months ago by Kevin Blighe (33k)

If you are using limma's topTable function, read about treat and topTreat. Copied from the topTable documentation:

Users wanting to use fold change thresholding are usually recommended to use treat and topTreat instead.
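For orientation, a minimal sketch of what a treat/topTreat call can look like. Everything about the data here is an assumption for illustration: the expression matrix is simulated, the two-group design is made up, and lfc = 1 is chosen to mirror a 2-fold threshold. The block only runs if the Bioconductor package limma is installed.

```r
# Sketch only: needs the Bioconductor package limma; data are simulated
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  # 200 genes x 6 samples of log2 expression (3 control, 3 treated)
  y <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
  y[1:10, 4:6] <- y[1:10, 4:6] + 3   # spike in 10 up-regulated genes
  group  <- factor(rep(c("control", "treated"), each = 3))
  design <- model.matrix(~ group)

  fit  <- limma::lmFit(y, design)
  # Test whether |log2FC| exceeds 1 (more than 2-fold), not just differs from 0
  tfit <- limma::treat(fit, lfc = 1)
  res  <- limma::topTreat(tfit, coef = 2, number = Inf)
  sig  <- res[res$adj.P.Val < 0.05, ]
  head(sig)
}
```

With this approach the fold-change threshold is part of the hypothesis being tested, so the adjusted P values already account for it and no separate logFC cut-off is applied afterwards.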

modified 11 months ago • written 11 months ago by cpad0112 (10k)

Thanks a lot to both of you, Kevin and cpad0112, for your time and consideration.

written 11 months ago by F (3.1k)

Kevin Blighe (33k), Republic of Ireland, wrote 11 months ago:

This will give you transcripts that meet either of the following 2 conditions:

  1. logFC>2 and adj.P.Val<0.05
  2. logFC<0.5 and adj.P.Val<0.05


t[which((t$logFC>2.0 & t$adj.P.Val<0.05) | (t$logFC<0.5 & t$adj.P.Val<0.05)), ]
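One caveat on scale: if the logFC column is a log2 fold change, as limma reports it, then linear fold-change cut-offs of 2 and 0.5 correspond to logFC > 1 and logFC < -1. A minimal sketch under that assumption follows; the table `tt` and its values are invented stand-ins for a topTable result.

```r
# Toy table standing in for limma topTable output (hypothetical values)
tt <- data.frame(
  gene      = c("a", "b", "c", "d", "e"),
  logFC     = c(1.8, -2.2, 0.4, -0.6, 3.0),
  adj.P.Val = c(0.001, 0.010, 0.020, 0.030, 0.200),
  stringsAsFactors = FALSE
)

fc_cut <- 2              # linear fold-change threshold
lfc    <- log2(fc_cut)   # the same threshold on the log2 scale

up   <- tt[which(tt$logFC >  lfc & tt$adj.P.Val < 0.05), ]  # FC > 2
down <- tt[which(tt$logFC < -lfc & tt$adj.P.Val < 0.05), ]  # FC < 0.5
deg  <- rbind(up, down)
```

Keeping `up` and `down` as separate objects also makes it easy to report the two directions separately.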
written 11 months ago by Kevin Blighe (33k)

Thank you so much,

Always quick and helpful. Happy New Year to you and Biostars.

written 11 months ago by F (3.1k)

Happy new year and belated Christmas. Should you not be taking time off? Holidays?

written 11 months ago by Kevin Blighe (33k)

Thank you. In Iran we don't celebrate Christmas! Our holiday is in March (Iranian New Year), but I know that there are holidays now in most parts of the world. I wish you all health and happiness in 2018.

written 11 months ago by F (3.1k)

Great - I already have a few Iranian colleagues! Evidently I am not taking time off either.

written 11 months ago by Kevin Blighe (33k)

Just a comment: be careful when using filters on logFC vs FC. If you want to filter on FC, you need to convert the threshold to the appropriate log value.

written 11 months ago by geek_y (8.8k)

Sorry, so because I want to extract genes with fold change > 2, I must extract genes with logFC > 1. I hope I am not wrong.

written 11 months ago by F (3.1k)

Be sure that you know what type of statistic each program produces. Limma will produce log base 2 fold-changes (log2FC).

  • Log2FC 2 is equivalent to linear fold-change 4
  • Log2FC 1 is equivalent to linear fold-change 2
  • et cetera
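The conversion in both directions is a one-liner in base R. A small sketch, using arbitrary example values:

```r
log2fc <- c(-2, -1, 0, 1, 2)
fc     <- 2^log2fc                    # 0.25 0.50 1.00 2.00 4.00

# Side-by-side view of the two scales
data.frame(log2FC = log2fc, linearFC = fc)

# Going the other way: linear thresholds of 0.5 and 2 on the log2 scale
log2(c(0.5, 2))                       # -1  1
```

Note that down-regulation is symmetric on the log2 scale (a halving is -1), whereas on the linear scale it is compressed between 0 and 1.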
modified 11 months ago • written 11 months ago by Kevin Blighe (33k)

Thank you. This is the head of my data:

ID  adj.P.Val   P.Value logFC   Gene.symbol
201289_at   6.95E-11    1.27E-15    -5.94112291 CYR61
202768_at   2.43E-09    1.01E-13    -6.95512411 FOSB
209189_at   2.43E-09    1.33E-13    -5.22397202 FOS
201694_s_at 4.67E-09    3.42E-13    -4.0543598  EGR1
210764_s_at 5.41E-09    4.95E-13    -6.53075681 CYR61
201041_s_at 4.39E-07    5.35E-11    -2.45413476 DUSP1
227404_s_at 4.39E-07    5.63E-11    -4.47587421 EGR1
223316_at   1.94E-06    2.83E-10    -4.59230998 CCDC3
201693_s_at 2.42E-06    3.98E-10    -3.41023566 EGR1
220276_at   5.81E-06    1.06E-09    -5.13632928 RERGL
201466_s_at 9.97E-06    2.01E-09    -2.08223261 JUN
222162_s_at 1.48E-05    3.25E-09    -4.94468905 ADAMTS1

In your code I just replaced 2.0 with 1.0 to filter on fold change:

t[which(abs(t$logFC>1.0 & t$adj.P.Val<0.05) | (t$logFC<0.25 & t$adj.P.Val<0.05)), ]

Moreover, I tried filtering by statistical significance and fold change both at the same time and step by step; these are the numbers of genes:

At_the_same_time=t[which(abs(t$logFC>1.0 & t$adj.P.Val<0.05) | (t$logFC<0.25 & t$adj.P.Val<0.05)), ]

Step-By_Step=subset(t, adj.P.Val<0.05)

Step-By_Step=subset(Step-By_Step,abs(logFC)>1.0| abs(logFC)<0.25)

Filtering at the same time produced 2656 genes, versus 2382 genes step by step.

modified 11 months ago • written 11 months ago by F (3.1k)

You're not applying the same filter in At_the_same_time as in Step-By_Step:

  • In At_the_same_time, abs() is wrapped around the whole comparison rather than around logFC, so you are effectively filtering by signed logFC>1.0 or signed logFC<0.25
  • In Step-By_Step, you are filtering by absolute logFC>1.0 or absolute logFC<0.25

Filtering at the same time and filtering step by step follow the same logic and should produce the same results once the conditions are identical.
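The effect of where abs() is placed can be seen on a small toy table. This is only a sketch: `tt` and its values are invented, and the two filters simply mirror the shape of the two expressions being compared.

```r
# Toy table (hypothetical values); every gene passes the p-value cut-off
tt <- data.frame(
  gene      = c("g1", "g2", "g3", "g4"),
  logFC     = c(1.5, -3.0, -0.5, 0.1),
  adj.P.Val = c(0.01, 0.01, 0.01, 0.01),
  stringsAsFactors = FALSE
)

# abs() wraps the logical test, so it acts on TRUE/FALSE, not on logFC;
# the second branch then compares the signed logFC against 0.25
at_once <- tt[which(abs(tt$logFC > 1.0 & tt$adj.P.Val < 0.05) |
                    (tt$logFC < 0.25 & tt$adj.P.Val < 0.05)), ]

# abs() wraps logFC itself in both branches
sig      <- subset(tt, adj.P.Val < 0.05)
stepwise <- subset(sig, abs(logFC) > 1.0 | abs(logFC) < 0.25)

setdiff(at_once$gene, stepwise$gene)  # genes kept only by the first filter
```

In this toy case the extra gene has a logFC between -1 and -0.25: it satisfies the signed test logFC < 0.25 but neither of the absolute-value tests, which is the kind of row that would make the two counts differ.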

written 11 months ago by Kevin Blighe (33k)

The logic that you're using is actually somewhat confusing, possibly driven by your supervisor's wishes.

You first want genes with logFC>1.0 and <0.25 at the same time? I would do these separately and treat them as separate lists of genes.

...and always check the output to ensure that the functions are doing what you expect.

Never 'trust' a computer algorithm 100%.

modified 11 months ago • written 11 months ago by Kevin Blighe (33k)

Thank you, you saved me from doing the whole analysis wrongly.

Actually, I am just following what is described in a paper:

"To identify differentially expressed genes (DEGs) in each dataset, statistical analyses were performed, which reported statistically significant (adjusted p-value < 0.01 and fold-change >2.0 or <0.5) alterations"

Thank you again for your time

written 11 months ago by F (3.1k)

Excuse me. Although it may seem odd, I am asking this question here because I am afraid to create a new post, as it sounds pretty unrelated to this forum. The more I googled, the more confused I got: between research associate (grade 7 X2) and postdoctoral researcher (grade 7), which one is preferable (at the same salary)? Thanks a lot in advance.

written 9 months ago by F (3.1k)

Hello again, how have you been?

"research associate (grade 7 X2)" most likely just means that there are 2 positions available, both at Grade 7.

Research Associate is the typical term for postdoctoral scientist / researcher in the UK. Senior Research Associate is then the equivalent of Lecturer and Assistant Professor.

written 9 months ago by Kevin Blighe (33k)

Thanks a lot for your kind words. I am doing well. Now I understand these definitions. Thanks once again.

written 9 months ago by F (3.1k)