I've performed differential expression analyses. I set my p-value threshold at 0.05 and used a Bonferroni correction across my multiple analyses, so my threshold is now 0.0125. My question is: should I use this p-value of 0.0125 for selecting my GO terms? Or is a p-value of 0.05 sufficient for selecting my enriched GO terms?

The real thing you should worry about in terms of multiple testing for GO analysis is accounting for the fact that you are testing many GO terms. A typical GO analysis involves testing thousands of gene categories, and each one carries a chance of making a type I error. This makes the cumulative chance of making a type I error very high in a GO analysis. If the categories were independent, then with 1000 categories you'd expect about 50 categories to come up as significant at an alpha of 0.05 even if there were no real enrichment. However, GO categories are not independent tests, as the gene sets overlap (and in some cases are complete subsets of one another), and thus it's not clear what the best way to correct for multiple testing is. For this reason, most GO testing frameworks report nominal p-values rather than adjusted ones.
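To put numbers on that, here is a minimal sketch of the arithmetic, assuming the tests are independent (which, as noted, GO categories are not, so treat it as a rough guide only). The function names are just illustrative.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the threshold when no effect is real."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha):
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_categories = 1000  # the example figure used above
alpha = 0.05

print(expected_false_positives(n_categories, alpha))  # about 50
print(family_wise_error_rate(n_categories, alpha))    # very close to 1
```

So with an uncorrected 0.05 cutoff you are all but guaranteed to see some "enriched" categories by chance alone.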

I tend to use BH (Benjamini-Hochberg) correction on my GO testing, although I recognise that this is conservative.
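For anyone unfamiliar with it, this is roughly what the BH adjustment does. It is a minimal pure-Python sketch, not the code from any particular GO package; `bh_adjust` and the example p-values are made up for illustration. In practice you would more likely use R's `p.adjust(p, method = "BH")`.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four GO terms.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(nominal)

# Keep terms whose adjusted p-value is below the chosen FDR level.
selected = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, selected)
```

You then select GO terms on the adjusted values instead of the nominal ones.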

The effect of this is so much bigger than the fact that you've done 4 different analyses that I probably wouldn't worry about the difference between 0.05 and 0.0125, when the real question is between 0.05 and 5*10^-6.
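As a rough illustration of the scale difference, here are the two Bonferroni-style thresholds side by side. The figure of 10,000 GO terms is my assumption, chosen because it is the number of tests that gives 5*10^-6; your actual number of tested terms will differ.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


alpha = 0.05
n_analyses = 4       # the four differential expression analyses
n_go_terms = 10_000  # assumed number of GO terms tested (illustrative)

print(bonferroni_threshold(alpha, n_analyses))  # 0.0125
print(bonferroni_threshold(alpha, n_go_terms))  # about 5e-06
```

Correcting for the four analyses moves the threshold by a factor of 4, while correcting for the GO terms moves it by a factor of thousands.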

That said, yes, in theory you should correct for multiple testing between different analyses (although I'm not sure many people do).

Great point about the "GO categories are not independent tests"... I had never considered that.