Question: Where to find a list of stress response genes?
3 months ago by
United States
goodez460 wrote:

I need a list of heat shock genes to exclude from my analysis. I'm working with the hg38 genome. I'm not sure of the best approach to find all heat shock genes, or whether such a list can be downloaded from somewhere.


EDIT -- Not just heat shock, perhaps more generally I need "stress response genes".

annotation gene • 286 views
ADD COMMENTlink modified 9 weeks ago by b.nota6.1k • written 3 months ago by goodez460

Stress is such a general term that the question cannot be answered without more details. Stress can be the presence of heat, the presence of a chemotherapeutic agent, or just the wrong pH in the incubator. I suggest you define what stress is in your context and then search NCBI for datasets where cells have been exposed to this stress and RNA-seq was performed with a proper experimental design (e.g. three biological replicates per stress and control condition). Run the data through a standard RNA-seq pipeline and extract the genes that come out differentially expressed.

ADD REPLYlink written 3 months ago by ATpoint13k

Thanks for the idea. I do agree it is general. However I really do not have more details. I was simply asked to "remove stress response genes" because the biology expert thinks one of our replicates was subjected to some kind of stress during the experiment.

Let's assume I just want to focus on heat shock genes. Is there really no existing source of annotated heat shock genes?

ADD REPLYlink written 3 months ago by goodez460

Don't let the wet-lab guys fool you. If they want things removed, ask them to specify exactly what they want. If you end up removing the wrong genes, you'll get 100% of the blame, so ask them to be specific ;-)

ADD REPLYlink written 3 months ago by ATpoint13k

If one sample is suspected to have been subjected to uncontrolled factors, remove the whole sample, or remove nothing.

ADD REPLYlink written 3 months ago by h.mon23k

You can also run some diagnostics, such as PCA on the transformed counts (rlog or vst in DESeq2), and check whether there is evidence for a stress exposure at the global level (the affected samples would cluster away from the unaffected ones of the same treatment group).

ADD REPLYlink written 3 months ago by ATpoint13k

I don't have counts. This is ChIP-seq data of Pol II. For the most part the replicates agree with each other, but they think the heat shock genes are activated in one replicate, which affects our average gene metaplot, for example. I was just hoping there was some source of heat shock genes existing... Doesn't need to be a perfect list of genes or anything.

ADD REPLYlink written 3 months ago by goodez460

If one replicate is not right, replace it with a good one. But do not try to solve it with bioinformatics, ask your wet-lab colleagues to perform good experiments. Don't try to fix their errors and mistakes.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by b.nota6.1k

Overall they correlate very well. It's not a huge concern for us if this is just limited to a small subset of genes. We aren't doing differential expression. We're looking at the general profile of Pol II transcription with and without a treatment. If I remove these genes, and still see the differences, then we'll throw out the replicate.
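The exclusion step itself could be sketched roughly as follows. This is only an illustration under assumptions: the file names `exclude.txt` and `genes.bed` are hypothetical, the gene intervals are made-up toy records, and the gene name is assumed to sit in column 4 of a tab-separated BED-like file.

```shell
# Toy inputs in a temporary directory (all names and coordinates are made up).
tmp=$(mktemp -d)
printf 'HSPA1A\nHSPB1\n' > "$tmp/exclude.txt"
{
  printf 'chr6\t100\t200\tHSPA1A\n'
  printf 'chr7\t300\t400\tACTB\n'
  printf 'chr7\t500\t600\tHSPB1\n'
  printf 'chr12\t700\t800\tGAPDH\n'
} > "$tmp/genes.bed"

# While reading the first file (NR == FNR), remember each name to skip.
# For the second file, print only rows whose column-4 name was not remembered.
kept=$(awk -F'\t' 'NR == FNR { skip[$1]; next } !($4 in skip)' \
  "$tmp/exclude.txt" "$tmp/genes.bed")
printf '%s\n' "$kept"

rm -r "$tmp"
```

The filtered intervals could then be fed to whatever produces the metaplot, once with and once without the exclusion, to compare the two profiles.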

ADD REPLYlink written 3 months ago by goodez460
9 weeks ago by
b.nota6.1k wrote:

Please read the comments carefully. They warn you that i) you, as the bioinformatician, are not responsible for (or the right person for) fixing wet-lab errors, and ii) a simple list of stress proteins is not available.

However, just to help you on your way, I'll show you how to extract all the genes in the genome whose names start with "HSP". You'll need to download the GTF file from Ensembl.

With awk you can extract the genes that begin with "HSP".

# Keep gene records only, take column 9, then pull out gene names starting with HSP (unique)
awk 'BEGIN {FS="\t"} $3 == "gene" {print $9}' Homo_sapiens.GRCh38.78.gtf | \
awk 'BEGIN {FS=";"} {print $3}' | awk '/"HSP/ {print $2}' | awk '!seen[$1]++'
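
The pipeline above depends on gene_name being the third attribute in column 9, which may not hold for every GTF release or source. A minimal sketch that looks the attribute up by key instead is shown here on made-up records; it assumes attributes are written as `key "value";` and that the file is tab-separated with the feature type in column 3.

```shell
# Made-up GTF-like records (IDs and coordinates are placeholders, not real annotation).
gtf=$(
  printf '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_version "1"; gene_name "HSPA1A";\n'
  printf '1\thavana\texon\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "HSPA1A";\n'
  printf '2\thavana\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_name "ACTB";\n'
  printf '3\thavana\tgene\t500\t600\t.\t+\t.\tgene_id "G3"; gene_name "HSPB1";\n'
)

hsp=$(printf '%s\n' "$gtf" | awk -F'\t' '
  $3 == "gene" {
    # Locate the gene_name attribute wherever it sits in column 9.
    if (match($9, /gene_name "[^"]+"/)) {
      # Skip the 11-character prefix and drop the closing quote.
      name = substr($9, RSTART + 11, RLENGTH - 12)
      # Report each HSP-prefixed name once, in order of first appearance.
      if (name ~ /^HSP/ && !seen[name]++) print name
    }
  }')
printf '%s\n' "$hsp"
```

Note that a name prefix is only a rough proxy: it will pick up genes such as HSPG2 that are not heat shock genes, and it will miss heat shock genes named differently, such as the DNAJ (HSP40) family, so the resulting list is worth reviewing by hand.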
ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by b.nota6.1k

Can you explain the final part? !seen[$1]++. I enjoy solving problems with awk, but I'm a novice at best.

Also, out of curiosity, could I replace this:

awk 'BEGIN {FS="\t"} $3 == "gene" {print $9}' Homo_sapiens.GRCh38.78.gtf

with this:

awk '$3 == "gene"' Homo_sapiens.GRCh38.78.gtf | cut -f9

I think both would give the same output, right? I'm not saying my version is better, just trying to understand your answer better.

ADD REPLYlink written 8 weeks ago by goodez460

The last part awk '!seen[$1]++' is to get unique names only. Feel free to replace stuff, if that's more convenient for you.
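
To spell the idiom out a little: `seen` is an associative array keyed on the first field, and `seen[$1]++` evaluates to the count before the increment. A small sketch on made-up input (the names are just toy data):

```shell
# Toy input with repeated names.
names='HSPA1A
HSPB1
HSPA1A
HSPD1
HSPB1'

# seen[$1]++ is 0 the first time a value appears and non-zero afterwards,
# so !seen[$1]++ is true only on the first occurrence. With no action given,
# awk prints the line, so each distinct value is printed exactly once.
unique=$(printf '%s\n' "$names" | awk '!seen[$1]++')
printf '%s\n' "$unique"
```

Unlike `sort -u`, this keeps the names in the order in which they first appear.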

ADD REPLYlink written 8 weeks ago by b.nota6.1k

And regarding the warnings about my role as a bioinformatician:

I'm happy with all the advice and I totally agree, but I think we just want to see generally if the effect goes away after excluding a set of stress genes. Fixing the problem at the wet-lab stage is definitely the best solution.

ADD REPLYlink written 8 weeks ago by goodez460