Where to find a list of stress response genes?
5.4 years ago
goodez ▴ 640

I need a list of heat shock genes to exclude from my analysis. I'm working with the hg38 genome. I'm not sure of the best approach to find all heat shock genes, or whether such a list can be downloaded from somewhere.

Thanks.

EDIT -- Not just heat shock, perhaps more generally I need "stress response genes".

gene annotation • 1.8k views

Stress is such a general term that the question cannot be answered without more details. Stress can be heat, a chemotherapeutic agent, or just the wrong pH in the incubator. I suggest you define what stress means in your context, then search NCBI for datasets in which cells were exposed to that stress and profiled by RNA-seq with a proper experimental design (e.g. three biological replicates per stress and control condition). Run the data through a standard RNA-seq pipeline and extract the genes that come out as differentially expressed.

Thanks for the idea. I agree it is general. However, I really do not have more details. I was simply asked to "remove stress response genes" because the biology expert thinks one of our replicates was subjected to some kind of stress during the experiment.

Let's assume I just want to focus on heat shock genes. Is there really no existing source of annotated heat shock genes?

Don't let the wet-lab guys fool you. If they want things removed, ask them to specify exactly what they want. If you end up removing the wrong genes, you'll get 100% of the blame, so ask them to be specific ;-)

If one sample is suspected to have been subjected to uncontrolled factors, remove the whole sample, or remove nothing.

You can also run some diagnostics, such as PCA on the transformed counts (rlog or vst in DESeq2), to check for global evidence of stress exposure: the affected samples would cluster away from the unaffected ones of the same treatment group.

I don't have counts. This is ChIP-seq data of Pol II. For the most part the replicates agree with each other, but they think the heat shock genes are activated in one replicate, which affects our average gene metaplot, for example. I was just hoping there was some source of heat shock genes existing... Doesn't need to be a perfect list of genes or anything.

If one replicate is not right, replace it with a good one. Do not try to solve this with bioinformatics; ask your wet-lab colleagues to perform good experiments. Don't try to fix their errors and mistakes.

Overall they correlate very well. It's not a huge concern for us if this is just limited to a small subset of genes. We aren't doing differential expression. We're looking at the general profile of Pol II transcription with and without a treatment. If I remove these genes, and still see the differences, then we'll throw out the replicate.

5.3 years ago
Benn 8.3k

Please read the comments carefully. They warn you that i) you, as a bioinformatician, are not responsible for (or the right person for) fixing wet-lab errors, and ii) a simple list of stress proteins is not available.

However, just to help you on your way, I'll show you how to extract all the genes in the genome whose names start with "HSP". You'll need to download the GTF file from Ensembl.

With awk you can extract the genes that begin with "HSP".

awk 'BEGIN {FS="\t"} $3 == "gene" {print $9}' Homo_sapiens.GRCh38.78.gtf | \
awk 'BEGIN {FS=";"}; {print $3}' | awk '/"HSP/ {print $2}' | awk '!seen[$1]++'
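
The pipeline above picks the gene name by its position in the attribute column, which depends on the attribute order of the particular GTF release. A minimal sketch of a position-independent variant follows, assuming only that gene lines carry a gene_name "..." attribute. The small GTF written here is a made-up stand-in (hypothetical IDs and coordinates), not real Ensembl content.

```shell
# Tiny stand-in for an Ensembl GTF; IDs and coordinates are made up for illustration.
gtf=$(mktemp)
{
  printf '#!genome-build GRCh38\n'
  printf '6\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_version "1"; gene_name "HSPA1A"; gene_biotype "protein_coding";\n'
  printf '6\thavana\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "HSPA1A";\n'
  printf '7\thavana\tgene\t300\t400\t.\t+\t.\tgene_id "G2"; gene_name "HSPB1"; gene_biotype "protein_coding";\n'
  printf '17\thavana\tgene\t500\t600\t.\t-\t.\tgene_id "G3"; gene_version "2"; gene_name "TP53";\n'
} > "$gtf"

# Keep only "gene" features, locate the gene_name attribute wherever it sits in
# column 9, strip the key and the quotes, and keep names that start with HSP.
hsp_genes=$(awk -F'\t' '$3 == "gene" && match($9, /gene_name "[^"]+"/) {
    name = substr($9, RSTART + 11, RLENGTH - 12)   # skip the 11 characters of: gene_name "
    if (name ~ /^HSP/) print name
}' "$gtf" | sort -u)

printf '%s\n' "$hsp_genes"
rm -f "$gtf"
```

A name prefix is only a rough proxy: heat shock proteins named differently (for example the DNAJ family) would not be caught, so the resulting list is best treated as a starting point.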

Can you explain the final part, !seen[$1]++? I enjoy solving problems with awk, but I'm a novice at best.

Also out of curiosity, could I replace this:

awk 'BEGIN {FS="\t"} $3 == "gene" {print $9}' Homo_sapiens.GRCh38.78.gtf

with:

awk '$3 == "gene"' Homo_sapiens.GRCh38.78.gtf | cut -f9

I think both would give the same output, right? I'm not saying my version is better, just trying to understand your answer better.

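The last part, awk '!seen[$1]++', keeps unique names only: seen is an array keyed on the first field, and the expression is true only the first time a given name appears, so each name is printed once. Feel free to replace stuff if that's more convenient for you.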
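
A quick way to see the idiom at work is to feed it a few names with repeats. The names below are just illustrative input, not output from the GTF.

```shell
# seen[$1]++ returns the count BEFORE incrementing: 0 on first sight, so
# !seen[$1]++ is true only then, and awk's default action prints the line.
unique=$(printf 'HSPA1A\nHSPB1\nHSPA1A\nHSPD1\nHSPB1\n' | awk '!seen[$1]++')
printf '%s\n' "$unique"
```

Unlike sort -u, this keeps the names in their original order of first appearance.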

And more on the warnings as a bioinformatician:

I'm happy with all the advice and I totally agree, but I think we just want to see generally if the effect goes away after excluding a set of stress genes. Fixing the problem at the wet-lab stage is definitely the best solution.
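
For the exclusion step itself, something along these lines might do. It is a sketch that assumes the gene list is one name per line and that the regions used for the metaplot are in a tab-separated BED-like file with the gene name in column 4. The file names hsp_genes.txt and genes.bed, and all coordinates, are hypothetical.

```shell
# Hypothetical inputs: a gene list and a BED-like table (made-up coordinates).
workdir=$(mktemp -d)
printf 'HSPA1A\nHSPB1\n' > "$workdir/hsp_genes.txt"
printf 'chr6\t100\t200\tHSPA1A\nchr17\t300\t400\tTP53\nchr7\t500\t600\tHSPB1\nchr12\t700\t800\tGAPDH\n' > "$workdir/genes.bed"

# First file (NR == FNR): remember every name to skip.
# Second file: print only rows whose column 4 is not in the skip list.
kept=$(awk -F'\t' 'NR == FNR { skip[$1]; next } !($4 in skip)' \
    "$workdir/hsp_genes.txt" "$workdir/genes.bed")

printf '%s\n' "$kept"
rm -rf "$workdir"
```

Running the metaplot on the filtered table and on the full table then shows whether the replicate difference is confined to the excluded genes.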
