Request for sites that are neither promoters nor enhancers, as negative data for a classifier
9.4 years ago
na.cna30 • 0

Hello everyone:

I am looking for negative data for my classifier. I am trying to find a specific class of enhancers (STAT1) in the human genome. I want human regions that are neither regulatory regions nor associated with histone modifications.

I would appreciate it if someone could suggest such negative regions for hg18.

Thanks.

ChIP-Seq hg enhancer promoter negative • 2.7k views
9.4 years ago
aditi.qamra ▴ 270

Are you trying to find enhancers in a specific cell type? You can get enhancer data for other cell types from the FANTOM database (say, blood cells for comparison with liver tissue) to increase the specificity of the classifier in your cell type.

Alternatively, a gross estimate of negative regions could be obtained by taking all enhancer regions from ENCODE across all cell types (if you don't want any regulatory regions at all, extend this to regions carrying marks such as H3K4me3, H3K4me1 and H3K27ac) and getting a list of regions that don't overlap any of these.
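As a rough illustration of the first half of that idea, here is a minimal Python sketch that pools peak lists from several marks and cell types into one merged set of "regulatory" intervals. The function name `merge_intervals` and the toy coordinates are hypothetical, not real ENCODE data, and coordinates are assumed to be 0-based, half-open as in BED files.

```python
from collections import defaultdict


def merge_intervals(region_sets):
    """Union several lists of (chrom, start, end) into merged intervals.

    Coordinates are assumed to be 0-based, half-open (BED convention).
    Returns a dict mapping chromosome -> sorted list of (start, end).
    """
    by_chrom = defaultdict(list)
    for regions in region_sets:
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = []
        for start, end in spans:
            if out and start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


# Hypothetical toy peaks standing in for per-mark, per-cell-type peak files.
h3k4me1_cell_a = [("chr1", 100, 200), ("chr1", 500, 600)]
h3k27ac_cell_b = [("chr1", 150, 250), ("chr2", 10, 50)]
enhancers_cell_c = [("chr1", 600, 700)]

regulatory = merge_intervals([h3k4me1_cell_a, h3k27ac_cell_b, enhancers_cell_c])
print(regulatory)
```

With real data, each list would be read from a peak or enhancer file for one mark in one cell type; everything outside the merged set is then a candidate negative region.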

I would be more comfortable with a tissue-specific approach, because inferring a regulatory region from the presence or absence of a histone mark is too broad and depends on the protocol, tissue, thresholds, etc.


Thanks for replying. I have the coordinates of specific enhancers (STAT1) for HeLa cells. How can I get a list of regions that are not regulatory regions? Could you explain more about the tissue-specific approach?

I am trying to identify STAT1 regions based on histone marks, but my classifier doesn't predict well after training (my negative data is random sequence). Thanks.


If I understand you correctly, you are trying to identify all STAT1 enhancer regions on the basis of histone marks. I'm not sure how that is going to work. Nonetheless, to answer your question:

How can I get a list of regions that are not regulatory regions? -- You can use complementBed to get all regions that don't overlap the regulatory regions you source from ENCODE, FANTOM, in-house data, etc. (https://bedtools.readthedocs.org/en/latest/content/tools/complement.html)
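To make the complement operation concrete, here is a small pure-Python sketch of the same idea: given a set of intervals and the chromosome lengths, return every stretch that no interval covers. This is an illustration only, not the bedtools implementation; the function name `complement` and the toy sizes are hypothetical, and real chromosome lengths would have to come from an hg18 chromosome sizes file.

```python
def complement(regions, chrom_sizes):
    """Return the gaps not covered by `regions`, per chromosome.

    regions: iterable of (chrom, start, end), 0-based half-open.
    chrom_sizes: dict mapping chromosome name -> length.
    Input intervals may overlap and need not be sorted.
    """
    by_chrom = {chrom: [] for chrom in chrom_sizes}
    for chrom, start, end in regions:
        if chrom in by_chrom:  # ignore chromosomes with unknown length
            by_chrom[chrom].append((start, end))

    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_chrom[chrom]):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            gaps.append((chrom, cursor, size))
    return gaps


# Hypothetical toy inputs, not real hg18 chromosome lengths.
toy_sizes = {"chr1": 1000, "chr2": 300}
toy_regulatory = [("chr1", 100, 250), ("chr1", 200, 400), ("chr2", 0, 50)]

negatives = complement(toy_regulatory, toy_sizes)
for chrom, start, end in negatives:
    print(chrom, start, end, sep="\t")
```

The gaps can be very long, so in practice they would probably be cut or sampled into windows of a size comparable to the positive regions before being used as negatives.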

I don't think I understood your objective for the classifier, and since you are working in HeLa cells, what I said about comparing against enhancer regions from other tissues doesn't really hold. The idea was that if you are building a classifier for enhancer regions in, say, liver tissue, you might want to use histone signals at enhancer regions from an entirely different cell type, say blood cells, as a negative control, since we know enhancers are tied to cell identity.
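A minimal sketch of that cross-cell-type idea, under the assumption that you have enhancer coordinates for both cell types: keep only the enhancers from the other cell type that do not overlap any positive region in the cell type of interest. The function name `non_overlapping` and all coordinates below are hypothetical.

```python
def non_overlapping(candidates, positives):
    """Keep candidate intervals that overlap no positive interval.

    Both inputs are lists of (chrom, start, end), 0-based half-open.
    A simple quadratic scan per chromosome, fine for a small example.
    """
    pos_by_chrom = {}
    for chrom, start, end in positives:
        pos_by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in candidates:
        overlaps = any(
            start < p_end and p_start < end
            for p_start, p_end in pos_by_chrom.get(chrom, [])
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


# Hypothetical toy coordinates for the two cell types.
liver_enhancers = [("chr1", 100, 200), ("chr2", 500, 600)]
blood_enhancers = [
    ("chr1", 150, 300),  # overlaps a liver enhancer, so it is dropped
    ("chr1", 400, 450),
    ("chr2", 600, 700),  # only touches a liver enhancer, so it is kept
    ("chr3", 10, 20),
]

negative_set = non_overlapping(blood_enhancers, liver_enhancers)
print(negative_set)
```

Negatives chosen this way are themselves enhancers, just not in the cell type of interest, which makes the classification task harder but more specific than using random sequence.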


Thanks so much, that is very helpful.

Yes, I am trying to identify STAT1 regions on the basis of histone marks; I am training the classifier on the sequence content of regions carrying histone marks. My cell line is HeLa.

Is there an online tool like complementBed that provides a list of regions not overlapping regulatory regions? That tool works on Linux and OS X machines, and I am a Windows user. I just want the non-overlapping regions in hg18.

Thanks again.


If you don't have access to a Unix machine, you can try using Galaxy (https://usegalaxy.org/).


Could you tell me how I can generate regions that don't overlap regulatory regions in Galaxy?

Thanks for your help.


As I mentioned, you can use complementBed in Galaxy. What part is not clear? If you open the link I provided in my answer and browse through the options on the left-hand side, you will see "Operate on Genomic Intervals", under which there is an option "Complement intervals of dataset". I am happy to help if you get stuck at some point, but it feels like you did not research this on your own at all. A simple Google search would have landed you at https://wiki.galaxyproject.org/Learn/IntervalOperations
