Tool to Identify Gene, Regulatory Role, and Function at Integration Sites
7 months ago
kelpotus22 • 0

Is there a tool or website that can identify the gene, and its regulatory role, at a specified integration site on a chromosome (e.g., 1:20746689), and ideally also the gene's function (e.g., DNA binding activity, nucleosome binding activity)?

chromosome regulatory integration • 577 views
7 months ago

Let's have fun with SPARQL. Querying position 1:20746689 on GRCh38 gives:

--------------------------------------------------------------------------------------------------------------------------------------------------
| build  | chrom | start    | end      | gene_id         | gene_name | gene_biotype   | go_id      | go_label                                    |
==================================================================================================================================================
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0006334 | nucleosome assembly                         |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0000786 | nucleosome                                  |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0005634 | nucleus                                     |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0070828 | heterochromatin organization                |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0031491 | nucleosome binding                          |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0006355 | regulation of DNA-templated transcription   |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0003677 | DNA binding                                 |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0005694 | chromosome                                  |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0016607 | nuclear speck                               |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0042127 | regulation of cell population proliferation |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0005515 | protein binding                             |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0097298 | regulation of nucleus size                  |
| GRCh38 | 1     | 20740266 | 20787323 | ENSG00000127483 | HP1BP3    | protein_coding | GO:0071456 | cellular response to hypoxia                |
--------------------------------------------------------------------------------------------------------------------------------------------------

Steps:
1) download a human GTF and convert it to XML+RDF with awk;
2) download the Ensembl and GO gene mappings from NCBI, join both resources, and convert the result to XML+RDF with awk;
3) concatenate the outputs of 1 and 2 to create an RDF database;
4) query it with SPARQL using Jena ARQ.
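If you would rather skip RDF, the same lookup (genes overlapping a position, joined to their GO terms) can be sketched in plain Python. This is not the answerer's awk/SPARQL code, which is not shown here; it is a minimal sketch assuming an Ensembl-style GTF (1-based, closed coordinates; `gene_id`/`gene_name` attributes) and NCBI's tab-separated `gene2ensembl` and `gene2go` files in their documented column order. All function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Matches GTF attribute pairs such as: gene_name "HP1BP3";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_site(site: str) -> Tuple[str, int]:
    """Split a 'chrom:pos' string such as '1:20746689' into (chrom, 1-based pos)."""
    chrom, pos = site.split(":")
    return chrom, int(pos.replace(",", ""))


def genes_at(gtf: Iterable[str], chrom: str, pos: int) -> List[Dict[str, str]]:
    """Attributes of GTF 'gene' rows whose closed, 1-based [start, end] contains pos."""
    hits = []
    for line in gtf:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        if cols[0] == chrom and cols[2] == "gene" and int(cols[3]) <= pos <= int(cols[4]):
            attrs = dict(ATTR_RE.findall(cols[8]))
            attrs["start"], attrs["end"] = cols[3], cols[4]
            hits.append(attrs)
    return hits


def read_ncbi_table(lines: Iterable[str], tax_id: str) -> Iterable[List[str]]:
    """Yield tab-split rows of an NCBI gene2* file for one taxon, skipping '#' headers."""
    for line in lines:
        if not line.startswith("#"):
            cols = line.rstrip("\n").split("\t")
            if cols[0] == tax_id:
                yield cols


def annotate(site: str, gtf: Iterable[str], gene2ensembl: Iterable[str],
             gene2go: Iterable[str], tax_id: str = "9606") -> List[Tuple[str, str, str, str]]:
    """Sorted (gene_id, gene_name, go_id, go_label) rows for genes overlapping site."""
    chrom, pos = parse_site(site)
    # GENCODE-style IDs carry a version suffix (ENSG....12); strip it before joining.
    genes = {g["gene_id"].split(".")[0]: g.get("gene_name", "")
             for g in genes_at(gtf, chrom, pos)}
    # Assumed gene2ensembl layout: tax_id, GeneID, Ensembl_gene_identifier, ...
    geneid_to_ens: Dict[str, Set[str]] = {}
    for cols in read_ncbi_table(gene2ensembl, tax_id):
        if cols[2] in genes:
            geneid_to_ens.setdefault(cols[1], set()).add(cols[2])
    # Assumed gene2go layout: tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, ...
    rows = set()  # a set drops duplicates (gene2ensembl has one row per transcript)
    for cols in read_ncbi_table(gene2go, tax_id):
        for ens in geneid_to_ens.get(cols[1], ()):
            rows.add((ens, genes[ens], cols[2], cols[5]))
    return sorted(rows)
```

For example, `annotate("1:20746689", open("genes.gtf"), open("gene2ensembl"), open("gene2go"))` (file names hypothetical). The NCBI files are distributed gzipped; either decompress them or pass `gzip.open(path, "rt")` instead of `open(path)`.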


Thank you so much for the detailed guidance! Just a quick question: would I run these scripts in a bash environment to replicate the results?

Also, regarding the integration site query, will this process be able to identify the regulatory role of the gene, such as whether the site falls within a promoter or enhancer region?
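The GO annotation in the SPARQL output is gene-level; it does not say whether the site sits in a promoter or an enhancer. A rough first pass for promoters is to compare the site with the gene's transcription start site (TSS). The sketch below assumes a promoter window from 2 kb upstream to 500 bp downstream of the TSS (an arbitrary but common convention, not a definition) and uses gene-level start/end/strand from a GTF; `tss_position` and `classify_site` are hypothetical names.

```python
def tss_position(start: int, end: int, strand: str) -> int:
    """1-based TSS of a feature: its start on '+', its end on '-'."""
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def classify_site(pos: int, start: int, end: int, strand: str,
                  upstream: int = 2000, downstream: int = 500) -> str:
    """Label a 1-based site relative to one gene.

    Returns 'promoter', 'gene_body', 'upstream' or 'downstream'. The promoter
    window [TSS - upstream, TSS + downstream] is an assumed heuristic.
    """
    tss = tss_position(start, end, strand)
    # Signed offset in the direction of transcription: negative = upstream of the TSS.
    offset = pos - tss if strand == "+" else tss - pos
    if -upstream <= offset <= downstream:
        return "promoter"
    if start <= pos <= end:
        return "gene_body"
    return "upstream" if offset < 0 else "downstream"
```

Using transcript rows instead of gene rows would catch alternative TSSs. Enhancers cannot be inferred from gene coordinates at all; they need an external regulatory annotation (for example the Ensembl Regulatory Build or ENCODE candidate cis-regulatory elements) intersected with the site in the same way as the gene intervals.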


would I be running these scripts in a bash environment to replicate the results?

Yes. I used SPARQL for fun, but if you don't know these tools, you should use tools like bedtools intersect and join.
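One pitfall with bedtools: BED intervals are 0-based and half-open, whereas the site `1:20746689` and GTF coordinates are 1-based and closed, so both must be shifted before something like `bedtools intersect -a site.bed -b genes.bed -wa -wb` reports the right overlaps (file names hypothetical). The resulting gene IDs can then be matched to GO terms with coreutils `join`, which requires both inputs sorted on the join field. The Python sketch below only mirrors the overlap logic to make the coordinate shift explicit; the names are hypothetical and the nested loop is far slower than bedtools.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A BED-style interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int
    name: str = "."


def site_to_bed(chrom: str, pos: int, name: str = ".") -> Interval:
    """A 1-based position p covers the BED interval [p - 1, p)."""
    return Interval(chrom, pos - 1, pos, name)


def gtf_to_bed(chrom: str, start: int, end: int, name: str = ".") -> Interval:
    """GTF [start, end] (1-based, closed) becomes BED [start - 1, end)."""
    return Interval(chrom, start - 1, end, name)


def intersect(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Naive analogue of `bedtools intersect -wa -wb`: every overlapping (a, b) pair."""
    return [(x, y) for x in a for y in b
            if x.chrom == y.chrom and x.start < y.end and y.start < x.end]
```

For example, `intersect([site_to_bed("1", 20746689)], [gtf_to_bed("1", 20740266, 20787323, "HP1BP3")])` returns one pair, consistent with the HP1BP3 row in the SPARQL output.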


I have been using a Linux environment but am fairly new to it; still, I'm eager to give these a try. Just to confirm, should I first run the awk command you provided, like this:

awk -v BUILD=GRCh38 -f gtf2rdf.awk > output.rdf

Followed by executing the Makefile with:

make Makefile

I'm not quite sure how to run query.01.sparql afterward. Could you please provide guidance on this? Please correct me if I'm wrong. I appreciate your help.


Just run:

make