Question: Location of Genes on Chromosome
spriyansh29 wrote, 7 weeks ago:

I have a list of genes (Ensembl IDs) and need to find their locations on the human chromosomes. For each gene I need the start position, the stop position, and the length.

I tried ShinyGO, but it only gave me a plot of the genes along the chromosomes, not their exact coordinates.

modified 7 weeks ago by i.sudbery • written 7 weeks ago by spriyansh29
mark.ziemann (Australia/Melbourne/Geelong/Deakin) wrote, 7 weeks ago:

Gene location information can be found in the GTF or GFF files on the Ensembl FTP site. Just make sure that the release of the GTF/GFF file you use matches the annotation release behind the gene list you received. Older Ensembl releases are available from the archive.

Here are the first few lines of a GTF file. Lines with "gene" in the 3rd column give the start and end coordinates of the gene in columns 4 and 5.

#!genome-build GRCh38.p5
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.20
#!genebuild-last-updated 2015-10
1   havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2";
1   havana  transcript  11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    11869   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    12613   12721   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    13221   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";

You can use your favorite scripting language to extract the coordinate information for your genes of interest.
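
As a rough illustration of that extraction step, here is a minimal Python sketch. It assumes a standard tab-separated GTF with the gene ID stored as `gene_id "..."` in column 9; the function name `gene_coordinates` is made up for this example. The length is computed as end - start + 1 because GTF coordinates are 1-based and inclusive.

```python
import re

def gene_coordinates(gtf_lines, wanted_ids):
    """Return {gene_id: (chromosome, start, end, length)} for the wanted genes.

    Only rows with "gene" in column 3 are used; header lines starting
    with "#" are skipped.
    """
    wanted = set(wanted_ids)
    coords = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None or match.group(1) not in wanted:
            continue
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive, hence the +1.
        coords[match.group(1)] = (fields[0], start, end, end - start + 1)
    return coords

# Small in-memory example built from the gene row shown above.
example = [
    "#!genome-build GRCh38.p5\n",
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n',
    '1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972"; exon_number "1";\n',
]
result = gene_coordinates(example, ["ENSG00000223972"])
print(result)
```

For a real file you would pass an open file handle instead of the list, for example `gene_coordinates(open(path_to_gtf), my_ids)`, where `path_to_gtf` and `my_ids` are placeholders for your own GTF path and ID list.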

i.sudbery (Sheffield, UK) wrote, 7 weeks ago:

If your list of Ensembl IDs isn't too long (500 max), probably the easiest way to get the start and stop coordinates is BioMart. Select the "Ensembl Genes" database and the "Human genes" dataset. Enter your gene IDs under Filters > Gene > Input external references ID list, and under Attributes > Gene select Gene ID, Chromosome, Gene Start and Gene End.
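
The same selection can probably also be made programmatically. The sketch below only builds a BioMart XML query and the corresponding request URL; it does not send anything. It assumes the public Ensembl martservice endpoint and the internal names `hsapiens_gene_ensembl`, `ensembl_gene_id`, `chromosome_name`, `start_position` and `end_position` for the dataset, filter and attributes chosen in the web form. Check these against the current BioMart interface before relying on them. The helper name `build_biomart_query` is made up for this example.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

def build_biomart_query(gene_ids):
    """Build a BioMart XML query asking for ID, chromosome, start and end."""
    # Assumed internal attribute names for Gene ID, Chromosome, Gene Start, Gene End.
    attributes = ["ensembl_gene_id", "chromosome_name",
                  "start_position", "end_position"]
    attribute_xml = "".join(f'<Attribute name="{name}"/>' for name in attributes)
    # quoteattr escapes the value and wraps it in quotes.
    id_list = quoteattr(",".join(gene_ids))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="ensembl_gene_id" value={id_list}/>'
        f'{attribute_xml}'
        '</Dataset></Query>'
    )

query = build_biomart_query(["ENSG00000223972"])
# Assumed endpoint; fetching this URL should return tab-separated text.
url = "https://www.ensembl.org/biomart/martservice?" + urlencode({"query": query})
print(url)
```

If you used an older annotation release, the request would need to go to the matching archive site rather than the main one.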

Note that "Gene start" is the earliest start coordinate of any transcript associated with the gene, and "Gene end" is the last end coordinate of any transcript. A "length" computed from these two values will therefore almost certainly be longer than the length of any individual transcript.
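
To make the difference concrete, here is a small worked example using the DDX11L1 coordinates from the GTF excerpt earlier in this thread. It assumes that "length" means the genomic span (end - start + 1, since the coordinates are 1-based and inclusive) and compares it with the spliced length of the one transcript shown, taken as the sum of its exon lengths. That transcript happens to cover the whole gene span, so the gap here comes entirely from the introns.

```python
# Coordinates copied from the GTF excerpt (gene ENSG00000223972, transcript ENST00000456328).
gene_start, gene_end = 11869, 14409
exons = [(11869, 12227), (12613, 12721), (13221, 14409)]

# Genomic span of the gene, introns included.
gene_span = gene_end - gene_start + 1

# Spliced transcript length: exons only.
transcript_length = sum(end - start + 1 for start, end in exons)

print(gene_span)          # 2541
print(transcript_length)  # 1657
```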
