Question: Chromosome Position From Ucsc Genome Browser
3
5.7 years ago by
Gjain5.1k
Göttingen, Germany
Gjain5.1k wrote:

Hi all,

I am looking for the coordinates of all the chromosomes of a particular species from the UCSC Genome Browser.

For example, in hg19:

chr1:1-249,250,621
chr2:1-243,199,373
.
.
.
chrX:1-155,270,560

Is there any way to get this list for, say, human (hg19 or hg18) or mouse (mm9 or mm8)?

Thanks for your help.

ucsc position chromosome • 6.0k views
modified 4.2 years ago by Biostar ♦♦ 20 • written 5.7 years ago by Gjain5.1k
3

adzpka azdopi, azdazd azdpkpok azdl azd zefpi,ẑepofioif, zeofpzoa,efoi,pẑop zefopi;oi,zoefo azd

azd


$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,RAND(),RAND() from refFlat limit 4'
+-------+--------------------+------------------+
| chrom | RAND()             | RAND()           |
+-------+--------------------+------------------+
| chr1  |   0.81548497994941 | 0.25082845264192 |
| chr1  |   0.80768735409499 | 0.28595144328284 |
| chr1  | 0.0066951848628886 | 0.17562162519956 |
| chr1  |   0.85802260214279 |  0.7632481658489 |
+-------+--------------------+------------------+
modified 5.7 years ago • written 5.7 years ago by Pierre Lindenbaum107k
2

hehe! good one! ;-) see C: C: C: C: A: How to get gene regions +3kb upstream

modified 5.7 years ago • written 5.7 years ago by Istvan Albert ♦♦ 76k

hahah now I understand those random strings :)

written 5.7 years ago by Gjain5.1k

Thanks Pierre, but I was just looking to find the chromosome start and end.

written 5.7 years ago by Gjain5.1k

Do you know what kind of features it describes (genes, ...)?

written 5.7 years ago by Pierre Lindenbaum107k

Just a BED file of chromosome coordinates. Basically: chrom, start, end.

modified 5.7 years ago • written 5.7 years ago by Gjain5.1k
4
5.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum107k wrote:

The sizes of the chromosomes are stored in a table named "chromInfo" for each build/organism. Is that the information you're looking for?

For example, for chromosome "chr1" in three builds:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A  -e 'select "hg18" as build,size from hg18.chromInfo where chrom="chr1" union select "hg19",size from hg19.chromInfo where chrom="chr1"  union select "mm9",size from mm9.chromInfo where chrom="chr1"'
+-------+-----------+
| build | size      |
+-------+-----------+
| hg18  | 247249719 |
| hg19  | 249250621 |
| mm9   | 197195432 |
+-------+-----------+
modified 5.7 years ago • written 5.7 years ago by Pierre Lindenbaum107k
1

Not exactly, but using your answer I got mine. Thanks

written 5.7 years ago by Gjain5.1k
3
5.7 years ago by
Deutschland
David Langenberger8.1k wrote:

Just another possible way of getting the chromosome sizes:

fetchChromSizes hg18 | perl -ane 'print "$F[0]:1-$F[1]\n";' > hg18.chromSizes

You can download the fetchChromSizes tool at UCSC.

written 5.7 years ago by David Langenberger8.1k

Thanks. I did not know about this tool.

written 5.7 years ago by Gjain5.1k
2
5.7 years ago by
Gjain5.1k
Göttingen, Germany
Gjain5.1k wrote:

Thanks Pierre, I used your solution and tweaked it a bit to find what I was looking for.

 mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo order by chrom"
+-----------------------+-----------+
| chrom                 | size      |
+-----------------------+-----------+
| chr1                  | 249250621 |
| chr10                 | 135534747 |
| chr11                 | 135006516 |

Then I can just convert them to chrom, start, end, where start is always 0 (the 0-based convention UCSC uses internally) and end is the size.

So in the end I have:

chr1:0-249250621
chr10:0-135534747
chr11:0-135006516
.
.
.
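
The conversion described above can also be done locally once the two-column chrom/size listing has been saved. A minimal sketch, assuming tab-separated input such as `mysql -B --skip-column-names` or fetchChromSizes would produce; the function names `sizes_to_bed` and `sizes_to_position` are made up for illustration and are not UCSC tools:

```shell
#!/bin/sh
# Hypothetical helpers: both read "chrom<TAB>size" lines on stdin.

# BED-style output: 0-based start, end equal to the chromosome size.
sizes_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } NF >= 2 { print $1, 0, $2 }'
}

# Browser position format: 1-based start, same end.
sizes_to_position() {
    awk 'BEGIN { FS = "\t" } NF >= 2 { printf "%s:1-%s\n", $1, $2 }'
}

# Example with two sizes taken from the hg19 output above.
printf 'chr1\t249250621\nchr10\t135534747\n' | sizes_to_bed
printf 'chr1\t249250621\nchr10\t135534747\n' | sizes_to_position
```

The two helpers differ only in the start value: the tab-separated form uses 0, while the chr:start-end form uses 1, following the UCSC FAQ text quoted further down in this thread.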
modified 25 days ago • written 5.7 years ago by Gjain5.1k
1

Just an update:


mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size, CONCAT(chrom,':',0,'-',size) as coords  from hg19.chromInfo order by chrom"                                                                               
+-----------------------+-----------+--------------------------------+
| chrom                 | size      | coords                         |
+-----------------------+-----------+--------------------------------+
| chr1                  | 249250621 | chr1:0-249250621               |
| chr10                 | 135534747 | chr10:0-135534747              |
| chr11                 | 135006516 | chr11:0-135006516              |

Or, if you only need the main chromosomes (no random, haplotype, or unplaced contigs) and want to save them in a tab-separated file:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -B --skip-column-names -e "select chrom, 0, size as coords  from hg19.chromInfo where chrom NOT LIKE 'chr___%' and chrom NOT LIKE 'chrUn_%';" > hg19.genome
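
In a SQL LIKE pattern, `_` matches exactly one character and `%` matches any run of characters, so `chr___%` matches every name with three or more characters after `chr` (for example chr6_cox_hap2 or chrUn_gl000211) and leaves names such as chr1, chr10 and chrX. As far as I can tell, the second condition on `chrUn_%` is already covered by the first. The same filter can be sketched locally on a saved chrom/size file; the function name `main_chroms_bed` is hypothetical, and the regular expression is my own translation of the LIKE pattern, not something provided by UCSC:

```shell
#!/bin/sh
# Hypothetical helper: reads "chrom<TAB>size" lines on stdin, drops every
# name that has three or more characters after "chr", and prints the rest
# as BED-style lines with a 0-based start.
main_chroms_bed() {
    awk 'BEGIN { FS = OFS = "\t" } $1 !~ /^chr.../ { print $1, 0, $2 }'
}

# chr1 and chrX sizes come from this thread; the two contig sizes are
# placeholders, not real values.
printf 'chr1\t249250621\nchr6_cox_hap2\t1000\nchrUn_gl000211\t1000\nchrX\t155270560\n' |
    main_chroms_bed
```

Note that this relies purely on name length, like the SQL version, so it assumes an assembly whose extra sequences all have long names.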
modified 25 days ago • written 5.0 years ago by Gjain5.1k
1

Hello everyone! I was just wondering: supposing you want to use this file later with, for example, bedtools, do the start positions need to be listed as 0 instead of 1? Thanks!

written 28 days ago by Zee_S20
2

the start positions need to be listed as 0 instead of 1?

yes

written 28 days ago by Pierre Lindenbaum107k
1

Yes, you are correct. UCSC stores everything internally with 0-based starts; only the browser display is 1-based.

For more details: Database/browser start coordinates differ by 1 base

I am confused about the start coordinates for items in the refGene table. It looks like you need to add "1" to the starting point in order to get the same start coordinate as is shown by the Genome Browser. Why is this the case?
Our internal database representations of coordinates always have a zero-based start and a one-based end. We add 1 to the start before displaying coordinates in the Genome Browser. Therefore, they appear as one-based start, one-based end in the graphical display. The refGene.txt file is a database file, and consequently is based on the internal representation.

We use this particular internal representation because it simplifies coordinate arithmetic, i.e. it eliminates the need to add or subtract 1 at every step. If you use a database dump file but would prefer to see the one-based start coordinates, you will always need to add 1 to each start coordinate.

If you submit data to the browser in position format (chr#:##-##), the browser assumes this information is 1-based. If you submit data in any other format (BED (chr# ## ##) or otherwise), the browser will assume it is 0-based. You can see this both in our liftOver utility and in our search bar, by entering the same numbers in position or BED format and observing the results. Similarly, any data returned by the browser in position format is 1-based, while data returned in BED format is 0-based.

For a detailed explanation, please see our blog entry for the UCSC Genome Browser coordinate counting systems.
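
To make the off-by-one rule above concrete, here is a small sketch that converts between the two notations. The function names `position_to_bed` and `bed_to_position` are hypothetical, and the code assumes chromosome names that contain no ":" or "-" characters:

```shell
#!/bin/sh
# Hypothetical helpers for the two conventions described in the FAQ text.

# "chr1:1-249,250,621" (1-based start) -> "chr1<TAB>0<TAB>249250621" (0-based start)
position_to_bed() {
    printf '%s\n' "$1" | tr -d ',' |
        awk -F '[:-]' 'NF == 3 { printf "%s\t%d\t%d\n", $1, $2 - 1, $3 }'
}

# "chr1<TAB>0<TAB>249250621" on stdin -> "chr1:1-249250621"
bed_to_position() {
    awk 'BEGIN { FS = "\t" } NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

position_to_bed 'chr1:1-249,250,621'
printf 'chr1\t0\t249250621\n' | bed_to_position
```

Only the start changes between the two forms; the end coordinate is the same number in both.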
written 25 days ago by Gjain5.1k