Question

Chromosomal localization tool

8.2 years ago

Leite ★ 1.3k

Hello everyone!

I have a list of 400 genes from chromosome 1, and I would like to know the chromosomal localization (cytoband) of each gene. For example: CDK11B = 1p36.33

I'm looking them up one by one at https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/ but it's a lot of work. Does anyone know a tool that would make this faster?

Thank you so much!

Updated 8.2 years ago by Alex Reynolds 36k • written 8.2 years ago by Leite ★ 1.3k

score 3 · Accepted Answer · 2017-05-21

This answer assumes reference genome hg19. Adjust for your work, as needed. This answer also assumes you have installed the BEDOPS toolkit, including sort-bed and bedmap.

First, generate a BED file of HGNC names and genomic positions:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "SELECT k.chrom, kg.txStart, kg.txEnd, x.geneSymbol FROM knownCanonical k, knownGene kg, kgXref x WHERE k.transcript = x.kgID AND k.transcript = kg.name" hg19 | sort-bed - > genes.bed

For example:

$ grep CDK11B genes.bed
chr1    1571099 1655775 CDK11B

Next, generate a BED file of cytobands:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz | gunzip -c | sort-bed - > cytoBand.bed

Finally, map HGNC intervals to cytobands with BEDOPS bedmap --echo-map-id:

$ bedmap --echo --echo-map-id --delim '\t' genes.bed cytoBand.bed > answer.bed

The file answer.bed gives a mapping of gene names to cytobands:

$ head answer.bed
chr1    11873   14409   DDX11L1 p36.33
chr1    14361   19759   WASH7P  p36.33
chr1    14406   29370   WASH7P  p36.33
chr1    34610   36081   FAM138F p36.33
chr1    69090   70008   OR4F5   p36.33
chr1    134772  140566  LOC729737   p36.33
chr1    321083  321115  DQ597235    p36.33
chr1    321145  321207  DQ599768    p36.33
chr1    322036  326938  LOC100133331    p36.33
chr1    327545  328439  LOC388312   p36.33

To return to your example:

$ grep CDK11B answer.bed
chr1    1571099 1655775 CDK11B  p36.33
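For intuition about what the bedmap step is doing, here is a rough awk-only sketch of the same overlap test on a tiny hand-made sample. This is only an illustration under assumptions, not how BEDOPS works internally: the demo_* file names are made up, DEMO_SPAN is a hypothetical interval placed across a band boundary, and the two band rows are what I would expect at the start of hg19 chr1 (check them against your own cytoBand.bed). The naive all-against-all loop is fine for a toy input; use bedmap on real data.

```shell
# Toy inputs (hypothetical file names). The DDX11L1 and CDK11B rows come from
# the output shown above; DEMO_SPAN is a made-up interval crossing two bands.
printf 'chr1\t11873\t14409\tDDX11L1\nchr1\t1571099\t1655775\tCDK11B\nchr1\t2200000\t2400000\tDEMO_SPAN\n' > demo_genes.bed
printf 'chr1\t0\t2300000\tp36.33\tgneg\nchr1\t2300000\t5400000\tp36.32\tgpos25\n' > demo_cytoBand.bed

# First file: load every band into memory.
# Second file: for each gene, collect the names of all bands on the same
# chromosome whose half-open interval [start, end) overlaps the gene,
# joined with ';' in the same way bedmap joins multiple mapped IDs by default.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { bchr[NR] = $1; bs[NR] = $2 + 0; be[NR] = $3 + 0; bname[NR] = $4; n = NR; next }
     {
         ids = ""
         for (i = 1; i <= n; i++)
             if (bchr[i] == $1 && bs[i] < ($3 + 0) && be[i] > ($2 + 0))
                 ids = (ids == "") ? bname[i] : (ids ";" bname[i])
         print $1, $2, $3, $4, ids
     }' demo_cytoBand.bed demo_genes.bed > demo_answer.bed

cat demo_answer.bed
```

In this toy run CDK11B falls inside a single band, while DEMO_SPAN picks up two band names separated by a semicolon.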

If you want an answer formatted like 1p36.33, you can use awk to strip the chr prefix from the chromosome name (chr1) and join the result with the fifth field (p36.33):

$ awk '{ sub(/^chr/, "", $1); print $4"\t"$1$5; }' answer.bed > answer.txt

Then:

$ grep CDK11B answer.txt
CDK11B  1p36.33
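One case to watch for: when a gene overlaps more than one cytoband, bedmap --echo-map-id lists all of the band names separated by semicolons (its default delimiter for multiple mapped elements), so the fifth field can look like p36.33;p36.32. The sketch below shows one way to collapse that into a first-to-last range. The demo_multi.* file names and the DEMO_SPAN row are hypothetical, and the range notation and the NA placeholder are just my choice of output format.

```shell
# Hypothetical sample in the layout of answer.bed; DEMO_SPAN is a made-up
# gene that overlaps two bands.
printf 'chr1\t1571099\t1655775\tCDK11B\tp36.33\nchr1\t2200000\t2400000\tDEMO_SPAN\tp36.33;p36.32\n' > demo_multi.bed

awk 'BEGIN { FS = OFS = "\t" }
     {
         chrom = $1
         sub(/^chr/, "", chrom)              # chr1 -> 1
         n = split($5, bands, ";")           # one element per overlapping band
         if (n == 0)      loc = "NA"         # no band reported for this gene
         else if (n == 1) loc = chrom bands[1]
         else             loc = chrom bands[1] "-" bands[n]
         print $4, loc
     }' demo_multi.bed > demo_multi.txt

cat demo_multi.txt
```

Because cytoBand.bed is sorted by position, the first and last names in the list are the outermost bands the gene touches.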

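If you have a text file of 400 gene names (or however many), one name per line, you can filter answer.txt with grep's -f option. Adding -w and -F makes each name match as a whole word and as a literal string, so that a name such as CDK1 does not also pull in CDK11B:

$ grep -w -F -f geneNames.txt answer.txt > filteredAnswer.txt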
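Since grep tests each pattern against the whole line rather than against the gene-name column specifically, an alternative is to compare the first field exactly with awk. This is a minimal sketch on made-up sample files (the demo_* names are hypothetical); it assumes one gene name per line and tab-delimited output like answer.txt.

```shell
# Hypothetical inputs: a short gene list and a few rows in the layout of answer.txt.
printf 'CDK1\nCDK11B\n' > demo_geneNames.txt
printf 'CDK11A\t1p36.33\nCDK11B\t1p36.33\nOR4F5\t1p36.33\n' > demo_answer.txt

# First file: remember every requested name.
# Second file: print a row only if its first field is exactly one of those names.
awk 'BEGIN { FS = "\t" }
     NR == FNR { wanted[$1]; next }
     $1 in wanted' demo_geneNames.txt demo_answer.txt > demo_filtered.txt

cat demo_filtered.txt
```

Here CDK1 is in the list but has no row of its own in the sample, so only the CDK11B row is kept; CDK11A is not picked up by the shorter name.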