Which Genes Are Contained In The Alternate Reference Loci In Hg38?
7.7 years ago

The most recent versions of the human genome assembly (e.g. hg38 and GRCh37) contain a new feature called alternate reference loci.

How can I see which genes are included in these regions? I would guess that these correspond mostly to HLA and olfactory regions.

7.7 years ago

Try this:

$ curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz" | gunzip -c | awk -F '\t' '$3 ~ /alt$/' | cut -f 3,13 | uniq

chr10_GL383545v1_alt    PTCHD3
chr10_GL383546v1_alt    OR13A1
chr10_GL383546v1_alt    ALOX5
chr10_GL383546v1_alt    MARCH8
chr10_KI270825v1_alt    DLG5
chr10_KI270825v1_alt    DLG5-AS1
chr11_JH159136v1_alt    OR5T2
chr11_JH159136v1_alt    OR5T3
chr11_JH159136v1_alt    OR5T1
chr11_JH159136v1_alt    OR8H1
(..)

Thanks! I had not been able to find the table. To count the distinct genes, you should also append "cut -f2 | sort | uniq | wc -l", because some genes are repeated (they appear in more than one alternate locus). The count I get now is 1039.
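
For reference, the counting step can be wrapped in a small shell function. This is only a sketch: the function name count_alt_genes is made up for illustration, and it assumes refGene-format, tab-separated rows on stdin, with the contig name in column 3 and the gene symbol in column 13 (the same columns the command in the answer cuts out).

```shell
# Hypothetical helper (name is an assumption, not part of any tool).
# Reads refGene-format, tab-separated rows from stdin and prints the
# number of distinct gene symbols (column 13) found on contigs whose
# name (column 3) ends in "_alt".
count_alt_genes() {
    awk -F '\t' '$3 ~ /_alt$/ { print $13 }' \
        | sort -u \
        | wc -l \
        | tr -d ' '   # some wc implementations pad the count with spaces
}

# Usage (needs network access), same URL as in the answer:
# curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz" \
#     | gunzip -c | count_alt_genes
```

Taking only the gene column before de-duplicating is what collapses a gene that sits on several alternate loci into a single entry.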


"sort -u" is slightly faster and takes fewer keystrokes than "sort | uniq". See: Sort & uniq in Linux shell
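
A quick way to convince yourself that the two forms give the same result, using a few made-up gene names as input (the variable names here are arbitrary):

```shell
# Toy input with one repeated gene symbol.
genes=$(printf 'DLG5\nALOX5\nDLG5\n')

# Two-process form versus the single sort -u call.
with_uniq=$(printf '%s\n' "$genes" | sort | uniq)
with_u=$(printf '%s\n' "$genes" | sort -u)

# The two results should be identical.
[ "$with_uniq" = "$with_u" ] && echo "same output"
```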
