Question

Why is OMIM's mim2gene.txt file more inclusive than OMIM's genemap?

12.3 years ago

rolyata47 ▴ 40

To make things less confusing, a description of these files can be found here: http://www.omim.org/downloads

I get these files from the link that is emailed to me after subscribing to the site. If you run a few simple commands on them, you'll see that they contain different numbers of distinct OMIM IDs.

$ wc -l genemap
13890 genemap

$ cut -f 1 mim2gene.txt | uniq | wc -l
22840

In genemap, each row has a unique OMIM ID, so the line count is the number of IDs. In mim2gene.txt, the first column is the OMIM ID, so I extract that column, keep the unique IDs, and count them... and, voilà, the two counts are very different!

Why would this be? That is, why does mim2gene account for far more OMIM IDs than genemap? Is mim2gene more inclusive? If so, how? And if not... is this an error on the part of OMIM?
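To go beyond comparing two totals, one option is to extract each file's set of IDs and diff the sets with `comm`, which also shows *which* IDs account for the gap. This is a minimal sketch under stated assumptions: the helper names `ids_in_column` and `only_in_first` are hypothetical, lines starting with `#` are treated as headers, and the MIM-number column in genemap is left as the placeholder `N` because it depends on the file version. It uses bash process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for counting and comparing ID sets in tab-separated files.
# Assumptions: comment/header lines start with '#', and the caller knows which
# column holds the MIM number in each file (check the headers; N is a placeholder).

# Print the sorted, de-duplicated values of column $2 in file $1.
ids_in_column() {
  grep -v '^#' "$1" | cut -f "$2" | sort -u
}

# Print IDs found in file $1 (column $2) but not in file $3 (column $4).
# comm -23 drops lines unique to the second input and lines common to both;
# both inputs come from the same sort, so their ordering is consistent.
only_in_first() {
  comm -23 <(ids_in_column "$1" "$2") <(ids_in_column "$3" "$4")
}

# Example usage (column N in genemap is a placeholder):
#   ids_in_column mim2gene.txt 1 | wc -l        # distinct IDs in mim2gene.txt
#   ids_in_column genemap N | wc -l             # distinct IDs in genemap
#   only_in_first mim2gene.txt 1 genemap N      # IDs absent from genemap
```

Looking up a few of the IDs reported by `only_in_first` on omim.org may be the quickest way to see what sets those entries apart.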

I appreciate any feedback :-)

updated 12.3 years ago by Christian ★ 3.1k • written 12.3 years ago by rolyata47 ▴ 40

score 1 · Answer 1 · 2013-04-03

12.3 years ago

Christian ★ 3.1k

Try sorting before using uniq, because uniq only removes duplicate lines that are adjacent:

$ cut -f 1 mim2gene.txt | sort | uniq | wc -l

I don't know what the data looks like or whether it is already sorted, so I can't say for sure whether it makes a difference.
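A tiny illustration of why the order of the pipeline matters: `uniq` compares each line only with the one directly before it, so an ID that appears in several non-consecutive rows is counted more than once unless the input is sorted first. The IDs below are made-up toy data, not OMIM content.

```shell
#!/usr/bin/env bash
# Toy IDs (made up, not OMIM data) in which no duplicates are adjacent.
ids=$'100\n200\n100\n300\n200'

# uniq alone: none of the duplicates are consecutive, so all 5 lines survive.
printf '%s\n' "$ids" | uniq | wc -l

# Sorting first groups identical lines, so uniq collapses them to 3 lines.
printf '%s\n' "$ids" | sort | uniq | wc -l

# sort -u does the same de-duplication in one step: also 3 lines.
printf '%s\n' "$ids" | sort -u | wc -l
```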

ADD COMMENT • link 12.3 years ago by Christian ★ 3.1k