As per this question I asked last week (Biostar Question), to figure out which arm is dominant in miRBase I can either look at the name in the 'Previous ID' field, where a * marks the non-dominant arm, or compare the read counts, where the dominant arm is the one with more reads. I have to do this for many miRNAs, so I cannot check them by hand one by one. I looked at the data in the download section of miRBase but I can't find what I need. For example, this is the entry for miR-373 in the miRNA.dat file:
----------
ID   hsa-mir-373 standard; RNA; HSA; 69 BP.
AC   MI0000781;
DE   Homo sapiens miR-373 stem-loop
DR   TARGETS:PICTAR-VERT; hsa-miR-373; hsa-miR-373.
DR   TARGETS:PICTAR-VERT; hsa-miR-373*; hsa-miR-373*.
DR   HGNC; 31787; MIR373.
DR   ENTREZGENE; 442918; MIR373.
FH   Key             Location/Qualifiers
FH
FT   miRNA           6..27
FT                   /accession="MIMAT0000725"
FT                   /product="hsa-miR-373-5p"
FT                   /evidence=experimental
FT                   /experiment="cloned "
FT   miRNA           44..66
FT                   /accession="MIMAT0000726"
FT                   /product="hsa-miR-373-3p"
FT                   /evidence=experimental
FT                   /experiment="cloned [1-2], Northern "
SQ   Sequence 69 BP; 10 A; 13 C; 22 G; 0 T; 24 other;
     gggauacuca aaaugggggc gcuuuccuuu uugucuguac ugggaagugc uucgauuuug        60
     ggguguccc                                                                69
----------
I can see the stem-loop sequence and the coordinates of the -5p and -3p arms, but no information about which arm is dominant.
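A minimal sketch of pulling those arm names and coordinates out of miRNA.dat with plain Python, assuming every record follows the layout of the entry above (an `ID` line, then `FT miRNA start..end` lines each followed by `/key=value` qualifiers). The function name `parse_mirna_dat` and the trimmed `SAMPLE` text are only for illustration, and this still does not say which arm is dominant:

```python
import re

# Trimmed copy of the hsa-mir-373 entry, used only to exercise the parser.
SAMPLE = '''\
ID   hsa-mir-373 standard; RNA; HSA; 69 BP.
AC   MI0000781;
FT   miRNA           6..27
FT                   /accession="MIMAT0000725"
FT                   /product="hsa-miR-373-5p"
FT                   /evidence=experimental
FT   miRNA           44..66
FT                   /accession="MIMAT0000726"
FT                   /product="hsa-miR-373-3p"
FT                   /evidence=experimental
'''

def parse_mirna_dat(text):
    """Return {precursor_id: [arm_dict, ...]} for every record in the text.

    Each arm_dict holds 'start' and 'end' (1-based, as written in the file)
    plus whatever qualifiers follow the feature (accession, product, ...).
    """
    records = {}
    precursor = None  # ID of the stem-loop record being read
    arm = None        # mature miRNA feature being filled in
    for line in text.splitlines():
        tag, _, rest = line.partition(" ")
        rest = rest.strip()
        if tag == "ID":
            # A new record starts: remember its name, forget the last arm.
            precursor = rest.split()[0]
            records[precursor] = []
            arm = None
        elif tag == "FT" and precursor is not None:
            feature = re.match(r"miRNA\s+(\d+)\.\.(\d+)", rest)
            if feature:
                arm = {"start": int(feature.group(1)),
                       "end": int(feature.group(2))}
                records[precursor].append(arm)
            elif arm is not None:
                qualifier = re.match(r'/(\w+)="?([^"]*)"?', rest)
                if qualifier:
                    arm[qualifier.group(1)] = qualifier.group(2).strip()
    return records

if __name__ == "__main__":
    for name, arms in parse_mirna_dat(SAMPLE).items():
        for a in arms:
            print(name, a["product"], a["accession"], a["start"], a["end"])
```

For the full file, the same function could be called on the contents of miRNA.dat read with `open(...).read()`.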
The other downloadable data in miRBase are FASTA files and files describing the differences from past releases, so I don't think they are useful.
Am I looking in the wrong place in miRBase, or should I look somewhere else to find this information and extract it for all miRNAs with a script?
I am using Python. If there isn't an easy way, I could probably figure something out with a module like Beautiful Soup or something similar, but it seems very weird to me that there isn't a smarter way to do it.
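Once the 'Previous ID' names or the read counts are available from somewhere, the two rules from the first paragraph could be encoded roughly as follows. This is only a sketch: the function name `dominant_arm` and the input layout (one dict per arm with `name`, `previous_id` and `reads` keys) are assumptions made for illustration and do not match any miRBase file, and the values in the example are placeholders, not real miRBase data.

```python
def dominant_arm(arms):
    """Return the name of the dominant arm, or None if it cannot be decided.

    arms: list of dicts with 'name' and, optionally, 'previous_id' and
    'reads' (hypothetical layout, not a miRBase format).
    """
    # Rule 1: a previous ID ending in '*' marks the non-dominant arm.
    starred = [a for a in arms
               if (a.get("previous_id") or "").endswith("*")]
    unstarred = [a for a in arms
                 if a.get("previous_id")
                 and not a["previous_id"].endswith("*")]
    if starred and len(unstarred) == 1:
        return unstarred[0]["name"]

    # Rule 2: otherwise the arm with strictly more reads is dominant.
    counted = [a for a in arms if a.get("reads") is not None]
    if len(counted) >= 2:
        best = max(counted, key=lambda a: a["reads"])
        ties = [a for a in counted if a["reads"] == best["reads"]]
        if len(ties) == 1:
            return best["name"]
    return None

if __name__ == "__main__":
    # Placeholder input: previous IDs and read counts are illustrative only.
    example = [
        {"name": "hsa-miR-373-5p", "previous_id": "hsa-miR-373*", "reads": 10},
        {"name": "hsa-miR-373-3p", "previous_id": "hsa-miR-373", "reads": 100},
    ]
    print(dominant_arm(example))
```

Returning None when neither rule gives a clear answer keeps ambiguous miRNAs visible for a manual check instead of guessing.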