Question

Using miRBase data in a script to determine the dominant arm of a miRNA duplex


2.3 years ago

bio_elle ▴ 10

Following up on the question I asked last week (Biostar Question): to figure out which arm is dominant according to miRBase, I can either look at the name in the 'Previous ID' field for the * that marks the non-dominant arm, or compare read counts, where the dominant arm is the one with more reads. I have to do this for many miRNAs, so I cannot check them by hand one by one. I looked at the data in the download section of miRBase but I can't find what I need. For example, this is the entry for miR-373 in the miRNA.dat file:

----------
ID   hsa-mir-373       standard; RNA; HSA; 69 BP.
AC   MI0000781;
DE   Homo sapiens miR-373 stem-loop
DR   TARGETS:PICTAR-VERT; hsa-miR-373; hsa-miR-373.
DR   TARGETS:PICTAR-VERT; hsa-miR-373*; hsa-miR-373*.
DR   HGNC; 31787; MIR373.
DR   ENTREZGENE; 442918; MIR373.
FH   Key             Location/Qualifiers
FH
FT   miRNA           6..27
FT                   /accession="MIMAT0000725"
FT                   /product="hsa-miR-373-5p"
FT                   /evidence=experimental
FT                   /experiment="cloned [1]"
FT   miRNA           44..66
FT                   /accession="MIMAT0000726"
FT                   /product="hsa-miR-373-3p"
FT                   /evidence=experimental
FT                   /experiment="cloned [1-2], Northern [1]"
SQ   Sequence 69 BP; 10 A; 13 C; 22 G; 0 T; 24 other;
     gggauacuca aaaugggggc gcuuuccuuu uugucuguac ugggaagugc uucgauuuug        60
     ggguguccc                                                                69
----------

I can see the stem-loop sequence and the coordinates of the -5p and -3p arms, but no information about which arm is dominant.
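For reference, the arm coordinates and names in an entry like the one above can be pulled out with a few lines of plain Python. This is only a sketch under stated assumptions: the function name `parse_mature_arms` is made up for illustration, the embedded sample is a shortened copy of the miR-373 entry, and the parser assumes every entry starts with an `ID` line and lists its mature sequences as `FT   miRNA` features, as in that entry. It does not tell you which arm is dominant; it only collects the pieces you would join against read counts or previous IDs.

```python
import re

# Shortened copy of the miR-373 entry from miRNA.dat, used as test input.
SAMPLE = """\
ID   hsa-mir-373       standard; RNA; HSA; 69 BP.
AC   MI0000781;
FT   miRNA           6..27
FT                   /accession="MIMAT0000725"
FT                   /product="hsa-miR-373-5p"
FT                   /evidence=experimental
FT   miRNA           44..66
FT                   /accession="MIMAT0000726"
FT                   /product="hsa-miR-373-3p"
FT                   /evidence=experimental
"""


def parse_mature_arms(text):
    """Map each hairpin ID to a list of dicts, one per mature miRNA feature."""
    hairpins = {}
    current = None  # list of arms for the hairpin currently being read
    arm = None      # dict for the feature currently being read
    for line in text.splitlines():
        if line.startswith("ID"):
            # New entry: the hairpin name is the second whitespace-separated field.
            current = hairpins.setdefault(line.split()[1], [])
            arm = None
        elif line.startswith("FT") and current is not None:
            loc = re.match(r"FT\s+miRNA\s+(\d+)\.\.(\d+)", line)
            qual = re.match(r'FT\s+/(\w+)="?([^"]*)"?', line)
            if loc:
                arm = {"start": int(loc.group(1)), "end": int(loc.group(2))}
                current.append(arm)
            elif qual and arm is not None:
                # Qualifier lines such as /accession="..." or /evidence=experimental
                arm[qual.group(1)] = qual.group(2)
    return hairpins


arms = parse_mature_arms(SAMPLE)["hsa-mir-373"]
for a in arms:
    print(a["product"], a["accession"], a["start"], a["end"])
```

To run it on the real file you would pass the contents of miRNA.dat instead of `SAMPLE`; blank lines and line types other than `ID` and `FT` are simply skipped.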

The other downloads on miRBase are FASTA files and files describing differences from past releases, so I don't think they are useful here.

Am I looking in the wrong place in miRBase, or should I look somewhere else to find this information and extract it for all miRNAs with a script?

I am using Python. If there isn't an easy way, I could probably work something out with a module like Beautiful Soup, but it seems very odd to me that there isn't a smarter way to do it.

python mirbase beautifulSoup mirna microrna • 1.0k views

ADD COMMENT • link 2.2 years ago by bio_elle ▴ 10


This may be intentional, as I think miRBase isn't too keen on assigning star sequences (rather than 5p/3p) because they are sometimes wrong and the antisense can be more important anyway. It seems like you are trying to assign ambiguous aggregate hairpin counts to specific mature miRNAs, which isn't possible.

ADD REPLY • link 2.3 years ago by Jeremy Leipzig 22k


From my understanding star sequences are outdated (which is why you look at the previous ID and not the current one), but they still work for finding the non-dominant arm. I'd rather use the read counts, which seem like the better criterion, but I can't find either in the downloadable files on miRBase.
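To make the read-count criterion concrete, here is a minimal sketch. It assumes the per-arm read counts have already been obtained from somewhere and loaded into a dict; where they come from is exactly the open question in this thread. The function name `dominant_arm` is hypothetical and the numbers are placeholders, not real miRBase counts.

```python
def dominant_arm(read_counts):
    """Return the mature name with the most reads, or None on a tie or empty input."""
    if not read_counts:
        return None
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # equal counts: no arm can be called dominant
    return ranked[0][0]


# Placeholder counts for illustration only (NOT real miRBase numbers).
example_counts = {"hsa-miR-373-5p": 10, "hsa-miR-373-3p": 100}
print(dominant_arm(example_counts))
```

Returning `None` on a tie keeps ambiguous duplexes visible instead of silently picking one arm.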

If a file only has hsa-miR-373 in its name, I need to work out whether it refers to the 3p or the 5p arm. In this case the 3p is the dominant one, so my assumption would be that the file refers to it, but I need to do this for many miRNAs and I can't look them up one by one on miRBase.
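If the previous IDs can be obtained as a table, resolving an unsuffixed name becomes a lookup. The sketch below assumes a dict mapping each current mature name to its list of previous names; that layout, the function names, and the example entries are all illustrative assumptions (the miR-373 entries just follow from the statements above that the 3p arm is dominant and the * marks the non-dominant arm).

```python
def resolve_legacy_name(name, previous_ids):
    """Return the current mature names whose previous IDs include `name`."""
    return [current for current, olds in previous_ids.items() if name in olds]


def star_arms(previous_ids):
    """Return the current names that were once annotated as star (non-dominant) sequences."""
    return [
        current
        for current, olds in previous_ids.items()
        if any(old.endswith("*") for old in olds)
    ]


# Hypothetical previous-ID table; a real one would have to be built from miRBase.
previous_ids = {
    "hsa-miR-373-5p": ["hsa-miR-373*"],
    "hsa-miR-373-3p": ["hsa-miR-373"],
}

print(resolve_legacy_name("hsa-miR-373", previous_ids))
print(star_arms(previous_ids))
```

Both functions return lists so that names matching no arm, or more than one, can be flagged for manual review.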

ADD REPLY • link 2.3 years ago by bio_elle ▴ 10


I opened a ticket here. I agree this should be easier than it is presently.

ADD REPLY • link 2.2 years ago by Jeremy Leipzig 22k


Not exactly what I wanted, but thanks. I was trying to avoid R since Python is easier, but it looks like R has better tools for some of this.

ADD REPLY • link 2.2 years ago by bio_elle ▴ 10