Question: Reading EMBL-like miRBase data into R
4 months ago, atakanekiz wrote:


I'm trying to use miRNA data from the miRBase database in some of my R pipelines. The data file I'm interested in is mirna.dat, which contains information on all published miRNAs (across multiple species).

One entry within the data file looks like this (output of the readLines() function):

[1] "ID   cel-let-7         standard; RNA; CEL; 99 BP."                               
  [2] "XX"                                                                              
  [3] "AC   MI0000001;"                                                                 
  [4] "XX"                                                                              
  [5] "DE   Caenorhabditis elegans let-7 stem-loop"                                     
  [6] "XX"                                                                              
  [7] "RN   [1]"                                                                        
  [8] "RX   PUBMED; 11679671."                                                          
  [9] "RA   Lau NC, Lim LP, Weinstein EG, Bartel DP;"                                   
 [10] "RT   \"An abundant class of tiny RNAs with probable regulatory roles in"         
 [11] "RT   Caenorhabditis elegans\";"                                                  
 [12] "RL   Science. 294:858-862(2001)."                                                
 [13] "XX"                                                                              
 [14] "RN   [2]"                                                                        
 [15] "RX   PUBMED; 12672692."                                                          
 [16] "RA   Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB," 
 [17] "RA   Bartel DP;"                                                                 
 [18] "RT   \"The microRNAs of Caenorhabditis elegans\";"                               
 [19] "RL   Genes Dev. 17:991-1008(2003)."                                              
 [20] "XX"                                                                              
 [21] "RN   [3]"                                                                        
 [22] "RX   PUBMED; 12747828."                                                          
 [23] "RA   Ambros V, Lee RC, Lavanway A, Williams PT, Jewell D;"                       
 [24] "RT   \"MicroRNAs and other tiny endogenous RNAs in C. elegans\";"                
 [25] "RL   Curr Biol. 13:807-818(2003)."                                               
 [26] "XX"                                                                              
 [57] "XX"                                                                              
 [58] "CC   let-7 is found on chromosome X in Caenorhabditis elegans [1] and pairs to"  
 [59] "CC   sites within the 3' untranslated region (UTR) of target mRNAs, specifying"  
 [60] "CC   the translational repression of these mRNAs and triggering the transition"  
 [61] "CC   to late-larval and adult stages [2]."                                       
 [62] "XX"                                                                              
 [63] "FH   Key             Location/Qualifiers"                                        
 [64] "FH"                                                                              
 [65] "FT   miRNA           17..38"                                                     
 [66] "FT                   /accession=\"MIMAT0000001\""                                
 [67] "FT                   /product=\"cel-let-7-5p\""                                  
 [68] "FT                   /evidence=experimental"                                     
 [69] "FT                   /experiment=\"cloned [1-3], Northern [1], PCR [4], 454 [5],"
 [70] "FT                   Illumina [6], CLIPseq [7]\""                                
 [71] "FT   miRNA           60..81"                                                     
 [72] "FT                   /accession=\"MIMAT0015091\""                                
 [73] "FT                   /product=\"cel-let-7-3p\""                                  
 [74] "FT                   /evidence=experimental"                                     
 [75] "FT                   /experiment=\"CLIPseq [7]\""                                
 [76] "XX"                                                                              
 [77] "SQ   Sequence 99 BP; 26 A; 19 C; 24 G; 0 T; 30 other;"                           
 [78] "     uacacugugg auccggugag guaguagguu guauaguuug gaauauuacc accggugaac        60"
 [79] "     uaugcaauuu ucuaccuuac cggagacaga acucuucga                               99"
 [80] "//"

The file is formatted similarly to the EMBL flat-file format, which doesn't play nicely with R's base read functions. I tried the gbRecord EMBL parser from the biofiles package, but it threw an error saying that mandatory fields were not found. I think that although the miRBase data resembles EMBL, it is not structured identically, which causes the failure. Can you recommend a way to deal with this type of data?

Best regards, Atakan
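
For illustration, the layout shown in the readLines() output above can be split with base R alone, without an EMBL parser. The following is only a minimal sketch, not an established solution. It assumes that every record ends with a "//" line, that the two-letter line code sits in columns 1-2 with the content starting at column 6, and that sequence lines begin with spaces. The function name parse_mirbase and the choice of output columns are hypothetical, and the sample input is an abbreviated copy of the cel-let-7 entry shown above.

```r
# Hypothetical helper (not from any package): turn EMBL-like miRBase lines,
# e.g. from readLines("mirna.dat"), into one data.frame row per record.
parse_mirbase <- function(lines) {
  lines <- sub("\\s+$", "", lines)   # strip trailing padding only
  lines <- lines[nzchar(lines)]      # drop empty lines

  # Assumption: each record is terminated by a line containing only "//".
  is_end  <- lines == "//"
  rec_id  <- cumsum(is_end)          # 0 for the first record, 1 for the next, ...
  records <- split(lines[!is_end], rec_id[!is_end])

  parse_one <- function(rec) {
    code  <- substr(rec, 1, 2)          # two-letter line code ("ID", "AC", ...)
    body  <- trimws(substring(rec, 6))  # assumption: content starts at column 6
    field <- function(k) body[code == k]

    # Mature miRNA names are stored as /product="..." qualifiers on FT lines.
    product_lines <- grep("^/product=", field("FT"), value = TRUE)
    products <- gsub('^/product="|"$', "", product_lines)

    data.frame(
      name        = sub("\\s.*$", "", field("ID")[1]),  # first token of ID line
      accession   = sub(";$", "", field("AC")[1]),
      description = paste(field("DE"), collapse = " "),
      products    = paste(products, collapse = ";"),
      # Sequence lines start with spaces; keep letters, drop gaps and counters.
      sequence    = gsub("[^A-Za-z]", "",
                         paste(body[code == "  "], collapse = "")),
      stringsAsFactors = FALSE
    )
  }

  out <- do.call(rbind, lapply(records, parse_one))
  rownames(out) <- NULL
  out
}

# Abbreviated copy of the cel-let-7 entry from the question.
sample_lines <- c(
  "ID   cel-let-7         standard; RNA; CEL; 99 BP.",
  "XX",
  "AC   MI0000001;",
  "XX",
  "DE   Caenorhabditis elegans let-7 stem-loop",
  "XX",
  "FT   miRNA           17..38",
  "FT                   /accession=\"MIMAT0000001\"",
  "FT                   /product=\"cel-let-7-5p\"",
  "FT   miRNA           60..81",
  "FT                   /accession=\"MIMAT0015091\"",
  "FT                   /product=\"cel-let-7-3p\"",
  "XX",
  "SQ   Sequence 99 BP; 26 A; 19 C; 24 G; 0 T; 30 other;",
  "     uacacugugg auccggugag guaguagguu guauaguuug gaauauuacc accggugaac        60",
  "     uaugcaauuu ucuaccuuac cggagacaga acucuucga                               99",
  "//"
)

mirna <- parse_mirbase(sample_lines)
print(mirna[, c("name", "accession", "products")])
```

On the real file the call would be parse_mirbase(readLines("mirna.dat")), assuming the file sits in the working directory. Other line types (RN, RX, CC and so on) are ignored in this sketch and would need their own handling if they are required.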
