Hi, I am trying to get the coordinates of the human mature miRNAs (e.g. hsa-miR-1-3p, hsa-miR-1-5p, etc.) because I want to use them for HTSeq counting. That way I get counts specifically for the mature forms and not for the pre-miRNAs, which are what Ensembl typically annotates.
I downloaded hsa.gff3 from miRBase (http://www.mirbase.org/ftp.shtml) and then used sed to remove the precursor entries.
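For reference, the filtering step can also be done without sed. Below is a minimal Python sketch. It assumes the file is tab-delimited, as distributed by miRBase, and that column 3 reads "miRNA" for mature products and "miRNA_primary_transcript" for precursors. The function name keep_mature is hypothetical.

```python
import sys


def keep_mature(lines):
    """Yield header lines plus only the mature-miRNA records of a miRBase GFF3.

    Assumption: column 3 is "miRNA" for mature products and
    "miRNA_primary_transcript" for precursors; everything else is dropped.
    """
    for line in lines:
        if line.startswith("#"):
            # Keep GFF3 header/comment lines so the output is still valid GFF3.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "miRNA":
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python keep_mature.py hsa.gff3 > hsa_mature.gff3
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(keep_mature(fh))
```

Because this checks only the exact type in column 3, it does not depend on how the Name or ID attributes are spelled.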
Unfortunately, I need the file in GTF format rather than GFF3, so that I can merge it with the coordinates of the other small RNAs that I obtained from Ensembl.
I think it should be possible to do this with sed/awk, but my knowledge of these tools is limited and I would be very thankful for some help. Basically, what I need is to convert lines of this GFF3 file:
chr1 miRBase miRNA 17409 17431 . - . ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705
into this GTF format:
chr1 miRBase miRNA 17409 17431 . - . gene_id "miR-6859-5p"; transcript_id "MIMAT0027618";
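The conversion shown above can be sketched in Python as follows. This is a minimal version under three assumptions:

- The input is tab-delimited, as in the miRBase file.
- gene_id should be the Name attribute with the "hsa-" prefix stripped, matching the example line.
- transcript_id should be the ID (MIMAT accession).

The helper names parse_attributes and gff3_to_gtf_line are hypothetical.

```python
import sys


def parse_attributes(col9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for part in col9.strip().strip(";").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf_line(line, strip_prefix="hsa-"):
    """Convert one mature-miRNA GFF3 record to the GTF layout from the example.

    gene_id       <- Name minus the species prefix (hsa-miR-6859-5p -> miR-6859-5p)
    transcript_id <- ID (the MIMAT accession)
    Returns None for comments, blank lines and non-"miRNA" (e.g. precursor) records.
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "miRNA":
        return None
    attrs = parse_attributes(fields[8])
    name = attrs["Name"]
    if strip_prefix and name.startswith(strip_prefix):
        name = name[len(strip_prefix):]
    # Columns 1-8 are identical in GFF3 and GTF; only column 9 changes syntax.
    fields[8] = f'gene_id "{name}"; transcript_id "{attrs["ID"]}";'
    return "\t".join(fields) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python gff3_to_gtf.py hsa.gff3 > hsa_mature.gtf
    with open(sys.argv[1]) as fh:
        for line in fh:
            converted = gff3_to_gtf_line(line)
            if converted is not None:
                sys.stdout.write(converted)
```

Since gene_id is taken from Name, a mature miRNA annotated at several loci gets the same gene_id at each of them, so counting by gene_id sums those loci together. Pass strip_prefix="" if you prefer to keep the full "hsa-" names.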
Any help would be greatly appreciated. I don't want to just use the Galaxy gffread tool, because it will not give me the gene IDs that I need for counting. Thank you so much!
Thanks a lot for this, Saber. I downloaded it and built it with make, as described in the installation instructions, but I find it hard to use because the documentation is sparse. Which path do I need to use in order to run gt? Are there any examples somewhere on how to get started after installing?