I have a few paired small RNA-seq plasma samples. I did all the quality control, trimming, mapping and quantification for the case and control samples. Now I am trying to find the number of reads mapped to each of the different small non-coding RNA species (microRNA, scRNA, snoRNA, snRNA, rRNA, tRNA). As far as I can tell from the GTF file, everything starting with "mir" is a microRNA, so I can easily count those. Can anyone tell me how to identify the scRNA, snoRNA, snRNA, rRNA and tRNA entries? I also see SCARNA and SNAR in the list.
Maybe you can download the human non-coding RNA FASTA file from the Ensembl FTP download site and then map your reads to those sequences. The problem is that the file contains long non-coding RNAs as well as small ones, so you would need to remove the long non-coding sequences after downloading it.
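A rough sketch of that filtering step in Python, in case it helps. It assumes each FASTA header carries a `gene_biotype:<label>` field, as the Ensembl ncRNA files I have seen do; the set of "small" biotype labels and the record IDs below are my own placeholders, so check them against the headers in your download.

```python
import re

# Assumed labels for "small" classes; adjust to the labels present in your file.
SMALL_BIOTYPES = {"miRNA", "snRNA", "snoRNA", "scaRNA", "scRNA", "rRNA"}

# Assumes headers contain a field such as "gene_biotype:miRNA".
_BIOTYPE = re.compile(r"gene_biotype:(\S+)")


def filter_fasta(lines, keep=SMALL_BIOTYPES):
    """Yield only the FASTA records whose header biotype is in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide whether to keep it and its sequence.
            match = _BIOTYPE.search(line)
            keeping = bool(match) and match.group(1) in keep
        if keeping:
            yield line


# Made-up records, only to show the behaviour.
example = [
    ">T1 ncrna gene:G1 gene_biotype:miRNA transcript_biotype:miRNA\n",
    "ACGTACGTACGTACGTACGTAC\n",
    ">T2 ncrna gene:G2 gene_biotype:lncRNA transcript_biotype:lncRNA\n",
    "ACGTACGTACGTACGTACGTACGTACGTACGT\n",
    ">T3 ncrna gene:G3 gene_biotype:snoRNA transcript_biotype:snoRNA\n",
    "GGCCGGCCGGCC\n",
]
kept = list(filter_fasta(example))
```

Since `filter_fasta` only needs an iterable of lines, you can pass it an open file handle and write the result out with `writelines`. Filtering on the biotype label rather than on sequence length is a design choice on my part; a length cutoff would be the other obvious option.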
It looks like you want to infer which ncRNA class a gene belongs to from its name in the GTF file. If that is the case, it doesn't sound like a very reliable approach.
As an idea, I usually create annotation files from GENCODE, for which you can get a table that also contains biotype information (which includes all the ncRNA classes you mentioned above).
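To make the biotype idea concrete, here is a minimal Python sketch, not a finished pipeline. It reads the biotype from the attribute column of the `gene` records in a GTF (`gene_type` in GENCODE files, `gene_biotype` in Ensembl files) and sums per-gene read counts by class. The function names, gene IDs and counts are invented for illustration, and it assumes your quantification is keyed by the same gene IDs as the GTF.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines):
    """Map gene_id -> biotype using the 'gene' records of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        # GENCODE calls the attribute gene_type, Ensembl gene_biotype.
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if "gene_id" in attrs and biotype:
            biotypes[attrs["gene_id"]] = biotype
    return biotypes


def reads_per_biotype(gene_counts, biotypes):
    """Sum read counts per biotype; unknown genes go to 'unannotated'."""
    totals = Counter()
    for gene_id, n in gene_counts.items():
        totals[biotypes.get(gene_id, "unannotated")] += n
    return totals


# Made-up GTF lines and counts, only to show the behaviour.
example_gtf = [
    "##description: toy example\n",
    'chr1\tTOY\tgene\t100\t180\t.\t+\t.\tgene_id "G1"; gene_type "miRNA"; gene_name "MIR0001";\n',
    'chr1\tTOY\tgene\t500\t640\t.\t-\t.\tgene_id "G2"; gene_type "snoRNA"; gene_name "SNORD0001";\n',
    'chr2\tTOY\tgene\t900\t990\t.\t+\t.\tgene_id "G3"; gene_type "miRNA"; gene_name "MIR0002";\n',
]
example_counts = {"G1": 120, "G2": 30, "G3": 80, "G9": 5}

totals = reads_per_biotype(example_counts, gene_biotypes(example_gtf))
```

With the toy data, the two miRNA genes are pooled into one total and the gene missing from the GTF ends up under `unannotated`. This also takes care of names like SCARNA, because the class comes from the biotype field and not from the gene name.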
I think you could get more concrete help by explaining clearly what you have in hand and what kind of solution you are looking for. For example, if you have a BED file, there are probably web services that can give you a full annotation in a few minutes without any scripting or downloading involved. To go one step further, there are web services such as Oasis (https://oasis.dzne.de/index.php) that can run the full small RNA analysis for you, including counts for the distinct small RNA classes.