Hello, I am new to bioinformatics, so apologies if this question is too obvious.
I need to analyze the RNA Seq data from GSE98455.
This is an RNA-Seq dataset for rice, in the following format:
| Some Id | Counts |
|---------------------|------------------|
| 13101.t00001 | 392 |
| 13101.t00002 | 20 |
The platform for this data is Illumina HiSeq 2500.
My question is: how do I map a given rice gene to the Id column so that I can extract the appropriate counts? For example, if I want the count for the gene OsNAC6, how do I map it to the Id column?
Thank you for your insights.
You can find the gene name and gene ID in the annotation that was used for the alignment. In the associated paper I couldn't see which annotation they used, but they mention "Oryza sativa japonica reference genome v 6". Try finding the genome annotation associated with that, or contact the authors and ask them for the file that maps gene IDs to gene names.
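Once you have such a file, the lookup itself is simple. Here is a minimal Python sketch, assuming a hypothetical tab-separated mapping file with the ID in the first column and the gene name in the second; the real file from the authors or from MSU may be laid out differently. The names GeneA and GeneB are invented for the demo and are not real annotations for these IDs.

```python
import csv
import io


def load_counts(handle):
    """Read a two-column, tab-separated counts table into {id: count}."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip the header row and any malformed lines.
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        counts[row[0].strip()] = int(row[1])
    return counts


def load_name_to_ids(handle):
    """Read an (assumed) ID<TAB>name table into {lowercased name: [ids]}."""
    name_to_ids = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        key = row[1].strip().lower()
        name_to_ids.setdefault(key, []).append(row[0].strip())
    return name_to_ids


def counts_for_gene(gene_name, name_to_ids, counts):
    """Return {id: count} for every ID annotated with gene_name."""
    ids = name_to_ids.get(gene_name.lower(), [])
    return {gene_id: counts.get(gene_id) for gene_id in ids}


# Made-up demo data; with real files, pass open(path, newline="") instead.
counts_text = "Some Id\tCounts\n13101.t00001\t392\n13101.t00002\t20\n"
mapping_text = "13101.t00001\tGeneA\n13101.t00002\tGeneB\n"

counts = load_counts(io.StringIO(counts_text))
name_to_ids = load_name_to_ids(io.StringIO(mapping_text))
print(counts_for_gene("GeneA", name_to_ids, counts))
```

A gene name can map to more than one ID (for example several transcripts), which is why the lookup returns a dictionary rather than a single number.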
I see. I will try to look for the genome; if I can't find it, I will get in touch with the authors. Thank you for the directions, I wouldn't have figured this out myself.
I looked a little more. These IDs seem to come from the MSU v6 annotation (MSU Rice Genome Annotation Project osa1r6). I couldn't find an annotation file for that release. I tried looking in the rice database, but with no luck. I think contacting the authors would be the easiest and fastest way. Good luck!
I searched for it as well, with no luck. I have reached out to the author; hopefully they can help me out. Is it usual to provide datasets without such key files? Or is it a security thing?
They are required to upload raw and processed data, but nobody checks whether the processed data is actually useful by itself. Since it is the rice genome, this information is very hard to find. When I searched Google for the ID you mentioned, it showed a post where the ID was listed for the rice genome and came from an annotation file (.gff), but I couldn't locate that .gff file on Ensembl or MSU. If you don't get a response from the authors, try posting again with a new title asking where to get the GFF or GTF file for MSU v6; I think that will help you. If a bioinformatician, or you yourself, can re-align the raw sequencing files, then you don't need to find that specific annotation file.
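If you do get hold of a .gff file, pulling an ID-to-name table out of it could look roughly like the Python sketch below. It assumes standard GFF3 (nine tab-separated columns, attributes written as key=value pairs separated by semicolons) and that the name sits under a Name or Note attribute; I don't know which keys the MSU v6 file actually uses, so treat those as assumptions. The demo lines are invented.

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Split a GFF3 attribute column ("ID=x;Name=y") into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            # GFF3 percent-encodes reserved characters in values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff_id_to_name(lines, feature_types=("gene", "mRNA")):
    """Map feature ID -> Name (or Note) for the chosen feature types."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            # Assumed attribute keys; adjust to whatever the real file uses.
            mapping[attrs["ID"]] = attrs.get("Name", attrs.get("Note", ""))
    return mapping


# Invented demo lines, not taken from any real annotation.
demo_lines = [
    "##gff-version 3",
    "ChrDemo\texample\tgene\t1\t500\t.\t+\t.\tID=13101.t00001;Name=GeneA",
    "ChrDemo\texample\tgene\t600\t900\t.\t-\t.\tID=13101.t00002;Note=hypothetical%20protein",
    "ChrDemo\texample\texon\t1\t200\t.\t+\t.\tID=exon1;Parent=13101.t00001",
]

mapping = gff_id_to_name(demo_lines)
for feature_id, name in mapping.items():
    print(feature_id, name, sep="\t")
```

With a real file you would pass the open file handle instead of demo_lines, then search the resulting table for your gene of interest.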
Thank you for the detailed response, you have been kind and generous. I will update here if I get a response from the author. I will also reach out to the bioinformatician on our team; perhaps she can help. It is sad that there is data but it is not usable.