Hi,
I am working on a transcriptome analysis of paddy samples collected under two different conditions. My samples belong to Oryza sativa Indica Group, and I'm very confused about which annotated reference genome I should use for mapping and downstream analysis. I noticed that several earlier papers used MSU RGAP (ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic/Projects/o/sativa/annotation/dbs/pseudomolecules/version_7.0) as their reference sequence, but the .fa file is missing from that directory.
I would appreciate advice on which reference sequence is appropriate for me, as I see a lot of data for the japonica type but not for the Indica group. Should I get my .fa file from 'https://plants.ensembl.org/Oryza_indica/Info/Annotation/'?
Thank you
Hi Umer, thank you for the response. Sorry, but I was also wondering: what is the difference between the data deposited in MSU RGAP and the Ensembl Oryza sativa indica data?
The Ensembl Oryza sativa indica genome was submitted by the Beijing Genomics Institute: https://www.ebi.ac.uk/ena/browser/view/GCA_000004655.2
If the MSU genome came from the same submission, then the two should be identical. BTW, the Rice Genome Annotation Project seems to have moved to: http://rice.uga.edu/
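One way to test whether two downloads (for example an MSU pseudomolecule file and the Ensembl indica FASTA) really contain the same sequences is to compare per-sequence checksums rather than names, since different sources may label chromosomes differently. The Python below is a minimal sketch assuming plain (or already decompressed) FASTA text; read_fasta, sequence_digests and compare_assemblies are hypothetical helpers, not part of any tool mentioned in this thread. Sequences are uppercased before hashing so that soft-masking alone does not make otherwise identical sequences look different.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; the ID is the first word of the header."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def sequence_digests(seqs: Dict[str, str]) -> Dict[str, str]:
    """Map MD5 of the uppercased sequence to its record ID (identical sequences collapse)."""
    return {hashlib.md5(s.upper().encode()).hexdigest(): rid for rid, s in seqs.items()}


def compare_assemblies(
    a: Dict[str, str], b: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (shared ID pairs, IDs only in a, IDs only in b), matched by sequence content."""
    da, db = sequence_digests(a), sequence_digests(b)
    shared = sorted((da[d], db[d]) for d in da.keys() & db.keys())
    only_a = sorted(da[d] for d in da.keys() - db.keys())
    only_b = sorted(db[d] for d in db.keys() - da.keys())
    return shared, only_a, only_b
```

For gzipped downloads, pass gzip.open(path, "rt") to read_fasta instead of a plain file handle. An empty "only in" list on both sides would suggest the two files carry the same sequences under possibly different names.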
There are a total of 25 indica genomes available in NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=39946), but of these, the one referred to above seems to be the most suitable Indica genome to use: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000004655.2/
Thank you so much @genomax. I checked the .fasta files and read the README file, but I'm still not sure which .fa is best, as there are several (dna.toplevel.fa, dna.rm.toplevel.fa, dna.sm.toplevel.fa).
Based on http://genomespot.blogspot.com/2015/06/mapping-ngs-data-which-genome-version.html, the repeat-masked file (rm.toplevel.fa) would not be appropriate, and many mappers cannot handle the alternative haplotype sequences that toplevel files may contain (unlike primary assembly files). So I'm assuming dna.sm.toplevel.fa will be the best reference FASTA file. Any advice on this?
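In Ensembl FASTA downloads, the plain "dna" file is unmasked, the "rm" variant is hard-masked (repeats replaced by N) and the "sm" variant is soft-masked (repeats written in lowercase, sequence otherwise unchanged). Many aligners ignore case when indexing, which is the usual reasoning behind preferring the soft-masked file. To confirm which kind of file you actually have, a minimal Python sketch like the one below counts uppercase, lowercase and N bases; masking_profile is a hypothetical helper. Keep in mind that N also marks assembly gaps, so even an unmasked file will not show zero N.

```python
from typing import Dict, Iterable


def masking_profile(lines: Iterable[str]) -> Dict[str, int]:
    """Count uppercase, lowercase (soft-masked) and N (hard-masked or gap) bases in FASTA text."""
    counts = {"upper": 0, "lower": 0, "N": 0}
    for line in lines:
        if line.startswith(">"):  # skip header lines
            continue
        for base in line.strip():
            if base in "Nn":
                counts["N"] += 1
            elif base.islower():
                counts["lower"] += 1
            else:
                counts["upper"] += 1
    return counts
```

For a real download, pass gzip.open("path/to/genome.fa.gz", "rt") (the path is a placeholder). You would expect a large "lower" count only for the soft-masked file, and a noticeably larger "N" count for the hard-masked file than for the unmasked one.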
You can get a FASTA file containing only the main chromosomes from this page: https://www.ebi.ac.uk/ena/browser/view/GCA_000004655?show=chromosomes
Click on the "fasta" link next to the download title in the table.
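If you have already downloaded a toplevel file and it also contains unplaced scaffolds, a comparable chromosome-only FASTA can be produced locally by keeping only the records you want. A minimal Python sketch, assuming the record ID is the first word of each header line; filter_fasta is a hypothetical helper, and the IDs you pass in keep must be taken from the actual headers of your file (for example by listing them with grep '>').

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield header and sequence lines only for records whose ID is in `keep`."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            # Record ID = first whitespace-separated word after '>' (no ID means drop).
            fields = line[1:].split()
            writing = bool(fields) and fields[0] in keep
        if writing:
            yield line
```

Rice has 12 chromosomes, so if (hypothetically) the headers were simply "1" to "12", you could use keep = {str(i) for i in range(1, 13)} and write the result with dst.writelines(filter_fasta(src, keep)), where src and dst are the opened input and output files.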