Question

Reference genome for Oryza sativa Indica Group

Hi,

I am working on a transcriptome analysis of paddy samples obtained under two different conditions. My samples belong to Oryza sativa Indica Group, and I am confused about which annotated reference genome I should use for mapping and downstream analysis. I noticed that several earlier papers used MSU RGAP (ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic/Projects/o/sativa/annotation/dbs/pseudomolecules/version_7.0) as their reference sequence, but the .fa file is missing from that directory.

I would appreciate advice on which reference sequence is appropriate for me, as I see a lot of data for the japonica type but not for the Indica group. Should I get my .fa file from 'https://plants.ensembl.org/Oryza_indica/Info/Annotation/'?

Thank you

Tags: transcriptome, indicagroup, RNA-seq, reference_sequence

Posted 9 months ago by sumitra.20 (last updated by GenoMax)

Accepted answer (2023-07-25)

You can use the Ensembl reference sequence along with its annotation file: Ensembl Oryza sativa indica.

Posted 9 months ago by Umer
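
Whichever source you choose, taking the sequence and the annotation from the same release matters because aligners and read counters match the sequence names in column 1 of the GTF/GFF against the FASTA record IDs, and files from different projects can name the same chromosome differently. Below is a minimal Python sketch of that consistency check, assuming plain-text FASTA and GTF input; the function names and the demo records (including the mismatched name `Chr2`) are hypothetical.

```python
from typing import Iterable, Set


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect record IDs: the first whitespace-delimited token after '>'."""
    ids: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip malformed empty headers
                ids.add(tokens[0])
    return ids


def gtf_seqnames(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect column-1 sequence names from GTF/GFF lines, skipping comments."""
    names: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_seqnames(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> Set[str]:
    """Annotation sequence names that have no matching record in the FASTA."""
    return gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines)


# Hypothetical demo records: "Chr2" stands in for a naming mismatch.
fasta_demo = [">1 dna:chromosome", "ACGT", ">2 dna:chromosome", "GGCC"]
gtf_demo = [
    '1\tdemo\tgene\t1\t4\t.\t+\t.\tgene_id "g1";',
    'Chr2\tdemo\tgene\t1\t4\t.\t+\t.\tgene_id "g2";',
]
print(missing_seqnames(fasta_demo, gtf_demo))  # prints {'Chr2'}
```

For real files, pass open file handles instead of lists; an empty result means every annotated sequence name has a matching FASTA record.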

Hi Umer, thank you for the response. I was also wondering: what is the difference between the data deposited in MSU RGAP and in Ensembl Oryza sativa indica?

Posted 9 months ago by sumitra.20

The Ensembl Oryza sativa indica genome was submitted by the Beijing Genomics Institute: https://www.ebi.ac.uk/ena/browser/view/GCA_000004655.2

If the MSU genome came from the same submission, then they should be identical. By the way, the Rice Genome Annotation Project seems to have moved to: http://rice.uga.edu/

There are 25 indica genomes available in NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=39946), but of these, the one referred to above seems to be the most suitable indica genome to use: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000004655.2/

Posted 9 months ago by GenoMax

Thank you so much, @genomax. I tried checking the FASTA files and reading the README file, but I am still not sure which .fa file is best, as there are several (dna.toplevel.fa, dna.rm.toplevel.fa, dna.sm.toplevel.fa).

Based on http://genomespot.blogspot.com/2015/06/mapping-ngs-data-which-genome-version.html, the repeat-masked file (rm.toplevel.fa) will not be appropriate, and many mappers cannot handle the haplotype information found in toplevel files (as opposed to primary assembly files). So I am assuming dna.sm.toplevel.fa will be the best reference FASTA file. Any advice on this?

Posted 9 months ago by sumitra.20
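
The three files differ only in how repeats are represented. Following Ensembl's README conventions, the soft-masked file (`dna_sm`) writes repeats and low-complexity regions in lowercase, the hard-masked file (`dna_rm`) replaces them with `N`, and the plain `dna` file leaves them unmasked. Hard-masked regions cannot receive alignments, which is the usual argument against the rm file for RNA-seq. The minimal sketch below is a hypothetical helper (not part of any Ensembl tool) that counts uppercase, lowercase and N bases, so you can confirm which kind of file you downloaded. Note that N also marks assembly gaps, so even an unmasked file will report some.

```python
from collections import Counter
from typing import Dict, Iterable


def masking_summary(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count uppercase, lowercase and N bases in a FASTA.

    Lowercase bases indicate soft-masking; long N runs indicate hard-masking
    (or assembly gaps). Header lines and blank lines are skipped.
    """
    counts: Counter = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                counts["N"] += 1  # masked or gap, regardless of case
            elif base.islower():
                counts["lowercase"] += 1  # soft-masked base
            else:
                counts["uppercase"] += 1  # unmasked base
    return {
        "uppercase": counts["uppercase"],
        "lowercase": counts["lowercase"],
        "N": counts["N"],
    }


print(masking_summary([">1 demo", "ACGTacgtNN"]))
# prints {'uppercase': 4, 'lowercase': 4, 'N': 2}
```

For a downloaded file, pass the open file handle (for example inside `with open(path) as fh:`); a large lowercase count points to a soft-masked file.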

You can get a FASTA file for the main chromosomes on this page: https://www.ebi.ac.uk/ena/browser/view/GCA_000004655?show=chromosomes

Click on the fasta link next to the download title in the table.

Posted 9 months ago by GenoMax
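
The ENA link gives the chromosomes directly. If you instead start from an Ensembl toplevel FASTA, which can also contain unplaced scaffolds, a minimal Python sketch like the one below can subset it by header. It assumes Ensembl-style headers in which chromosome records contain the text `dna:chromosome` (other records would read, for example, `dna:scaffold`); check the headers in your own file before relying on this. The script name in the comment is hypothetical.

```python
import sys
from typing import Iterable, Iterator


def chromosome_records(
    fasta_lines: Iterable[str], marker: str = "dna:chromosome"
) -> Iterator[str]:
    """Yield header and sequence lines only for records whose header contains `marker`.

    Assumes Ensembl-style headers such as '>1 dna:chromosome chromosome:...'.
    """
    keep = False  # lines before the first header are dropped
    for line in fasta_lines:
        if line.startswith(">"):
            keep = marker in line  # decide once per record
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python keep_chromosomes.py toplevel.fa > chromosomes.fa
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(chromosome_records(fh))
```

Filtering record by record keeps memory use constant, which matters for whole-genome files. If you drop scaffolds, any annotation lines on those scaffolds will no longer have a matching sequence.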