Reference genome for Oryza sativa Indica Group
9 months ago
sumitra.20 • 0

Hi,

I am working on a transcriptome analysis of paddy samples obtained from two different conditions. My samples belong to the Oryza sativa Indica Group, and I'm confused about which annotated reference genome I should use for mapping and further analysis. I noticed that several earlier papers used MSU RGAP (ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic/Projects/o/sativa/annotation/dbs/pseudomolecules/version_7.0) as their reference sequence, but the .fa file is missing from it.

I would appreciate it if anyone could advise me on which reference sequence is appropriate, as I see a lot of data for the japonica type but not for the Indica group. Should I get my .fa file from 'https://plants.ensembl.org/Oryza_indica/Info/Annotation/'?

Thank you

transcriptome indicagroup RNA-seq reference_sequence • 811 views
9 months ago
Umer ▴ 50

You can use the Ensembl reference sequence along with its annotation file: Ensembl Oryza sativa indica.
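
When pairing a reference FASTA with an annotation file, it is worth checking that both use the same sequence names before mapping. Below is a minimal sketch of such a check. The function names (`fasta_names`, `gtf_seqnames`, `compare_names`) are made up for illustration, and it assumes a standard FASTA file and a tab-separated GTF/GFF file, either plain or gzip-compressed.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            # Header lines start with '>'; skip empty headers.
            if line.startswith(">") and line[1:].strip():
                names.add(line[1:].split()[0])
    return names


def gtf_seqnames(path):
    """Return the set of values in the first column of a GTF/GFF file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Report sequence names that appear in only one of the two files."""
    fa = fasta_names(fasta_path)
    gtf = gtf_seqnames(gtf_path)
    return {
        "only_in_gtf": sorted(gtf - fa),
        "only_in_fasta": sorted(fa - gtf),
    }
```

If `only_in_gtf` is not empty, features on those sequences cannot be placed on the reference, which usually means the FASTA and the annotation come from different sources or naming schemes.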

Hi Umer, thank you for the response. I was also wondering what the difference is between the data deposited in MSU RGAP and Ensembl Oryza sativa indica?

The Ensembl Oryza sativa indica genome was submitted by the Beijing Genomics Institute: https://www.ebi.ac.uk/ena/browser/view/GCA_000004655.2

If the MSU genome came from the same submission, then the two should be identical. BTW, the rice annotation project seems to have moved to: http://rice.uga.edu/

There are a total of 25 indica genomes available in NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=39946), but of these, the one referred to above seems to be the most suitable Indica genome to use: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000004655.2/

Thank you so much @genomax. I tried checking out the .fasta files and reading the README file, but I'm still not sure which .fa is best, as I see a few (dna.toplevel.fa, dna.rm.toplevel.fa, dna.sm.toplevel.fa).

Based on http://genomespot.blogspot.com/2015/06/mapping-ngs-data-which-genome-version.html, the repeat-masked file (rm.toplevel.fa) is not appropriate, and many mappers cannot handle the haplotype information found in toplevel/primary assembly files. So I'm assuming dna.sm.toplevel.fa is the best reference FASTA file. Any advice on this?
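
For context, in Ensembl's naming the "sm" files are soft-masked (repeats written in lower case, sequence kept) and the "rm" files are hard-masked (repeats replaced by N, so that sequence is lost). A quick way to confirm which kind a downloaded file is would be to count the base classes. The sketch below assumes a plain or gzip-compressed FASTA file; the function name `masking_summary` is made up for illustration.

```python
import gzip


def masking_summary(path):
    """Count upper-case, lower-case (soft-masked) and N (hard-masked or gap) bases."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = {"upper": 0, "lower": 0, "n": 0}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip header lines
            seq = line.strip()
            n_bases = seq.count("N") + seq.count("n")
            # Lower-case bases, excluding lower-case 'n', which is counted as N.
            lower = sum(1 for base in seq if base.islower()) - seq.count("n")
            counts["n"] += n_bases
            counts["lower"] += lower
            counts["upper"] += len(seq) - n_bases - lower
    return counts
```

A file with a large `lower` count and few `n` bases is soft-masked; a file with no lower-case bases and a large `n` count is hard-masked. Note that N also marks assembly gaps, so some N bases are expected even in an unmasked file.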

You can get a FASTA file for the main chromosomes on this page: https://www.ebi.ac.uk/ena/browser/view/GCA_000004655?show=chromosomes

Click on the fasta link next to the download title in the table.
