How many gene transcripts are there in the rat genome (latest assembly)?
6.0 years ago

I am working on an RNA-Seq project with sequencing data from rat samples. I would appreciate some help obtaining the latest list of rat (Rattus norvegicus) transcripts as a FASTA file. According to the Ensembl database, the rat genome (Rnor_6.0 assembly) has just over 41,000 gene transcripts, as opposed to more than 135,000 for mouse. Could anyone confirm whether this is the latest information for rat? I have also looked at the NCBI site for rat transcripts, and there I see over 69,000 transcripts (in the RefSeq categories NM, NR, XM and XR).

RNA-Seq Assembly • 1.8k views
6.0 years ago

I'm not sure what your exact question is. As you say, I'd expect the mouse genome to be annotated much more thoroughly than the rat genome, given the number of groups working on it.

I have found the Rat Genome Database to be very good, but I have only worked on genomics so far, not RNA-seq.

To see how the different rat annotations differ, I would strongly recommend mapping them to the genome with GMAP and/or importing them into a genome browser for visual comparison at multiple loci.

6.0 years ago
GenoMax 141k

Ensembl's stats are available on this page. Actual file can be downloaded here (filter for Rat).

6.0 years ago
luxeredias ▴ 10

Dear OP, I'm going through the exact same process right now!

I was also a little skeptical about the number of transcripts in the rat transcriptome compared with the much larger number for mouse. However, I believe colindaven's answer explains it: fewer groups use rat as a model, so less is known about its transcriptome.

As for genomax's answer, that is also where I obtained my transcriptome to index Salmon (is that also the reason you need this transcriptome?). The stats link genomax posted says the rat transcriptome has 41,078 transcripts. However, the "Rattus_norvegicus.Rnor_6.0.cdna.all.fa" file has only 31,715 sequences. The non-coding RNA file (here: ftp://ftp.ensembl.org/pub/release-92/fasta/rattus_norvegicus/ncrna/Rattus_norvegicus.Rnor_6.0.ncrna.fa.gz) has 9,331 sequences. Together, the cdna and ncrna files amount to 41,046 sequences, which is close to (though 32 short of) the number of transcripts reported for the rat transcriptome. I believe this could be how Ensembl arrived at the number on the stats page for the rat genome/transcriptome.
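For anyone who wants to check these counts on their own downloads, here is a minimal sketch in Python. It simply counts FASTA header lines (those starting with ">"), which should match the number of sequences in a well-formed file. The function names are made up for illustration, and the sketch assumes the files have already been downloaded locally.

```python
import gzip


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines.

    Works on plain-text or gzip-compressed files (detected by the
    '.gz' suffix, which is an assumption about how the file is named).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def total_records(paths):
    """Sum the record counts over several FASTA files."""
    return sum(count_fasta_records(p) for p in paths)
```

Calling `total_records(["Rattus_norvegicus.Rnor_6.0.cdna.all.fa.gz", "Rattus_norvegicus.Rnor_6.0.ncrna.fa.gz"])` on local copies of the two release-92 files would be one way to compare the combined count against the figure on the stats page.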

You will also find an abinitio transcript file on the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-92/fasta/rattus_norvegicus/cdna/Rattus_norvegicus.Rnor_6.0.cdna.abinitio.fa.gz), which has 59,821 sequences. I do not know how this file relates to the others, so if anyone could help clear that up, it would be great!

All in all, I used the shorter cdna FASTA (31,715 transcripts) to index Salmon, but I have also built indexes from the abinitio FASTA and from the concatenation of the cdna and ncrna files. I'll do further analyses, running Salmon with each one, to see where I get to.
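The concatenation step could be done along these lines. This is only a sketch: the function name and the output file name are hypothetical, and it assumes the inputs are local copies of the Ensembl files (gzip-compressed or plain). It writes an uncompressed FASTA and makes sure each input ends with a newline so that a header from the next file never gets glued onto the last sequence line of the previous one.

```python
import gzip


def concatenate_fasta(inputs, output):
    """Concatenate FASTA files (plain or .gz) into one uncompressed file."""
    with open(output, "wb") as out:
        for path in inputs:
            opener = gzip.open if str(path).endswith(".gz") else open
            last_byte = b"\n"
            with opener(path, "rb") as src:
                # Copy in 1 MiB chunks to avoid loading whole files in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # Guard against inputs that lack a trailing newline.
            if last_byte != b"\n":
                out.write(b"\n")
```

For example, `concatenate_fasta(["Rattus_norvegicus.Rnor_6.0.cdna.all.fa.gz", "Rattus_norvegicus.Rnor_6.0.ncrna.fa.gz"], "rat_cdna_ncrna.fa")` would produce a combined file that can then be passed to `salmon index -t rat_cdna_ncrna.fa -i <index_dir>`.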

Best,

Thomaz
