I am working on an RNA-Seq project with sequencing data from rat samples. I would appreciate some help with obtaining the latest list of rat (Rattus norvegicus) transcripts as a FASTA file. According to the Ensembl database, the rat genome (Rnor_6.0 assembly) has a little over 41,000 annotated transcripts, as opposed to over 135,000 for mouse. Could anyone confirm whether this is the latest information for rat? I have also looked at the NCBI site for rat transcripts, and there I see over 69,000 transcripts (in the RefSeq categories NM, NR, XM and XR).
Not sure about your exact question. As you say, I'd expect the mouse genome to be much more thoroughly annotated than the rat, given the number of groups working on it.
I have found the Rat Genome Database to be very good but have only been working on genomics so far, not RNA-seq.
To see the differences between different rat annotations I would strongly recommend mapping them to the genome with gmap, and / or importing them into a genome browser for visual comparison at multiple loci.
Dear OP, I'm going through the exact same process right now!
I was also a little skeptical about the number of transcripts in the Rat transcriptome, as opposed to the much larger number in the mouse one. However, I believe colindaven's answer explains it: not so many groups use rat as a model, so less information is known on its transcriptome.
As for genomax's answer, that was also where I obtained my transcriptome to index Salmon (is that also why you need this transcriptome?). The stats link genomax posted says the rat transcriptome has 41,078 transcripts. However, the "Rattus_norvegicus.Rnor_6.0.cdna.all.fa" file has only 31,715 sequences. The non-coding RNA file (here: ftp://ftp.ensembl.org/pub/release-92/fasta/rattus_norvegicus/ncrna/Rattus_norvegicus.Rnor_6.0.ncrna.fa.gz) has 9,331 sequences. Together, the cdna and ncrna files amount to 41,046 sequences (31,715 + 9,331), which is close to, though not exactly, the 41,078 transcripts reported for the rat transcriptome. I believe this could be how Ensembl arrived at the number on the stats page of the rat genome/transcriptome.
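For anyone who wants to repeat the tally above, here is a minimal sketch that counts FASTA records by counting header lines (those starting with ">"). It assumes the cdna and ncrna files have already been downloaded into the working directory under the names shown; the helper names `open_text` and `count_fasta_records` are my own, not part of any tool.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a FASTA file as text, handling .gz compression transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fasta_records(path):
    """Count sequences by counting header lines (lines starting with '>')."""
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumed local file names; adjust to wherever the downloads live.
    files = [
        "Rattus_norvegicus.Rnor_6.0.cdna.all.fa",
        "Rattus_norvegicus.Rnor_6.0.ncrna.fa.gz",
    ]
    total = 0
    for name in files:
        if not Path(name).exists():
            print(f"{name}: not found, skipping")
            continue
        n = count_fasta_records(name)
        total += n
        print(f"{name}: {n} sequences")
    print(f"total: {total} sequences")
```

If the counts quoted above hold for your copies of the files, the total should come to 31,715 + 9,331 = 41,046.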
You will also find an ab initio transcript file on the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-92/fasta/rattus_norvegicus/cdna/Rattus_norvegicus.Rnor_6.0.cdna.abinitio.fa.gz), which has 59,821 sequences. I do not know how this file relates to the others, so if anyone could help clear that up it would be great!
All in all, I used the smaller cdna FASTA (31,715 transcripts) to index Salmon, but I have also built indexes from the abinitio FASTA and from the concatenation of the cdna and ncrna files. I'll do further analyses running Salmon with each one to see where I get to.
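In case it is useful, this is roughly how the cdna and ncrna files could be concatenated before indexing. It is only a sketch: the function name `concatenate_fastas`, the output name `combined.fa`, and the choice to skip records whose ID was already seen are my own assumptions, not something Ensembl or Salmon prescribes.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a FASTA file as text, handling .gz compression transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def concatenate_fastas(inputs, output):
    """Write all records from `inputs` into one uncompressed FASTA.

    A record whose ID (first token after '>') was already written is
    skipped, so the combined file has no duplicate transcript IDs.
    Returns the number of records written.
    """
    seen = set()
    written = 0
    with open(output, "wt") as out:
        for path in inputs:
            keep = False  # ignore any lines that precede the first header
            with open_text(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else ""
                        keep = seq_id not in seen
                        if keep:
                            seen.add(seq_id)
                            written += 1
                    if keep:
                        # make sure every line ends with a newline
                        out.write(line if line.endswith("\n") else line + "\n")
    return written


if __name__ == "__main__":
    # Assumed local file names; adjust to wherever the downloads live.
    inputs = [
        "Rattus_norvegicus.Rnor_6.0.cdna.all.fa",
        "Rattus_norvegicus.Rnor_6.0.ncrna.fa.gz",
    ]
    if all(Path(p).exists() for p in inputs):
        n = concatenate_fastas(inputs, "combined.fa")
        print(f"wrote {n} records to combined.fa")
    else:
        print("input files not found, nothing written")
```

The combined file can then be passed to Salmon in the usual way, e.g. `salmon index -t combined.fa -i <index_dir>`.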