Question: What exactly is included in the mRNA reference set (GRCh38/hg38)?
3.2 years ago
Hi all,

I mapped my reads to the human transcriptome that I downloaded from UCSC: Downloads > Genome Data > Human > Full data set > mrna.fa.gz

This is the description of the file on the website: “mrna.fa.gz - Human mRNA from GenBank. This sequence data is updated once a week via automatic GenBank updates.”

As I understand it, the mRNA is extracted, reverse transcribed to cDNA, and sequenced, and the sequences that map successfully are then stored in this mrna.ref file.

What I would like to know is the following: Is ribosomal RNA in this ref? Is long non-coding RNA in this ref? What kinds of RNA does mrna.ref not cover? Is mrna.ref not the transcriptome reference? I ask because ribosomal RNA is part of the transcriptome but is not mRNA, right? If mrna.ref is not the most accurate reference to use, do you know what is?

Thank you, -Bjarki

3.2 years ago
Do you have a good reason to map to the transcriptome? More commonly RNA-seq is mapped to the entire genome using a spliced read aligner.

Yes. I'm interested in the reads that do not map to the transcriptome, so after mapping to the transcriptome I extract the unmapped reads from the resulting BAM file.
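
For anyone following along, here is a rough sketch of what that extraction step amounts to. It assumes the alignments are available as SAM text (the text form of BAM) and relies on the fact that an unmapped read has the 0x4 bit set in its FLAG field. The function name `unmapped_read_names`, the toy records, and the reference name `AB000114` are made up for illustration and are not taken from my actual data.

```python
import io

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_handle):
    """Yield the names of reads whose FLAG has the 0x4 (unmapped) bit set."""
    for line in sam_handle:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        flag = int(fields[1])
        if flag & UNMAPPED:
            yield fields[0]


# Toy SAM content (hypothetical reads and reference name, illustration only).
toy_sam = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:AB000114\tLN:1000",
    "read1\t0\tAB000114\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",   # mapped, forward
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII",             # unmapped
    "read3\t16\tAB000114\t50\t60\t8M\t*\t0\t0\tGGGGCCCC\tIIIIIIII",  # mapped, reverse
]) + "\n"

names = list(unmapped_read_names(io.StringIO(toy_sam)))
print(names)
```

On a real BAM file the same filter is usually applied with `samtools view -b -f 4`, which keeps only records with the unmapped bit set; the Python version above is only meant to show the logic.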

