RNASEQ - Mapping to genome or Transcriptome
10 months ago
esimonova.me ▴ 20

What are the advantages and disadvantages of mapping to the genome versus the transcriptome? Is there a good-quality transcriptome available for Macaca mulatta and for human?

The advantage of mapping to the transcriptome is definitely time: it takes longer to map reads to the genome. I have also heard that for some species a good-quality transcriptome is not available, so mapping to the genome is preferred, and vice versa when the genome is the weaker resource. I could not find any comparison paper where the same reads were mapped to both genome and transcriptome.

I found the same questions on SEQanswers, though they are out of date (from 2010 and 2014).

mapping RNAseq • 1.0k views
10 months ago
GenoMax 117k

It depends on what your interest is.

If you are happy with the available transcriptomes, then you can use them with the following caveats (besides the points you already mention above): (A) you will not be able to identify new transcripts; (B) there is some chance that reads will align to regions they did not originate from. On the plus side, you can use programs like salmon (that don't need to align the data) to speed the process up while requiring significantly fewer hardware resources. Using salmon with genome decoys will help avoid stray matches (ref: How does salmon deal with decoy?).
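To make the decoy idea concrete, here is a minimal Python sketch of how a decoy-aware salmon index is usually prepared: the transcript sequences come first, the genome sequences are appended as decoys, and the genome sequence names go into a decoy list. The helper names (`decoy_names`, `build_gentrome`, `salmon_commands`) and all file paths are made up for illustration; only the salmon flags (`index -t/-d/-i`, `quant -i/-l/-1/-2/--validateMappings/-o`) come from salmon itself, and the whole thing is a sketch rather than a tested pipeline.

```python
from typing import Iterable, List, Tuple


def decoy_names(genome_fasta_lines: Iterable[str]) -> List[str]:
    """Return the sequence names (first word of each '>' header line)."""
    names = []
    for line in genome_fasta_lines:
        fields = line[1:].split() if line.startswith(">") else []
        if fields:
            names.append(fields[0])
    return names


def build_gentrome(
    transcriptome_lines: Iterable[str], genome_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Concatenate transcripts first and genome (decoys) last.

    Returns the combined FASTA lines and the list of decoy names.
    """
    tx = [line.rstrip("\n") for line in transcriptome_lines]
    genome = [line.rstrip("\n") for line in genome_lines]
    return tx + genome, decoy_names(genome)


def salmon_commands(gentrome_fa, decoys_txt, index_dir, reads_1, reads_2, out_dir):
    """Build (not run) the index and quant command lines for paired-end reads."""
    index_cmd = ["salmon", "index", "-t", gentrome_fa, "-d", decoys_txt, "-i", index_dir]
    quant_cmd = [
        "salmon", "quant", "-i", index_dir,
        "-l", "A",                 # let salmon infer the library type
        "-1", reads_1, "-2", reads_2,
        "--validateMappings",
        "-o", out_dir,
    ]
    return index_cmd, quant_cmd
```

The command lists can be handed to `subprocess.run(cmd, check=True)` once the combined FASTA and the decoy list have been written to disk. For a human-sized genome you would stream the files rather than hold all lines in memory as this sketch does.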

The human transcriptome is reasonably well characterized at this time. If you are not interested in alternatively spliced transcripts, then there are the RefSeq Select and MANE (LINK) datasets that you can use. I am not sure what the status is for Macaca.


Some additional advantages of using the transcriptome (besides speed):

• Avoiding spurious mapping to intergenic regions (assuming, of course, that none of your reads are supposed to originate from such regions), so transcriptome mapping may be the better choice if you know that you have minimal intronic/intergenic reads
• Getting transcript-level expression estimates (which, in addition to giving you isoform resolution, has been shown to yield more accurate gene-level quantifications)
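As a rough illustration of the second point, gene-level abundance can be obtained from transcript-level estimates by summing over each gene's transcripts. The sketch below assumes you already have per-transcript TPM values (for example, parsed from a quantifier's output) and a transcript-to-gene mapping; the function name `gene_level_tpm` and the example identifiers are invented for illustration, and dedicated tools such as tximport do considerably more than this (e.g. length-aware count summarization).

```python
from collections import defaultdict
from typing import Dict


def gene_level_tpm(
    transcript_tpm: Dict[str, float], tx2gene: Dict[str, str]
) -> Dict[str, float]:
    """Sum transcript-level TPM values into gene-level TPM values."""
    totals: Dict[str, float] = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        gene = tx2gene.get(transcript)
        if gene is None:
            # Transcript missing from the mapping: skipped in this sketch.
            continue
        totals[gene] += tpm
    return dict(totals)


if __name__ == "__main__":
    tpm = {"txA.1": 10.0, "txA.2": 5.0, "txB.1": 2.0}
    mapping = {"txA.1": "geneA", "txA.2": "geneA", "txB.1": "geneB"}
    print(gene_level_tpm(tpm, mapping))
```

Keeping the transcript-level values around as well lets you look at isoform switching later without re-quantifying.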

Depending on the quality of the macaque transcriptome, one option may be de novo assembly of both H. sapiens and M. mulatta. Annotate both with the human proteome, and quantify with Salmon as mentioned.

10 months ago
Alex ▴ 20

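The STAR aligner actually has an option that covers both: it aligns to the genome and, in the same run, also writes the alignments in transcript coordinates, so you get the two relevant BAM files. Not sure if you've used STAR before, but it's pretty quick since it loads the entire genome index into working memory. The only downside is that you need enough memory, tens of GB for a human-sized genome (the STAR manual recommends at least 32 GB), so it is better to run it on an HPC. It also generates a splice-junction file for each sample you run, and it has a two-pass mode that feeds the junctions found in the first pass back into a second mapping pass, which improves alignment across novel splice junctions.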
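For reference, a sketch of what such a STAR run could look like, written as a small Python helper that only builds the command line. The function name `star_command`, the paths and the thread count are placeholders; the flags themselves (`--quantMode TranscriptomeSAM`, `--twopassMode Basic`, `--outSAMtype BAM SortedByCoordinate`, and so on) are STAR options, and `--quantMode TranscriptomeSAM` needs gene annotations, supplied either when the genome index was generated or at mapping time.

```python
from typing import List


def star_command(
    genome_dir: str,
    reads_1: str,
    reads_2: str,
    out_prefix: str,
    threads: int = 8,
    gzipped: bool = True,
) -> List[str]:
    """Build (not run) a STAR command producing genome and transcriptome BAMs."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", reads_1, reads_2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "TranscriptomeSAM",   # also write transcript-coordinate BAM
        "--twopassMode", "Basic",            # re-map using junctions from pass 1
        "--outFileNamePrefix", out_prefix,
    ]
    if gzipped:
        # Assumes gzip-compressed FASTQ input and that zcat is available.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    print(" ".join(star_command("star_index/", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1_")))
```

With the prefix `s1_`, STAR should write `s1_Aligned.sortedByCoord.out.bam` (genome), `s1_Aligned.toTranscriptome.out.bam` (transcriptome) and `s1_SJ.out.tab` (junctions); the transcriptome BAM can then be passed to a quantifier such as RSEM or salmon in alignment mode.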