Which reference transcriptome/genome should I use for Mus musculus if I know the particular strain used in the experiment?
15 months ago
e.r.zakiev ▴ 210

I know my samples are from C57BL6.

Should I align against the C57BL6-specific reference transcriptome/genome, or just the generic Mus musculus one?

I am worried that the reference transcriptome file for C57BL6 (Mus_musculus_c57bl6nj.C57BL_6NJ_v1.cdna.all.fa.gz, 39.7 MB) is 22% smaller than its generic counterpart (Mus_musculus.GRCm39.cdna.all.fa.gz, 51.2 MB). Biologically, the C57BL6 transcriptome cannot be 22% smaller than that of some other strain, so I suspect the C57BL6 annotation is simply less detailed. Is that the case?
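
Compressed file size is only a rough proxy, so one way to check whether the gap reflects annotation depth is to count the records in each file directly. A minimal sketch, assuming both files are gzipped FASTA with one `>` header line per transcript; the function name `summarize_fasta` is made up for illustration:

```python
import gzip


def summarize_fasta(path):
    """Return (number of records, total sequence length) for a gzipped FASTA file."""
    n_records = 0
    n_bases = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # Each header line marks the start of one transcript record.
                n_records += 1
            else:
                # Sequence lines: count residues, ignoring the trailing newline.
                n_bases += len(line.strip())
    return n_records, n_bases
```

Calling `summarize_fasta("Mus_musculus.GRCm39.cdna.all.fa.gz")` and then the same on the C57BL_6NJ file would show whether the strain-specific file simply contains fewer annotated transcripts.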

What do I gain, what do I lose if I opt for C57BL6-specific transcriptome/genome?

15 months ago
LChart 3.9k

There's some basic literature on the topic here: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010552 . In short, a strain-specific reference slightly improves the number of uniquely mapped reads, though mapping parameters appear to have a stronger impact than the choice of reference.

However, for differential expression (BL6/J untreated vs BL6/J treated) the question isn't so much "how do the quantifications change" but "how do the logFCs change", and I haven't seen published results on this. I would expect that, by aligning to the strain-specific transcriptome, some genes might get slightly higher coverage, enough to bump them over the soft-filtering threshold; but few (if any) logFC values should change based on the reference.

It's probably worth running the analysis twice, once against each reference, just to put your mind at ease.
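
If you do run it twice, the comparison itself can be sketched roughly as below. This assumes each run's differential expression results were exported to CSV with a gene column and a logFC column; the column names `gene` and `log2FoldChange` and both function names are placeholders, not something taken from your pipeline. It also assumes the two tables share a gene key: the strain-specific and GRCm39 annotations may not use the same gene IDs, so you might need to map to something common (e.g. gene symbol) first.

```python
import csv
import math


def read_logfc(path, gene_col="gene", lfc_col="log2FoldChange"):
    """Read a CSV of DE results into {gene: logFC}, skipping missing values."""
    out = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            value = row[lfc_col]
            if value in ("", "NA"):
                continue  # genes filtered out or not tested
            out[row[gene_col]] = float(value)
    return out


def compare_logfc(lfc_a, lfc_b, top=10):
    """Compare two {gene: logFC} dicts on the genes they share."""
    shared = sorted(set(lfc_a) & set(lfc_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared genes to compare")
    x = [lfc_a[g] for g in shared]
    y = [lfc_b[g] for g in shared]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Pearson correlation; undefined (NaN) if either run has constant logFCs.
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    # Genes whose logFC moved the most between the two references.
    by_shift = sorted(shared, key=lambda g: abs(lfc_a[g] - lfc_b[g]), reverse=True)
    return {
        "n_shared": n,
        "pearson_r": r,
        "largest_shifts": [(g, lfc_a[g] - lfc_b[g]) for g in by_shift[:top]],
        "only_in_a": sorted(set(lfc_a) - set(lfc_b)),
        "only_in_b": sorted(set(lfc_b) - set(lfc_a)),
    }
```

The `only_in_a` / `only_in_b` lists are where genes that cross the filtering threshold under one reference but not the other would show up; `largest_shifts` gives the handful of genes worth inspecting by hand.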


