Question: Gene expression analysis on gene counts from different genome builds
skylinesky wrote, 9 months ago:

Hello all, I am going to run a differential gene expression analysis with DESeq2. However, in my count data, half of the samples were aligned to the mm10 genome build, and the other half to mm9 (BAM files). I obtained the gene counts for each sample using its respective genome build. I have merged all the count files and plan to run the differential expression analysis on the merged table. I am wondering whether using count files from different genome builds can influence the result?

Thanks!

rna-seq deseq2 gene count mm10 mm9 • 275 views
modified 9 months ago • written 9 months ago by skylinesky

Yes, of course that will influence your analysis. You should realign the mm9 data to mm10, and then use the same annotation (GTF) file to produce the count files for all samples.

written 9 months ago by Benn

Thank you for your answer. The samples are from the same experiment; they were just mapped using two different genome builds. So I have mm9 and mm10 count files and want to analyse them together. At first I thought that even though the genome annotation differs for half of the samples, the overall result should be the same. Maybe I can add an extra factor to my analysis to control for this as a batch effect.

written 9 months ago by skylinesky

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @h.mon's answer.

written 9 months ago by genomax

You do have to align all reads to the same genome version, preferably mm10.

You don't need to correct for a batch effect, as there is none. I only asked because I found it odd that part of the samples were mapped to mm9 and part to mm10, and I reasoned this could be because the samples were sequenced at different times as part of different experiments.

modified 9 months ago • written 9 months ago by h.mon

h.mon (Brazil) wrote, 9 months ago:

Before answering your question:

"However, in my count data, half of the samples were aligned to the mm10 genome build, and the other half to mm9 (BAM files)."

Why is this the case? Are these different experiments that you want to analyse together? If so, you have to take batch effects into account, and depending on the experimental design, it may be impossible to disentangle batch effects from your factors of interest.
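To make the point about experimental design concrete, here is a minimal Python sketch (not part of the original answer; the sample labels and the function name `batch_condition_confounded` are made up for illustration). It links each batch to the conditions it contains and reports confounding when that batch/condition graph is not connected, which is when an additive model such as `~ batch + condition` cannot estimate every condition contrast.

```python
from collections import defaultdict

def batch_condition_confounded(samples):
    """samples: iterable of (condition, batch) pairs, one pair per sample.

    Returns True when batch and condition cannot be fully separated in an
    additive model, i.e. when the graph that links each batch to the
    conditions it contains is not connected.
    """
    neighbours = defaultdict(set)
    for condition, batch in samples:
        c, b = ("condition", condition), ("batch", batch)
        neighbours[c].add(b)
        neighbours[b].add(c)
    if not neighbours:
        return False
    # Depth-first walk from an arbitrary node.
    start = next(iter(neighbours))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    # Nodes never reached belong to a separate, unlinked group.
    return len(seen) < len(neighbours)

# Hypothetical designs: each experiment (batch) holds only one condition...
confounded = [("control", "exp1"), ("control", "exp1"),
              ("treated", "exp2"), ("treated", "exp2")]
# ...versus both conditions being present in both experiments.
balanced = [("control", "exp1"), ("treated", "exp1"),
            ("control", "exp2"), ("treated", "exp2")]

print(batch_condition_confounded(confounded))
print(batch_condition_confounded(balanced))
```

In the first design every treated sample comes from one experiment and every control from the other, so any difference could be due to either factor. In the second design the batch term can be estimated separately from the condition term.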

Regarding your question:

The mm10 genome sequence is better (more bases and fewer errors) than mm9, and one generally gets more mapped reads when using mm10 as the reference genome.

In addition, and more importantly, the annotation has changed considerably, mostly with new genes added to mm10, but also with gene models changing between versions, pseudogenes and incorrect annotations being dropped, and some genes / transcripts changing names.
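One way to see how much two annotations differ is to compare the gene identifiers in their GTF files. The following is only a rough Python sketch, assuming standard tab-separated GTF lines with a `gene_id "..."` attribute in the ninth column; the function names and the toy gene IDs are invented for illustration and are not real mm9/mm10 content.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')

def gene_ids(gtf_lines):
    """Collect the set of gene_id values from an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def compare_annotations(old_lines, new_lines):
    """Return the gene IDs shared by, or unique to, each annotation."""
    old, new = gene_ids(old_lines), gene_ids(new_lines)
    return {"shared": old & new, "only_old": old - new, "only_new": new - old}

def toy_line(gene_id):
    # Build a minimal, made-up GTF record for demonstration purposes.
    fields = ["chr1", "toy", "gene", "1", "100", ".", "+", ".",
              f'gene_id "{gene_id}";']
    return "\t".join(fields)

old_build = [toy_line("GeneA"), toy_line("GeneB")]
new_build = [toy_line("GeneA"), toy_line("GeneC")]
diff = compare_annotations(old_build, new_build)
print(sorted(diff["shared"]), sorted(diff["only_old"]), sorted(diff["only_new"]))
```

With real files you would pass open file handles instead of the toy lists. Genes that appear in only one annotation have no counterpart in the other count table, and a shared identifier does not guarantee that the gene model (and therefore the counted region) is the same in both builds.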

In summary, you have to map the original reads from all samples to the same reference genome before proceeding with the differential expression analysis.
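As an illustration of what "the same reference for every sample" looks like in practice, here is a hedged Python sketch that only builds the command lines and does not run anything. The thread does not name an aligner or a counting tool, so HISAT2, samtools and featureCounts are my assumptions, as are single-end FASTQ input and all file names and paths.

```python
import shlex

def remap_and_count_commands(sample, fastq, index_prefix, gtf):
    """Build align -> sort -> count commands for one single-end sample.

    The same index_prefix and gtf must be passed for every sample so that
    all counts refer to one genome build and one annotation.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        ["hisat2", "-x", index_prefix, "-U", fastq, "-S", sam],
        ["samtools", "sort", "-o", bam, sam],
        ["featureCounts", "-a", gtf, "-o", f"{sample}.counts.txt", bam],
    ]

# Hypothetical paths: one mm10 index and one matching GTF for all samples.
MM10_INDEX = "mm10_index/genome"
MM10_GTF = "mm10_annotation.gtf"
samples = {"sample1": "sample1.fastq.gz", "sample2": "sample2.fastq.gz"}

for name, fastq in samples.items():
    for cmd in remap_and_count_commands(name, fastq, MM10_INDEX, MM10_GTF):
        print(shlex.join(cmd))
```

The point of the sketch is only that the index and the GTF are fixed once and reused for every sample, including the ones previously mapped to mm9. To actually execute the commands you could pass each list to `subprocess.run(cmd, check=True)`.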

written 9 months ago by h.mon