Question: Differences between two genome releases from Ensembl?
4.1 years ago
rna-seq_researcher40 wrote:

Today I was working on a Bowtie alignment of RNA-seq data from HepG2 cells against a human reference genome. I used RSEM to prepare the reference and build the Bowtie indexes, based on Ensembl release 75 (Feb. 2014). I built indexes for both the "all" and "rm" (repeat-masked) genome sets and aligned my data to each. Both alignments succeeded, with alignment rates of 82.36% and 69.85%, respectively.

However, when I compared these results with an earlier one from a colleague who had run the same analysis on the same data, but using the masked release 58 from Ensembl (May 2010), I noticed that his alignment rate was 51.26%. I repeated the alignment with release 58 to be sure and obtained the same percentage, which suggests that I am following the alignment pipeline correctly.

My question is how much one release can differ from another. New genes and transcripts can be added in a new release, but I don't know whether that is enough to account for a difference of almost 20 percentage points on my data (69.85% with release 75 versus 51.26% with release 58). Does anyone have any advice on that?


written 4.1 years ago by rna-seq_researcher40
4.1 years ago
Alastair Kerr
The University of Edinburgh, UK
Alastair Kerr wrote:

If you are mapping to genomic sequence, then both releases should be based on the same genome build (GRCh37).

However, what will differ is the gene annotation and the assembly patches. (Perhaps the repeat annotation differs as well: I am not sure, but this would not change the unmasked genome.) Do the extra reads map to these patches (i.e. not to chromosomes 1, 2, 3, X, etc.)? This might be your answer.
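One way to run that check is sketched below. It is a rough illustration, not a tested pipeline step: it assumes the reads were aligned to genomic sequence with Ensembl-style chromosome names (1 to 22, X, Y, MT) and that the alignments are available as SAM text, for example from `samtools view -h`. The function names and the `PRIMARY` set are hypothetical. It counts alignment records rather than unique reads, so multi-mapping reads are counted once per record.

```python
from collections import Counter

# Assumption: Ensembl-style names for the primary human chromosomes.
# Anything else (patches, haplotypes, unplaced scaffolds) counts as "other".
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def count_mapped_by_reference(sam_lines):
    """Count mapped alignment records per reference name in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":  # 0x4 = read unmapped
            continue
        counts[rname] += 1
    return counts


def split_primary_vs_other(counts):
    """Return (records on primary chromosomes, records on anything else)."""
    primary = sum(n for name, n in counts.items() if name in PRIMARY)
    other = sum(n for name, n in counts.items() if name not in PRIMARY)
    return primary, other
```

If the "other" count is large for release 75 and small for release 58, that would be consistent with the extra reads landing on patches or haplotypes.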

If not, check which reads do not map using the "-f" flag in "samtools view" (for example, "-f 4" selects unmapped reads). I would then grep the identifiers of some of these reads to see where they map in your latest alignment.
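The same lookup can be sketched in Python instead of grep. This is only an illustration under stated assumptions: both alignments are available as SAM text (for example via `samtools view`), and the function names are hypothetical. It collects the IDs of reads flagged as unmapped in the older alignment, then reports where those IDs are placed in the newer one.

```python
def _records(sam_lines):
    """Yield (read_id, flag, rname, pos) for each SAM alignment record."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        yield fields[0], int(fields[1]), fields[2], int(fields[3])


def unmapped_read_ids(sam_lines):
    """IDs of reads with the 0x4 (unmapped) flag set, as with -f 4."""
    return {rid for rid, flag, _, _ in _records(sam_lines) if flag & 0x4}


def locate_reads(sam_lines, read_ids):
    """Map each requested read ID to its (reference, position) placements."""
    hits = {}
    for rid, flag, rname, pos in _records(sam_lines):
        if rid in read_ids and not flag & 0x4:
            hits.setdefault(rid, []).append((rname, pos))
    return hits
```

For instance, passing the release 58 alignment to `unmapped_read_ids` and the release 75 alignment to `locate_reads` would show which reference sequences pick up the reads that previously failed to align.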


written 4.1 years ago by Alastair Kerr

I think Alastair is right about the patches. The genome assembly has not changed since release 55, so this is not the difference. However, we have introduced many patches and haplotypes since then. There's a help video here to explain what we mean by patches and haplotypes.

written 4.1 years ago by Emily_Ensembl
