Question: Differences between two genome releases from Ensembl?
rna-seq_researcher (Netherlands) wrote, 4.1 years ago:

Hello all, 

Today I aligned RNA-seq reads from HepG2 cells to the human reference genome with bowtie. I used RSEM to prepare the reference and build the bowtie indexes from Ensembl release 75 (February 2014). I built indexes for both the "all" and the "rm" (masked) genome sets and aligned my data to each. Both alignments ran successfully, with alignment rates of 82.36% and 69.85%, respectively.

However, I then compared these results with those of a colleague who had previously run the same analysis on the same data, but against the masked genome from Ensembl release 58 (May 2010). His alignment rate was only 51.26%. I repeated the alignment against release 58 to be sure and obtained the same percentage, which suggests that I am running the alignment pipeline the same way he did.

My question is: how different can two releases be? A newer release can add genes and transcripts, but I don't know whether that is enough to explain a difference of almost 20 percentage points (69.85% with release 75 vs. 51.26% with release 58) on my data. Does anyone have any advice on this?
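To make the size of the gap explicit, here is a small Python sketch that separates the absolute difference (in percentage points) from the relative change. The function name `compare_rates` is illustrative only.

```python
from typing import Tuple


def compare_rates(new_pct: float, old_pct: float) -> Tuple[float, float]:
    """Return (difference in percentage points, relative change in percent)."""
    points = new_pct - old_pct
    relative = 100.0 * points / old_pct
    return points, relative


# Masked-genome alignment rates reported above: release 75 vs. release 58.
points, relative = compare_rates(69.85, 51.26)
print(f"{points:.2f} percentage points, {relative:.1f}% relative increase")
# prints "18.59 percentage points, 36.3% relative increase"
```

With the same input reads, this means the release 75 masked alignment places roughly a third more reads than release 58 does.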

Regards,

Tags: bowtie, alignment, ensembl, genome

Alastair Kerr (The University of Edinburgh, UK) wrote, 4.1 years ago:

If you are mapping to the genome, both releases should use the same genome build (GRCh37); see image.

What will differ, however, is the gene annotation and the assembly patches. (Perhaps extra repeats are masked as well; I am not sure, but that would not change the unmasked genome.) Do the extra reads map to these patches (i.e. not to 1, 2, 3, X, etc.)? This might be your answer.
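One way to test this, sketched in Python under some assumptions: run `samtools idxstats` on each sorted, indexed BAM, then total the mapped reads on the primary chromosomes versus every other sequence (patches, haplotypes, unplaced scaffolds). The Ensembl-style names in `PRIMARY` (no "chr" prefix), the function name `split_mapped_counts`, and the sample input are assumptions for illustration, not part of the original thread.

```python
from typing import Dict, Tuple

# Assumption: Ensembl-style primary chromosome names (no "chr" prefix).
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def split_mapped_counts(idxstats_text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Total mapped reads on primary chromosomes vs. all other sequences.

    Expects `samtools idxstats` output: one line per reference with
    name, length, mapped reads and unmapped reads, separated by tabs.
    Returns (totals, per-sequence counts for the non-primary sequences).
    """
    totals = {"primary": 0, "other": 0}
    other_refs: Dict[str, int] = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the final "*" line (reads with no reference).
        if len(fields) != 4 or fields[0] == "*":
            continue
        name, mapped = fields[0], int(fields[2])
        if name in PRIMARY:
            totals["primary"] += mapped
        else:
            totals["other"] += mapped
            other_refs[name] = mapped
    return totals, other_refs


# Hypothetical idxstats output; "EXAMPLE_PATCH" stands in for a patch sequence.
example = (
    "1\t249250621\t1000\t0\n"
    "X\t155270560\t200\t0\n"
    "EXAMPLE_PATCH\t100000\t50\t0\n"
    "*\t0\t0\t30\n"
)
totals, other = split_mapped_counts(example)
print(totals)  # {'primary': 1200, 'other': 50}
print(other)   # {'EXAMPLE_PATCH': 50}
```

Running this on both alignments and comparing the `other` totals would show whether the extra reads in the newer alignment land on patches rather than on the primary chromosomes.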

If not, check which reads do not map in the older alignment using the "-f 4" flag of "samtools view" (SAM flag 4 marks an unmapped read). I would then grep the identifiers of some of these reads to see where they map in your latest alignment.
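The same lookup can be done in Python instead of grep. This sketch assumes both alignments have been converted to SAM text (for example with `samtools view`); it collects the names of reads whose FLAG has bit 4 set in the older alignment and reports where those reads map in the newer one. The function names and sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped (what "samtools view -f 4" selects)


def unmapped_names(sam_lines: Iterable[str]) -> Set[str]:
    """Names of reads flagged as unmapped; header lines (@...) are skipped."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split("\t")
        if int(fields[1]) & UNMAPPED:
            names.add(fields[0])
    return names


def where_mapped(sam_lines: Iterable[str], wanted: Set[str]) -> Dict[str, List[Tuple[str, int]]]:
    """For each wanted read name, list (reference, position) of its mapped records."""
    hits: Dict[str, List[Tuple[str, int]]] = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if qname in wanted and not flag & UNMAPPED:
            hits.setdefault(qname, []).append((rname, pos))
    return hits


# Hypothetical records: r1 is unmapped against the old release, mapped in the new one.
old_sam = [
    "@HD\tVN:1.0",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t0\t1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
new_sam = [
    "r1\t0\tEXAMPLE_PATCH\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\t1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
missing = unmapped_names(old_sam)
hits = where_mapped(new_sam, missing)
print(hits)  # {'r1': [('EXAMPLE_PATCH', 500)]}
```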

Emily_Ensembl replied, 4.1 years ago:

I think Alastair is right about the patches. The genome assembly has not changed since release 55, so that is not the cause of the difference. However, we have introduced many patches and haplotypes since then. There's a help video here that explains what we mean by patches and haplotypes.