Question

Differences between two genome releases from Ensembl?

9.9 years ago

rna-seq_researcher ▴ 60

Hello all,

Today I was working on a bowtie alignment of RNA-seq data from HepG2 cells against a human reference genome. I used RSEM to prepare the reference and build the bowtie indexes, based on Ensembl release 75 (Feb. 2014). I created indexes for both the "all" and "rm" (masked) genome sets and aligned my data to each. Both alignments succeeded, with alignment percentages of 82.36% and 69.85%, respectively.

However, when I compared these results with an earlier one from a colleague who ran the same analysis on the same data, but using the masked release 58 from Ensembl (May 2010), I noticed that his alignment percentage was 51.26%. I repeated it with release 58 to be sure and obtained the same percentage, which means that I'm following the correct alignment pipeline.

My question is how much one release can differ from another. New genes and transcripts can be added in a new release, but I don't know whether that is enough to account for a difference of almost 20 percentage points in my data (69.85% with release 75 versus 51.26% with release 58). Does anyone have any advice on that?

Regards,

bowtie alignment genome ensembl • 2.2k views

Updated 2.6 years ago by Ram 43k • written 9.9 years ago by rna-seq_researcher ▴ 60

Ram · Answer 1 · 2014-05-27

9.9 years ago

Alastair Kerr 5.3k

If you are mapping to genomic sequence, then both releases should be the same genome build, see image:

However, what will differ is the gene annotation and the assembly patches. (Perhaps extra repeats are being mapped as well; I am not sure, but this will not change the unmasked genome.) Do the extra reads map to these patches (i.e. not to 1, 2, 3, X, etc.)? This might be your answer.
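One way to check this is to tally alignments by the sequence they hit. Below is a minimal sketch in plain Python that reads SAM text lines and counts records on the primary chromosomes versus everything else (patches, haplotypes, unplaced scaffolds). It assumes the reads were mapped to genomic sequence and that the primary chromosomes are named 1-22, X, Y and MT; the function name `classify_alignments` and the sequence name `EXAMPLE_PATCH` are made up for illustration, and if your reference uses transcript IDs as sequence names this will not be meaningful.

```python
from collections import Counter

# Assumption: primary chromosomes are named 1-22, X, Y, MT in the reference.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def classify_alignments(sam_lines):
    """Count SAM alignment records by the kind of sequence they hit.

    Counts records, not reads: a read with several reported alignments
    contributes one count per record.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":
            counts["unmapped"] += 1   # 0x4 is the SAM "segment unmapped" bit
        elif rname in PRIMARY:
            counts["primary"] += 1
        else:
            counts["other"] += 1      # patches, haplotypes, scaffolds, ...
    return counts


# Toy records; EXAMPLE_PATCH is a hypothetical sequence name.
toy = [
    "@HD\tVN:1.0",
    "\t".join(["r1", "0", "1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r2", "0", "EXAMPLE_PATCH", "5", "255", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"]),
]
print(classify_alignments(toy))
```

On a real file you could pass an open file handle for the SAM output instead of the toy list. If the "other" count is large in the release 75 alignment, that would be consistent with the patch explanation.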

If not, check which reads do not map using the -f flag in samtools view (-f 4 selects unmapped reads). I would then grep the identifiers of some of these reads to see where they map in your latest alignment.
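The same lookup can be sketched in Python instead of samtools plus grep. This is only an illustration under some assumptions: both alignments are available as SAM text, read names are identical between the two runs, and the helper names `unmapped_ids` and `locate_reads` are my own, not part of any tool.

```python
def _records(sam_lines):
    """Yield (read name, flag, reference name, position) for each SAM record."""
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        yield fields[0], int(fields[1]), fields[2], int(fields[3])


def unmapped_ids(sam_lines):
    """Names of reads flagged as unmapped (SAM flag bit 0x4)."""
    return {name for name, flag, _, _ in _records(sam_lines) if flag & 0x4}


def locate_reads(sam_lines, read_ids):
    """Map each requested read name to the (reference, position) pairs it hits."""
    hits = {}
    for name, flag, rname, pos in _records(sam_lines):
        if name in read_ids and not flag & 0x4:
            hits.setdefault(name, []).append((rname, pos))
    return hits


# Toy data: r2 is unmapped in the old run but hits a hypothetical patch
# sequence (EXAMPLE_PATCH) in the new run.
old_run = [
    "\t".join(["r1", "0", "1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r2", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"]),
]
new_run = [
    "\t".join(["r1", "0", "1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r2", "0", "EXAMPLE_PATCH", "7", "255", "4M", "*", "0", "0", "ACGT", "IIII"]),
]
print(locate_reads(new_run, unmapped_ids(old_run)))
```

For full-size data the set of unmapped names can get large, so it may be more practical to sample a few thousand identifiers rather than hold all of them in memory.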

Updated 4.3 years ago by Ram 43k • written 9.9 years ago by Alastair Kerr 5.3k

I think Alastair is right about the patches. The genome assembly has not changed since release 55, so this is not the difference. However, we have introduced many patches and haplotypes since then. There's a help video here to explain what we mean by patches and haplotypes.

Reply written 9.9 years ago by Emily 23k