Question: What's the difference between "hg19" and "hg19 v19"
16 months ago
yliueagle220 wrote:

In ENCODE, there is sometimes a choice between "hg19" and "hg19 v19" when downloading aligned RNA-seq data. Is there a big difference between these versions? (See here for an example.)

In GEO, on the other hand, most descriptions read like "The reads were filtered, trimmed, and aligned in the UCSC reference human genome 19 (hg19)". I am wondering whether "hg19" is equivalent to "hg19 v19". (See here for an example from GEO.)

I would say that v19 is the version of the GENCODE annotation: GENCODE release 19, which is annotated on the same hg19 (GRCh37) assembly.

Thanks. If my focus is on gene expression analysis, will there be a big difference between choosing UCSC hg19 and GENCODE v19?

No, there should be no difference in the results. You can read a bit about Ensembl and GENCODE here.

There are, however, differences in the formatting of some files. I am not sure which RNA-seq pipeline you will be using, but in Salmon, for example, you would want to consider using the --gencode flag during index generation if you choose the GENCODE reference.
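To illustrate what that flag is for: GENCODE transcript FASTA headers pack several pipe-delimited fields into the sequence name, and according to the Salmon documentation, --gencode splits each transcript name at the first '|' so that only the transcript ID is kept. Without it, the whole pipe-delimited string becomes the transcript name and will not match plain Ensembl IDs (e.g. in a transcript-to-gene table). Below is a minimal Python sketch of that renaming, not Salmon's own code; `trim_gencode_name` is a hypothetical helper and the header is an illustrative example of the GENCODE format.

```python
def trim_gencode_name(header: str) -> str:
    """Roughly mimic salmon's --gencode renaming: keep the text before the first '|'."""
    name = header.lstrip(">")
    return name.split("|", 1)[0]


# Illustrative header in the pipe-delimited GENCODE transcript FASTA style.
header = (
    ">ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|"
    "OTTHUMT00000362751.1|DDX11L1-002|DDX11L1|1657|processed_transcript|"
)

print(trim_gencode_name(header))  # ENST00000456328.2
```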

I disagree. Given that GENCODE contains more genes than UCSC, you perform more comparisons during differential testing, and therefore the FDR-adjusted p-values might change. The difference might be small, but stating that there is no difference is, imho, incorrect.
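A small sketch of why the number of tests matters, assuming Benjamini-Hochberg (BH) adjustment and that every gene is tested (real DE tools often filter low-count genes first, so the effect in practice may be smaller). The raw p-values are made up: the same four genes get larger adjusted p-values once four extra, non-significant genes are added, and the gene with p = 0.04 drops out at an FDR of 0.05.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes present in both annotations.
shared = [0.001, 0.01, 0.02, 0.04]
# Hypothetical extra genes that only the larger annotation contains.
extra = [0.5, 0.6, 0.7, 0.8]

small = bh_adjust(shared)
large = bh_adjust(shared + extra)[: len(shared)]

print([round(q, 4) for q in small])  # [0.004, 0.02, 0.0267, 0.04]
print([round(q, 4) for q in large])  # [0.008, 0.04, 0.0533, 0.08]
```

In R, `p.adjust(p, method = "BH")` should give the same numbers.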

Hi ATpoint

You are absolutely correct. Apologies for the confusion. For some reason, I misinterpreted the question as asking whether there would be a difference between GENCODE and Ensembl, where I wouldn't expect much of one. For UCSC, your point indeed holds.

[Note: in retrospect, I have no idea what the source of my confusion was :) ]. Again, thanks for the correction.

