Question: Differences Between Reference Human Genome Assemblies From Different Sources
alpha2zee wrote, 6.0 years ago:

I am relatively new to analysis of whole transcriptome RNA sequencing data. I am planning to map human RNA sequencing reads against the reference human genome/transcriptome (i.e., generate BAM files from fastq files).

I notice that reference genome assemblies are available from a number of sources: UCSC (currently as hg19), Ensembl (currently as GRCh37.73), the 1000 Genomes Project (currently as v37), etc. All of these releases seem to be based on the Genome Reference Consortium's GRCh37 release.

(1) What are the differences between such different genome assemblies?

(2) What are the differences between the different releases from Ensembl (e.g., GRCh37.70 vs .71)?

(3) For my purpose (aligning raw reads to obtain gene expression data for differential expression analysis), does it matter if I use one GRCh37-based reference assembly for one group of samples and, in the future, a different GRCh37-based assembly (from a different source, or from the same source but a different release) for another group of samples?

(4) Finally, can I use the reference genome assembly from one source or release and a gene annotation file from another source or release as long as they all are based on GRCh37?

Thank you.

Tags: rna-seq

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 6.0 years ago:

1) What are the differences between such different genome assemblies?

see http://plindenbaum.blogspot.fr/2013/07/g1kv37-vs-hg19.html

2) What are the differences between the different releases from Ensembl

see the related question "What's the difference between two versions of the same assembly?"

3) For human data, I would say you'd better use the reference files from the GATK bundle, to stay close to their pipeline.

4) Yes, but you'll have to verify that they use the same names for the chromosomes (e.g. the "chr" prefix: UCSC hg19 uses "chr1", while the Ensembl and 1000 Genomes files use "1").
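
To illustrate the check in point 4, here is a minimal sketch that compares the sequence names in a genome FASTA with those in the first column of a GTF annotation. The function names (fasta_names, gtf_names, compare_names) and the toy records are made up for illustration; for real files you would pass open file handles instead of the lists.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(lines):
    """Collect sequence names from column 1 of GTF records."""
    names = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(genome, annotation):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": genome & annotation,
        "only_in_genome": genome - annotation,
        "only_in_annotation": annotation - genome,
    }


# Toy data (hypothetical): UCSC-style genome names vs. Ensembl-style GTF names.
fasta_lines = [">chr1 toy record", "ACGT", ">chr2", "ACGT", ">chrM", "ACGT"]
gtf_lines = [
    "#!genome-build GRCh37",
    '1\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "g1";',
    '2\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "g2";',
    'MT\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "g3";',
]

report = compare_names(fasta_names(fasta_lines), gtf_names(gtf_lines))
for key in ("shared", "only_in_genome", "only_in_annotation"):
    print(key, sorted(report[key]))
```

An empty "shared" set, as in this toy case, means the aligner output and the annotation will not match up until one side is renamed. Note that matching names does not guarantee identical sequences: the mitochondrial sequence is a known case where hg19 (chrM) and the GRCh37 files from Ensembl/1000 Genomes (MT) differ.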

MatthewP (China) wrote, 16 days ago:

https://software.broadinstitute.org/gatk/documentation/article?id=23390
