NCBI vs ENSEMBL FASTA
6.9 years ago
uki_al ▴ 50

Hi, I have a question about the differences between the FASTA files that can be downloaded from the Ensembl FTP (ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz) and the NCBI FTP (ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.fna.gz).

As far as I can tell, both are GRCh37 versions, so I was curious: are the reference sequences identical or not? If they are, could I use the FASTA file downloaded from the Ensembl FTP together with the gene annotation file downloaded from the NCBI FTP?

I know UCSC differs in chromosome naming, and I know there are tools that can convert from one naming scheme to another. That's why, for UCSC, I download the UCSC FASTA and GTF and use them together. Up until now I have likewise been using the Ensembl FASTA and GTF together. But I was curious: if I want to use the NCBI GTF, do I need to download the FASTA from the NCBI FTP, or will the Ensembl one do the job? From what I understood, they should be identical, but I couldn't confirm this.

genome sequence fasta ensembl refseq • 3.4k views
6.9 years ago

The underlying assembly may be the same, although the two files could still differ because patches are applied differently. Regardless, the annotations will definitely differ between the resources, since each annotates the genome in its own way. Switching between or mixing references during a project is asking for trouble.
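
If you want to check for yourself whether the sequences in the two files match, one option is to compare per-sequence checksums instead of the files as a whole, since the headers (and possibly soft-masking) differ. Below is a minimal sketch, not a tested pipeline. The function names are made up for illustration, and it assumes that upper-casing the sequence is an acceptable way to ignore masking differences.

```python
import gzip
import hashlib


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def sequence_checksums(path):
    """Return {sequence name: (length, MD5 of the upper-cased sequence)}."""
    checksums = {}
    name, digest, length = None, None, 0
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the finished record before starting the next one.
                if name is not None:
                    checksums[name] = (length, digest.hexdigest())
                name = line[1:].split()[0]  # first word of the header
                digest, length = hashlib.md5(), 0
            elif name is not None:
                seq = line.upper()  # ignore soft-masking (lower case)
                digest.update(seq.encode("ascii"))
                length += len(seq)
    if name is not None:
        checksums[name] = (length, digest.hexdigest())
    return checksums


def shared_sequences(path_a, path_b):
    """List (name in A, name in B) pairs whose sequences are identical."""
    by_content = {}
    for name, key in sequence_checksums(path_a).items():
        by_content.setdefault(key, []).append(name)
    pairs = []
    for name, key in sequence_checksums(path_b).items():
        for other in by_content.get(key, []):
            pairs.append((other, name))
    return pairs
```

Calling `shared_sequences()` on the Ensembl and NCBI files would list which sequence names carry identical content. Any chromosome missing from that list would be a reason not to mix the two.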


For reference: chromosome coordinates remain unchanged by patches.
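
Since the coordinates stay the same, the remaining practical difference for the primary chromosomes is the sequence name in column 1 of the GTF. Here is a rough sketch of renaming that column. The function name and the single mapping entry are illustrative assumptions only; a real mapping should come from the assembly report for the exact release you use, and you should confirm the sequences match before relying on it.

```python
def rename_seqnames(lines, mapping):
    """Yield GTF lines with column 1 renamed according to `mapping`.

    Comment and blank lines pass through unchanged. Records on sequences
    that are not in the mapping (e.g. patches or alternate loci) are
    dropped rather than guessed at.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.split("\t")
        if fields[0] in mapping:
            fields[0] = mapping[fields[0]]
            yield "\t".join(fields)


# Illustrative mapping only: RefSeq accession -> Ensembl-style name.
example_mapping = {"NC_000001.10": "1"}

example_gtf = [
    "#!genome-build GRCh37\n",
    'NC_000001.10\tBestRefSeq\tgene\t100\t200\t.\t+\t.\tgene_id "X";\n',
    'NW_000000.0\tBestRefSeq\tgene\t1\t50\t.\t+\t.\tgene_id "Y";\n',
]
renamed = list(rename_seqnames(example_gtf, example_mapping))
```

Renaming only makes the names line up. It does not change the earlier point that mixing resources within one project is risky.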


Thanks, I wasn't sure about that.
