Missing transcripts (ENST numbers) in gff3 files from the ENSEMBL ftp site
5.2 years ago
goldberg.jm ▴ 90

There are Ensembl transcripts, for example ENST00000454382, that can be found by searching https://useast.ensembl.org/Homo_sapiens/Info/Index but are not present in the files Homo_sapiens.GRCh37.87.gff3 or Homo_sapiens.GRCh38.95.gff3 provided on their FTP site:

ftp://ftp.ensembl.org/pub/release-95/gff3/homo_sapiens

Shouldn't the transcript be present in the big gff3 file if it is findable on the web page? Does anyone know what's up with this?

Thank you!

annotation ENSEMBL gff3 transcript IDs • 1.3k views
5.2 years ago
Ben_Ensembl ★ 2.4k

Hi goldberg.jm,

The transcript you gave as an example is located on an assembly exception (see 'location' in the link below): https://www.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000232960;r=CHR_HSCHR6_MHC_MANN_CTG1:31705518-31709524;t=ENST00000454382
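As a rough illustration of reading the location out of a link like the one above, here is a minimal Python sketch. The function names `ensembl_region` and `is_primary_chromosome` are made up for this example, and the check only covers the human chromosome names 1-22, X, Y and MT (unplaced scaffolds are not considered).

```python
from urllib.parse import urlsplit

# Human chromosome names as used by Ensembl (hypothetical helper constant).
PRIMARY_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def ensembl_region(url):
    """Return (seq_region, start, end) from the 'r' parameter of an Ensembl URL.

    Ensembl links separate query parameters with ';', so the query string
    is split by hand instead of relying on '&'-based parsers.
    """
    query = urlsplit(url).query
    params = dict(p.split("=", 1) for p in query.split(";") if "=" in p)
    seq_region, _, span = params["r"].rpartition(":")
    start, end = span.split("-")
    return seq_region, int(start), int(end)


def is_primary_chromosome(seq_region):
    """True if the sequence name is one of the human chromosomes."""
    return seq_region in PRIMARY_CHROMOSOMES


url = ("https://www.ensembl.org/Homo_sapiens/Transcript/Summary"
       "?g=ENSG00000232960;r=CHR_HSCHR6_MHC_MANN_CTG1:31705518-31709524"
       ";t=ENST00000454382")
seq_region, start, end = ensembl_region(url)
print(seq_region, start, end, is_primary_chromosome(seq_region))
```

For this link the sequence name is CHR_HSCHR6_MHC_MANN_CTG1 rather than chromosome 6, which is why the transcript does not show up in files restricted to the primary assembly.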

The FTP directory contains several GTF files, each covering a different set of features depending on where in the assembly they are located: http://ftp.ensembl.org/pub/current_gtf/homo_sapiens/

Here is a quick description of the contents of the different GTF files:

.gtf: This is the default file. It should contain the full annotation for all species except human and mouse. For human and mouse, it contains all annotation on the primary assembly, i.e. excluding patch and haplotype regions. All species have one.

.chr.gtf: Contains only annotation on chromosomes, so toplevel scaffolds, patches and haplotypes are all excluded.

.chr_patch_hapl_scaff.gtf: Contains all annotation on all toplevel sequences, including patch and haplotype regions. It should only exist for human and mouse.
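To check which of these files actually contains a given transcript, something along these lines may help. It is only a sketch: the helper names `open_annotation`, `find_transcript` and `seqnames_for` are invented here, and it assumes the usual 9-column tab-separated GTF/GFF3 layout with the transcript ID appearing somewhere in the attributes column.

```python
import gzip


def open_annotation(path):
    """Open a plain or gzip-compressed annotation file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_transcript(lines, transcript_id):
    """Yield the tab-split fields of each feature line that mentions transcript_id.

    Assumption: the ID occurs as plain text in column 9 (the attributes),
    which holds for both GTF and GFF3 style attribute strings.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header or directive line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line
        if transcript_id in fields[8]:
            yield fields


def seqnames_for(lines, transcript_id):
    """Return the sorted sequence names (column 1) carrying the transcript."""
    return sorted({fields[0] for fields in find_transcript(lines, transcript_id)})
```

With a downloaded file it could be called as `seqnames_for(open_annotation(path), "ENST00000454382")`, where `path` points to whichever file you want to check. An empty list means the transcript is not in that file.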

Best wishes

Ben Ensembl Helpdesk

5.2 years ago
goldberg.jm ▴ 90

Yes indeed, a gene model for the transcript in question, ENST00000454382, is in the "patch" gff3 file available at the ENSEMBL ftp site: ftp://ftp.ensembl.org/pub/release-95/gff3/homo_sapiens/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gff3.gz.

Thank you Ben!

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @Ben's answer.

SUBMIT ANSWER should be used only for new answers to the original question.
