Question: Missing transcripts (ENST numbers) in gff3 files from the ENSEMBL ftp site
goldberg.jm (United States) wrote, 14 months ago:

Some ENSEMBL transcripts, for example ENST00000454382, can be found by searching https://useast.ensembl.org/Homo_sapiens/Info/Index but are not present in the files Homo_sapiens.GRCh37.87.gff3 or Homo_sapiens.GRCh38.95.gff3 provided on their FTP site:

ftp://ftp.ensembl.org/pub/release-95/gff3/homo_sapiens

Shouldn't the transcript be present in the big gff3 file if it is findable on the web page? Does anyone know what's up with this?

Thank you!

Ben_Ensembl (EMBL-EBI) wrote, 14 months ago:

Hi goldberg.jm,

The transcript you gave as an example is located on an assembly exception (see 'location' in the link below): https://www.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000232960;r=CHR_HSCHR6_MHC_MANN_CTG1:31705518-31709524;t=ENST00000454382

The FTP directory contains several GTF files, each holding a different set of features depending on where those features are located: http://ftp.ensembl.org/pub/current_gtf/homo_sapiens/

Here is a quick description of the contents of the different GTF files:

.gtf: This is the default file, and every species has one. It should contain the full annotation for all species except human and mouse. For human and mouse, it contains all annotation on the primary assembly, i.e. excluding patch and haplotype regions.

.chr.gtf: Contains only annotation on chromosomes, so toplevel scaffolds are excluded (patches and haplotypes are not included).

.chr_patch_hapl_scaff.gtf: Contains all annotation on all toplevel sequences, including patch and haplotype regions. It should only exist for human and mouse.
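
To check which of these files actually holds a given transcript, one option is to scan the attribute column (column 9) for the stable ID. The following is a minimal sketch, not an Ensembl tool: the names `open_annotation`, `find_transcript` and `Hit` are made up for illustration, and it assumes the usual tab-separated nine-column GTF/GFF3 layout with the transcript ID appearing somewhere in column 9 (for example in `ID=transcript:...` for GFF3 or `transcript_id "..."` for GTF).

```python
import gzip
import re
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One matching feature line (hypothetical helper type)."""
    seqid: str         # column 1: sequence/region name
    feature_type: str  # column 3: e.g. mRNA, exon, transcript
    start: int         # column 4
    end: int           # column 5


def open_annotation(path: str):
    """Open a plain or gzip-compressed annotation file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_transcript(lines: Iterable[str], transcript_id: str) -> Iterator[Hit]:
    """Yield every feature whose attribute column mentions transcript_id."""
    pattern = re.compile(r"\b" + re.escape(transcript_id) + r"\b")
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        if pattern.search(fields[8]):
            yield Hit(fields[0], fields[2], int(fields[3]), int(fields[4]))


# Example usage (file name taken from the thread; adjust the path as needed):
# with open_annotation("Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gff3.gz") as handle:
#     for hit in find_transcript(handle, "ENST00000454382"):
#         print(hit)
```

If the scan yields nothing for the default file but returns hits for the chr_patch_hapl_scaff file, the transcript sits on a patch or haplotype region, and the `seqid` of each hit shows which one.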

Best wishes,

Ben
Ensembl Helpdesk

goldberg.jm (United States) wrote, 14 months ago:

Yes indeed, a gene model for the transcript in question, ENST00000454382, is in the "patch" gff3 file available at the ENSEMBL ftp site: ftp://ftp.ensembl.org/pub/release-95/gff3/homo_sapiens/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gff3.gz.

Thank you Ben!


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @Ben's answer.

SUBMIT ANSWER should be used only for new answers to original question.

(comment by genomax, 14 months ago)