Question: Repeat masked gtf files from ensembl
kevin.stachelek wrote (9 months ago):

I am looking for a repeat-masked version of a reference genome from Ensembl. I found information posted here suggesting it can be accessed via FTP, but I can't figure out which subdirectory it might be hiding in, and searching via the command line didn't turn anything up.

I could run RepeatMasker myself, but as I remember, the setup is pretty involved (you need to download a repeat library, etc.).

I can post in an Ensembl forum instead if that's more appropriate.

Thanks,

Do you actually want a GFF file listing the regions of the genome that are repeat-masked, or a repeat-masked genome sequence (answered by @h.mon below)?

(genomax, 9 months ago)

Hi, yes, a GFF (or GTF) file is what I'm looking for. Something analogous to this file from UCSC.

(kevin.stachelek, 9 months ago)

Do you mean a GTF or GFF containing the repeats, or just the genes as annotated on repeat-masked sequence?

(Emily_Ensembl, 9 months ago)

Thanks for helping me clarify. I'm looking for a GTF file containing the locations of all masked regions in the Ensembl reference genome. It might look like this:

chr1    hg38_rmsk   exon    67108754    67109046    1892.000000 +   .   gene_id "L1P5"; transcript_id "L1P5";
(kevin.stachelek, 9 months ago)
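For anyone reading along: the example line above follows the nine-column GTF layout, with a generic "exon" feature type and the actual repeat name (L1P5) stored in the gene_id/transcript_id attributes. A minimal Python sketch of splitting such a line apart, assuming the columns are tab-separated as in a UCSC Table Browser GTF export (the function name parse_gtf_line is just illustrative):

```python
def parse_gtf_line(line: str) -> dict:
    """Split one GTF line into its nine columns and parse the attribute column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes look like: key "value"; key "value";
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


line = 'chr1\thg38_rmsk\texon\t67108754\t67109046\t1892.000000\t+\t.\tgene_id "L1P5"; transcript_id "L1P5";'
record = parse_gtf_line(line)
print(record["feature"], record["attributes"]["gene_id"], record["end"] - record["start"] + 1)
# prints: exon L1P5 293
```

This makes the source of the confusion in the thread visible: the feature column says "exon", but the record describes a repeat element, which you only learn from the attributes.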

All the gene annotation in Ensembl is done after repeat masking. There are folders of GFF3 and GTF files on the Ensembl FTP site.

(Emily_Ensembl, 9 months ago)

Sorry, it appears to me that these GTF files are for the gene annotation in Ensembl rather than an annotation of the masked regions (types of transposable elements, repeats, etc.).

Am I misunderstanding?

(kevin.stachelek, 9 months ago)

Yes, they are. I was still quite confused by your comment, as it said you were after the repeats, but the line you gave was an exon line, which would suggest you want the genes.

(Emily_Ensembl, 9 months ago)

Ah, I see. I think the 'exon' annotation is a confusing shortcut; the actual annotated feature is a LINE-1 element.

(kevin.stachelek, 9 months ago)
Answer from h.mon (Brazil), 9 months ago:

The easiest way is to go to the organism Ensembl page, e.g. https://www.ensembl.org/Acanthochromis_polyacanthus/Info/Index for the spiny chromis. There, you will find a "Download DNA sequence (FASTA)" link which will take you directly to the ftp folder containing the complete genome, soft-masked genome, and hard-masked genome.
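If only the coordinates of the masked regions are needed (not the repeat family names), one option is to derive them from the soft-masked FASTA in that folder, since soft-masking writes repeat-masked bases in lowercase. A minimal Python sketch under that assumption; the "softmask"/"repeat_region" labels and the masked_N IDs in the output are hypothetical placeholders, and the result will not tell you which repeat class (e.g. LINE-1) each interval belongs to:

```python
import gzip
import re
import sys
from typing import Iterator, List, Tuple

LOWERCASE_RUN = re.compile(r"[a-z]+")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence name, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    name, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name = line[1:].split()[0]  # keep only the ID before the first space
                chunks = []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_runs(seq: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) coordinates of lowercase (soft-masked) runs."""
    return [(m.start() + 1, m.end()) for m in LOWERCASE_RUN.finditer(seq)]


def to_gtf(seqname: str, start: int, end: int, index: int) -> str:
    # Placeholder source/feature labels; masking is strand-less, so strand is "."
    attrs = f'gene_id "masked_{index}"; transcript_id "masked_{index}";'
    return "\t".join([seqname, "softmask", "repeat_region", str(start), str(end), ".", ".", ".", attrs])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example usage with a soft-masked (dna_sm) FASTA from the Ensembl FTP folder:
    #   python softmask_to_gtf.py <genome>.dna_sm.toplevel.fa.gz > masked.gtf
    count = 0
    for name, seq in read_fasta(sys.argv[1]):
        for start, end in masked_runs(seq):
            count += 1
            print(to_gtf(name, start, end, count))
```

The sketch loads one sequence at a time into memory and uses a regular expression to find lowercase runs, which is much faster than looping over bases in pure Python. Coordinates are emitted 1-based and inclusive to match GTF; if BED output is preferred, subtract 1 from each start.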
