Question: Bulk download introns, exons, and UTR regions from Ensembl for gene prediction training set
Katherine Huang wrote (6 months ago):

Hi, I would like to download labeled FASTA sequences of introns, exons, 5' UTR regions, and 3' UTR regions from a nonredundant set of human genes.

Ensembl lets me do this for a single gene: on the Exons page of an individual transcript (e.g. https://www.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000139618;r=13:32889611-32973805;t=ENST00000544455) I can click Download Sequence > FASTA.

Is there a way to download a file like that automatically for several thousand genes? I would like them all to be human (or at least mammalian) and protein-coding. BioMart seems to be down right now. I'm willing to try the Perl, REST, or SQL APIs, but I have no experience with any of them, so some direction would be appreciated.

Ultimately I want a database of DNA sequences labeled as intron, exon, 5' UTR, or 3' UTR. If other databases (e.g. RefSeq) can provide it, that would be great too. Thanks!

modified 6 months ago by Brian Gudenas
Brian Gudenas (United States) wrote (6 months ago):

Check out the biomaRt R package, specifically its getSequence() function. It takes a list of gene identifiers (Ensembl or Entrez Gene IDs, declared via the type argument) and retrieves the sequences you ask for through the seqType parameter (cdna, 3utr, 5utr, gene_exon, gene_exon_intron for the unspliced gene, etc.).

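library(biomaRt)

# Connect to the human gene dataset on the Ensembl BioMart server
mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Gene IDs must be quoted character strings
Ensembl_IDs = c("ENSG00000139618", "ENSG00000128731")

# type = kind of ID supplied; seqType = which region to return (here exons)
seqs = biomaRt::getSequence(id = Ensembl_IDs,
                            type = "ensembl_gene_id",
                            seqType = "gene_exon",
                            mart = mart)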
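To end up with one labeled training file, the data frame returned for each seqType could be written into a single FASTA with the region type in every header. Below is a minimal base-R sketch. write_labeled_fasta() is a hypothetical helper, not part of biomaRt. It assumes getSequence() output has the sequence in the first column and the ID in the second (the layout biomaRt's exportFASTA() expects), and that genes lacking a region come back with the placeholder text "Sequence unavailable".

```r
# Hypothetical helper (not part of biomaRt): write one region type to FASTA,
# tagging each header with a label, e.g. ">ENSG00000139618|3utr".
# Assumes seq_df is shaped like getSequence() output: sequences in column 1,
# IDs in column 2 (the layout biomaRt::exportFASTA() expects).
write_labeled_fasta <- function(seq_df, label, path, append = FALSE) {
  seqs <- as.character(seq_df[[1]])
  ids  <- as.character(seq_df[[2]])
  # Drop genes without this region; "Sequence unavailable" is assumed to be
  # biomaRt's placeholder text for missing sequences
  ok <- !is.na(seqs) & seqs != "" & seqs != "Sequence unavailable"
  if (!any(ok)) return(invisible(0L))
  headers <- paste0(">", ids[ok], "|", label)
  # Interleave headers and sequences: header 1, seq 1, header 2, seq 2, ...
  lines <- as.vector(rbind(headers, seqs[ok]))
  con <- file(path, open = if (append) "a" else "w")
  on.exit(close(con))
  writeLines(lines, con)
  invisible(sum(ok))  # number of sequences written
}

# Usage sketch (needs network access; reuses mart and Ensembl_IDs from above):
# for (st in c("gene_exon", "5utr", "3utr")) {
#   s <- getSequence(id = Ensembl_IDs, type = "ensembl_gene_id",
#                    seqType = st, mart = mart)
#   write_labeled_fasta(s, st, "training_set.fa", append = TRUE)
# }
```

Because the label travels in each header, the file can be split back into exon/UTR classes when the training set is assembled.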
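As far as I know, getSequence() has no intron-only seqType; the closest is gene_exon_intron, the unspliced gene. One workaround is to cut the gaps between consecutive exons of a transcript out of that unspliced sequence. This is a minimal sketch: introns_from_exons() is hypothetical, and it assumes (1) the unspliced sequence runs from gene start to gene end on the gene's own strand, (2) exon coordinates are 1-based, inclusive genomic positions, and (3) strand is coded 1/-1 as in BioMart's strand attribute. The coordinates would come from getBM() with structure attributes such as exon_chrom_start, exon_chrom_end, start_position and end_position (check listAttributes(mart) for the exact names).

```r
# Hypothetical helper (not part of biomaRt): cut the intron sequences of one
# transcript out of its unspliced gene sequence.
# Assumptions: gene_seq runs from gene_start to gene_end on the gene's own
# strand (already reverse-complemented when strand == -1), and exon
# coordinates are 1-based, inclusive genomic positions.
introns_from_exons <- function(gene_seq, gene_start, gene_end, strand,
                               exon_starts, exon_ends) {
  # Sort exons by genomic position so neighbouring exons are adjacent
  ord <- order(exon_starts)
  s <- exon_starts[ord]
  e <- exon_ends[ord]
  if (length(s) < 2) return(character(0))  # single-exon transcript

  # An intron is the gap between the end of one exon and the start of the next
  intron_start <- e[-length(e)] + 1
  intron_end   <- s[-1] - 1
  keep <- intron_end >= intron_start       # skip abutting exons
  if (!any(keep)) return(character(0))
  intron_start <- intron_start[keep]
  intron_end   <- intron_end[keep]

  # Convert genomic coordinates to positions within gene_seq
  if (strand == 1) {
    from <- intron_start - gene_start + 1
    to   <- intron_end   - gene_start + 1
  } else {
    # Minus strand: position 1 of gene_seq is genomic coordinate gene_end
    from <- gene_end - intron_end   + 1
    to   <- gene_end - intron_start + 1
  }
  introns <- substring(gene_seq, from, to)
  # Return introns in transcript (5' to 3') order
  if (strand == 1) introns else rev(introns)
}

# Toy example: 12 bp gene at genomic 100-111 on the plus strand
introns_from_exons("AAACCCGGGTTT", 100, 111, 1,
                   exon_starts = c(106, 100, 111),
                   exon_ends   = c(108, 102, 111))
# → "CCC" "TT"
```

The resulting intron sequences can then be written out with their own label alongside the exon and UTR records.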