Question: Ensembl transcript ID - BioMart retrieval
alessandro_d wrote (3.1 years ago):

Hello everyone, I have downloaded the human transcripts using BioMart (from the Ensembl website). I was just wondering why I only get 1000 sequences; I would expect many more.

Query details: Dataset: Homo sapiens genes. Filters: limit to Ensembl human transcript IDs (only). Attributes: Ensembl gene ID, Ensembl transcript ID, unspliced transcripts.

Thank you in advance

Tags: biomart, ensembl
Denise - Open Targets (EMBL-EBI, Hinxton, UK) wrote (3.1 years ago):

I'd very much recommend the opposite of the other answer: do use filters. The more the better, actually. BioMart is not the tool for retrieving all the human genes (or transcripts), their sequences, or anything else genome-wide. Hence the 'max 500 advised' warning shown where you input your data (a list of gene IDs, for example). Also, the only format for downloading sequences is FASTA. There will not be a TSV.

The likely reason you only got 1000 sequences is that the web interface timed out because of the huge number of results it had to process. If you want to use the web interface, I'd recommend running the query in chunks (per chromosome, for example, by selecting a filter under REGION). You can also ask for the results to be sent to you by email in a compressed format once they are ready. That file should contain all the sequences you are after, not 'just' the first 1000.
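
The per-chromosome chunking suggested above can also be scripted against BioMart's `martservice` endpoint instead of being clicked through on the web page. What follows is only a minimal Python sketch, not an official recipe. It assumes the dataset name `hsapiens_gene_ensembl`, the filter `chromosome_name`, and the attribute `transcript_exon_intron` for unspliced transcript sequences, and it assumes the service accepts the XML as a POSTed `query` parameter; all of these should be checked against the current BioMart configuration. The helper names (`build_query`, `fetch_chunk`, `download_all`) are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.sax.saxutils import quoteattr

# Assumed endpoint, dataset and attribute names; verify them in BioMart.
MART_URL = "https://www.ensembl.org/biomart/martservice"
DATASET = "hsapiens_gene_ensembl"
ATTRIBUTES = ["ensembl_gene_id", "ensembl_transcript_id", "transcript_exon_intron"]
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def build_query(chromosome):
    """Return a BioMart XML query restricted to a single chromosome."""
    attributes = "".join(f"<Attribute name={quoteattr(a)}/>" for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" '
        'uniqueRows="0" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(DATASET)} interface="default">'
        f'<Filter name="chromosome_name" value={quoteattr(chromosome)}/>'
        f"{attributes}"
        "</Dataset></Query>"
    )


def fetch_chunk(chromosome, timeout=600):
    """POST one per-chromosome query and return the response text."""
    data = urlencode({"query": build_query(chromosome)}).encode("ascii")
    with urlopen(MART_URL, data=data, timeout=timeout) as response:
        return response.read().decode("utf-8")


def download_all(path):
    """Write every chromosome's sequences to one FASTA file, chunk by chunk."""
    with open(path, "w", encoding="utf-8") as out:
        for chromosome in CHROMOSOMES:
            out.write(fetch_chunk(chromosome))
```

Calling `download_all("human_unspliced_transcripts.fa")` would issue one request per chromosome, so a single timeout costs one chunk rather than the whole download. The sketch does not check whether a chunk came back truncated; comparing the number of FASTA headers with the expected transcript count per chromosome would be a sensible extra check.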


Hello Denise,

Do you know of another tool for genome-wide comparisons among a few species? I had a similar problem with my compressed file from BioMart: Microsoft Excel cannot open it correctly, so many genes do not appear in the results. I do not need the sequences, but I do need the orthologs. Thank you.

Reply written 3.1 years ago by gabrielabcg

Hello @gabrielabcg, the same applies whether you download a table of orthologs or sequences (if using BioMart). A) You should restrict the query as much as you can by applying filters. B) You could choose to have the results sent to you by email in a compressed format. In your case, you could also use this HTTPS call to access the Ensembl REST API. You can paste that URL into a browser's address bar or use this command in a terminal window. If you would rather continue with BioMart and still get a corrupted file sent to you, it may be worth contacting the Ensembl helpdesk.
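
The linked URL and command did not survive in this copy of the thread, so the following is only a guess at the kind of call meant, not the original link. It is a minimal Python sketch that assumes the Ensembl REST API's homology-by-symbol endpoint (`/homology/symbol/:species/:symbol`) with the `type` and `target_species` parameters, and a response shaped as `data` → `homologies` → `target`. The gene symbol, species names, and helper names (`homology_url`, `ortholog_ids`, `fetch_orthologs`) are illustrative assumptions.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"


def homology_url(symbol, species="human", target_species=None):
    """Build a homology-by-symbol URL that asks for orthologues only."""
    params = {"type": "orthologues"}
    if target_species:
        params["target_species"] = target_species
    return (
        f"{REST_SERVER}/homology/symbol/{quote(species)}/{quote(symbol)}"
        f"?{urlencode(params)}"
    )


def ortholog_ids(payload):
    """Collect (species, gene ID) pairs from a decoded homology response.

    The nesting used here (data -> homologies -> target) is an assumption
    about the default response format.
    """
    pairs = []
    for entry in payload.get("data", []):
        for homology in entry.get("homologies", []):
            target = homology.get("target", {})
            pairs.append((target.get("species"), target.get("id")))
    return pairs


def fetch_orthologs(symbol, species="human", target_species=None):
    """Query the REST API and return the ortholog pairs for one gene."""
    request = Request(
        homology_url(symbol, species, target_species),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=60) as response:
        return ortholog_ids(json.load(response))
```

For example, `fetch_orthologs("BRCA2", target_species="mouse")` would request the mouse orthologues of one human gene. Because the result is plain Python data, it can be written out as a table without going through Excel.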

Reply written 3.1 years ago by Denise - Open Targets
Manvendra Singh (Berlin, Germany) wrote (3.1 years ago):

Hi Dear,

I would recommend not filtering anything. Dataset: Homo sapiens genes. Attributes: Ensembl gene ID, Ensembl transcript ID. First get the data as TSV, and then just remove the duplicates.

hth
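
The "remove the duplicates" step can be done without a spreadsheet. This is a minimal Python sketch, assuming a tab-separated export in which a duplicate means an identical row; the function names and file paths are hypothetical.

```python
import csv


def unique_rows(rows):
    """Yield each row once, in first-seen order."""
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_tsv(in_path, out_path):
    """Copy a TSV file, dropping rows that repeat an earlier row exactly."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(unique_rows(csv.reader(src, delimiter="\t")))
```

For example, `dedupe_tsv("mart_export.txt", "mart_export.unique.tsv")` streams the file row by row, although the set of rows already seen still grows with the number of distinct rows.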
