Ensembl transcript ID - BioMart retrieval
7.5 years ago

Hello everyone, I have downloaded the human transcripts using BioMart (from the Ensembl website). I was wondering why I only get 1000 sequences; I would expect many more.

Search details:
Dataset: Homo sapiens genes
Filters: limit to Ensembl human transcript IDs (only)
Attributes: Ensembl gene ID, Ensembl transcript ID, unspliced transcripts

Thank you in advance

biomart Ensembl • 3.3k views
7.5 years ago
Denise CS ★ 5.2k

I'd very much recommend the opposite of what the other answer suggests: do use filters, and the more the better. BioMart is not the tool for retrieving all human genes (or transcripts), their sequences, or anything genome-wide. That is why the interface shows the 'max 500 advised' warning where you input your data (a list of gene IDs, for example). Also, FASTA is the only format in which sequences can be downloaded; there will not be a TSV.

The likely reason you only got 1000 sequences is that the web interface timed out because of the huge number of results it had to process. If you want to use the web interface, I'd recommend running the query in chunks (per chromosome, for example, by selecting a filter under REGION). You can also ask for the results to be sent to you by email in a compressed format once they are ready. That file should contain all the sequences you are after and not 'just' the first 1000.
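To illustrate the per-chromosome chunking idea, here is a minimal Python sketch that builds one BioMart query per chromosome instead of one genome-wide query. It only constructs the query URLs and does not contact the server. The service URL, the dataset name `hsapiens_gene_ensembl`, the filter name `chromosome_name` and the attribute names are assumptions that are not stated in this thread, so please check them against the BioMart interface before using them.

```python
from urllib.parse import urlencode

# Assumed identifiers (not given in the thread); verify them in BioMart.
MART_URL = "https://www.ensembl.org/biomart/martservice"
DATASET = "hsapiens_gene_ensembl"
ATTRIBUTES = ["ensembl_gene_id", "ensembl_transcript_id", "transcript_exon_intron"]
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def build_query_xml(chromosome):
    """Return a BioMart XML query restricted to a single chromosome."""
    attributes = "".join(f'<Attribute name="{name}"/>' for name in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" '
        'uniqueRows="0" datasetConfigVersion="0.6">'
        f'<Dataset name="{DATASET}" interface="default">'
        f'<Filter name="chromosome_name" value="{chromosome}"/>'
        f"{attributes}</Dataset></Query>"
    )


def build_query_url(chromosome):
    """URL-encode the XML query so it can be sent as the 'query' parameter."""
    return MART_URL + "?" + urlencode({"query": build_query_xml(chromosome)})


def chunked_urls():
    """Map each chromosome name to its own, much smaller, query URL."""
    return {chromosome: build_query_url(chromosome) for chromosome in CHROMOSOMES}


if __name__ == "__main__":
    for chromosome, url in chunked_urls().items():
        print(chromosome, len(url))
```

Each URL could then be fetched separately (for example with `urllib.request.urlopen`) and the FASTA chunks concatenated, so that no single request has to return every transcript at once.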


Hello Denise,

Do you know of another tool for genome-wide comparisons among a few species? I had a similar problem with my compressed file from BioMart: Microsoft Excel cannot open it correctly, so many genes do not appear in the results. I do not need the sequences, but I do need the orthologs. Thank you.


Hello @gabrielabcg, the same applies whether you download a table of orthologs or sequences (if using BioMart). A) You should restrict the query as much as you can by applying filters. B) You could choose to have the results sent to you by email in a compressed format. In your case, you could also use this HTTPS call to access the Ensembl REST API. You can paste that URL into a browser's address bar or use this command in a terminal window. If you would rather continue with BioMart and still get a corrupted file sent to you, it may be worth contacting the Ensembl helpdesk.
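If the compressed results file itself is fine and only Excel struggles with it, one option is to read it directly in a script. The sketch below assumes the file is a gzip-compressed, tab-separated table; that is an assumption about the export format, and the function name `read_compressed_tsv` is made up for illustration.

```python
import csv
import gzip


def read_compressed_tsv(path):
    """Read a gzip-compressed, tab-separated file into a list of rows.

    Assumes the file at `path` is gzip-compressed text with one record
    per line and tab-separated fields.
    """
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

Counting the rows returned here and comparing that with the count BioMart reports would show whether genes are really missing from the file or only from the Excel view.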

7.5 years ago
Manvendra Singh ★ 2.2k

Hi Dear,

I would recommend not filtering anything. Dataset: Homo sapiens genes. Attributes: Ensembl gene ID, Ensembl transcript ID. First get the data as TSV, and then just remove the duplicates.
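The "remove the duplicates" step could look roughly like the Python sketch below, which keeps the first occurrence of each row in a tab-separated table. The IDs in the example are placeholders and not real Ensembl identifiers, and the function name `unique_rows` is only illustrative.

```python
import csv
import io


def unique_rows(tsv_text):
    """Return the rows of a tab-separated table with exact duplicates removed.

    The first occurrence of each row is kept and the original order preserved.
    """
    seen = set()
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


# Placeholder gene/transcript pairs, with one repeated line.
example = "GENE_A\tTRANSCRIPT_A1\nGENE_A\tTRANSCRIPT_A1\nGENE_A\tTRANSCRIPT_A2\n"

if __name__ == "__main__":
    for row in unique_rows(example):
        print("\t".join(row))
```

Note that this only handles a gene ID / transcript ID table; it does not apply to sequence downloads, which come as FASTA.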

hth
