Question: BioMart mRNA count much bigger when pulling by XML query than using webpage
8.5 years ago by
Tom wrote:

Hi All,

I downloaded mRNA data for human with the following BioMart XML query:

<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1" count="" datasetConfigVersion="0.8">

<Dataset name = "hsapiens_gene_ensembl" interface = "default">
    <Filter name = "status" value = "KNOWN"/>
    <Filter name = "transcript_status" value = "KNOWN"/>
    <Filter name = "biotype" value = "protein_coding"/>
    <Attribute name = "ensembl_transcript_id"/>
    <Attribute name = "cdna"/>
</Dataset>
</Query>

When I check it on the web page, the count shows 20467, but the file I am getting is huge and the sequence count goes over 100k. I have tried setting datasetConfigVersion to 0.6, 0.7 and 0.8, and the result is always the same. Why am I getting so many sequences with the XML query? Even when I do not use the status, transcript_status and biotype filters, the total number of genes with a cDNA sequence is only about 50k. I also keep getting MySQL server errors saying the connection was lost. Is the server busy? Thanks.


mrna ensembl biomart sequence human • 2.0k views
8.5 years ago by
Bert Overduin (Edinburgh Genomics, The University of Edinburgh) wrote:

Tom, the [Count] button in BioMart gives you the number of filtered genes, but you are exporting transcripts. Many genes have multiple transcripts annotated, hence the discrepancy in numbers you are observing.
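
To check this on a downloaded file, one option is to count FASTA records and distinct genes separately. The following is only a rough Python sketch. It assumes the query has been extended with an `ensembl_gene_id` attribute, so that each FASTA header reads `>gene_id|transcript_id`. That header layout, the function name and the IDs in the example are assumptions for illustration, not part of the original query.

```python
from collections import defaultdict


def count_genes_and_transcripts(fasta_text):
    """Return (number of distinct genes, number of distinct transcripts).

    Assumes every header line looks like '>GENE_ID|TRANSCRIPT_ID'
    (an assumed layout; adjust the field positions to match your export).
    """
    transcripts_per_gene = defaultdict(set)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].strip().split("|")
        gene_id, transcript_id = fields[0], fields[1]
        transcripts_per_gene[gene_id].add(transcript_id)
    n_genes = len(transcripts_per_gene)
    n_transcripts = sum(len(ids) for ids in transcripts_per_gene.values())
    return n_genes, n_transcripts


# Made-up records: two genes, one of which has two transcripts.
example = """>GENE_A|TX_A1
ATGC
>GENE_A|TX_A2
ATGCCC
>GENE_B|TX_B1
ATGA
"""

n_genes, n_transcripts = count_genes_and_transcripts(example)
print(n_genes, n_transcripts)
```

On a real export, the first number should be comparable to the [Count] figure and the second to the number of sequences in the file.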
