Question: BioMart mRNA Count Much Bigger When Pulling By XML Query Than Using Webpage
Hi All,

I downloaded mRNA data for human using the following BioMart XML query:

<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1" count="" datasetConfigVersion="0.8">

<Dataset name = "hsapiens_gene_ensembl" interface = "default">
    <Filter name = "status" value = "KNOWN"/>
    <Filter name = "transcript_status" value = "KNOWN"/>
    <Filter name = "biotype" value = "protein_coding"/>
    <Attribute name = "ensembl_transcript_id"/>
    <Attribute name = "cdna"/>
</Dataset>
</Query>

When I check it on the webpage, the count shows 20467, but the file I am getting is huge and the sequence count goes over 100k+. I have tried setting datasetConfigVersion to 0.6, 0.7 and 0.8, and the result is always the same. Why am I getting so many sequences with the XML query? Even when I do not use the status, transcript status and biotype filters, the total number of genes with a cDNA sequence is only about 50k. Also, I keep getting MySQL server errors saying the connection was lost. Is the server busy? Thanks.


Edinburgh Genomics, The University of Edinburgh
Tom, the [Count] button in BioMart gives you the number of filtered genes, but you are exporting transcripts. Many genes have multiple transcripts annotated, hence the discrepancy in numbers you are observing.
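
If you want to check this on your own download, one option is to add the gene ID as an extra attribute and then count distinct genes versus distinct transcripts in the FASTA headers. Below is a minimal Python sketch of that idea. It assumes you added `ensembl_gene_id` as the first attribute, so that each header reads `>gene_id|transcript_id`; the function name and the toy IDs are made up for illustration, and you may need to adjust the field positions to match your actual headers.

```python
import io


def count_genes_and_transcripts(fasta_handle):
    """Count distinct gene and transcript IDs found in FASTA header lines.

    Assumption: each header looks like '>GENE_ID|TRANSCRIPT_ID'.
    Headers with fewer than two '|'-separated fields are ignored.
    """
    genes, transcripts = set(), set()
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].strip().split("|")
        if len(fields) < 2:
            continue  # header does not match the assumed layout
        genes.add(fields[0])
        transcripts.add(fields[1])
    return len(genes), len(transcripts)


# Toy input with made-up IDs: two genes, one of which has two transcripts.
example = io.StringIO(
    ">GENE_A|TX_A1\nATGC\n"
    ">GENE_A|TX_A2\nATGCAA\n"
    ">GENE_B|TX_B1\nATGG\n"
)
n_genes, n_transcripts = count_genes_and_transcripts(example)
print(n_genes, n_transcripts)
```

For a real file you would pass `open("your_download.fasta")` instead of the toy input. If the gene count is close to what the [Count] button reports while the transcript count matches the number of sequences, the difference is just genes versus transcripts.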

