How To Find "Short" Genes
10.9 years ago
a.donizetti ▴ 30

I'm a CS student working on the Ensembl database RESTful API.

I need to send multiple requests to the database in my library's unit tests, and it would be nice to keep the responses as small as possible.

For example, the main example in the documentation of sequence_id uses the gene with ID ENSG00000157764, which is quite big (about 200 kbp).

Question: Is there a way to find Ensembl IDs (or just the names) of genes with small sequences?

genes sequence ensembl • 2.4k views
10.9 years ago

Use BioMart to build a query (Ensembl gene ID, chromosome, start, end), add a 'Length' column (end - start) with awk, skip genes longer than 300 bp, sort by length, and print the first results:

$ curl -s -L  -d 'query=<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="hsapiens_gene_ensembl" interface="default"><Attribute name="ensembl_gene_id"/><Attribute name="chromosome_name"/><Attribute name="start_position"/><Attribute name="end_position"/></Dataset></Query>' "http://www.biomart.org/biomart/martservice/result" |\
awk -F '\t' '{L=int($4)-int($3); if(L>300) next; printf("%s\t%d\n",$0,L);}' |\
sort -t $'\t' -k5,5n |\
head -n 20

ENSG00000223997    14    22907539    22907546    7
ENSG00000237235    14    22907999    22908007    8
ENSG00000236597    14    106331761    106331771    10
ENSG00000228985    14    22918105    22918117    12
ENSG00000262536    HSCHR22_1_CTG2    39345118    39345132    14
ENSG00000268373    12    19571846    19571860    14
ENSG00000227800    14    106360366    106360381    15
ENSG00000232543    14    106369474    106369489    15
ENSG00000233655    14    106379081    106379096    15
ENSG00000227108    14    106366496    106366512    16
ENSG00000236170    14    106385361    106385377    16
ENSG00000237020    14    106357049    106357065    16
ENSG00000237197    14    106375766    106375782    16
ENSG00000225825    14    106347397    106347414    17
ENSG00000228131    14    106376269    106376286    17
ENSG00000227196    14    106350728    106350746    18
ENSG00000211907    14    106346892    106346911    19
ENSG00000211909    14    106349761    106349780    19
ENSG00000211915    14    106359400    106359419    19
ENSG00000211928    14    106378116    106378135    19
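
For unit tests it may be convenient to run the filter and sort steps on a saved copy of the BioMart output, so the tests do not depend on the remote service. The following is only a sketch of that idea: the function name shortest_genes and the file name genes.tsv are hypothetical, and the input is assumed to be the same four tab-separated columns as in the query above (gene ID, chromosome, start, end). It keeps the same end - start length convention as the pipeline above and breaks ties by gene ID so the output order is stable.

```shell
# shortest_genes MAX_LEN N   (hypothetical helper, not part of any Ensembl tool)
# Reads tab-separated rows (gene_id, chromosome, start, end) on stdin and
# prints the N shortest genes whose end - start is at most MAX_LEN,
# with the computed length appended as a fifth column.
shortest_genes() {
  max_len="$1"
  n="$2"
  awk -F '\t' -v max="$max_len" '
    NF >= 4 {
      len = $4 - $3                       # same convention as the pipeline above
      if (len <= max) printf("%s\t%d\n", $0, len)
    }' |
  sort -t "$(printf '\t')" -k5,5n -k1,1 |  # by length, then by gene ID
  head -n "$n"
}

# Example call on a previously saved result (genes.tsv is an assumed file name):
#   shortest_genes 300 20 < genes.tsv
```

Saving the BioMart result once and reusing it this way keeps the list of short gene IDs reproducible between test runs.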

This is nice. Thank you Pierre!
