How To Find "Short" Genes
I'm a CS student working on the Ensembl database RESTful API.
I need to send multiple requests to the database in the unit tests of my library, and it would be nice to keep the responses as small as possible.
For example, in the documentation of sequence_id the main example uses the gene with ID ENSG00000157764, which is quite large (about 200k bp).
Question: Is there a way to find Ensembl IDs (or just the names) of genes with small sequences?
genes
sequence
ensembl
Use BioMart to build a query (gene ID, chromosome, start, end), add a 'Length' column with awk, sort on that length, and print the first results:
$ curl -s -L -d 'query=<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="hsapiens_gene_ensembl" interface="default"><Attribute name="ensembl_gene_id"/><Attribute name="chromosome_name"/><Attribute name="start_position"/><Attribute name="end_position"/></Dataset></Query>' "http://www.biomart.org/biomart/martservice/result" |\
awk -F '\t' '{L=int($4)-int($3); if(L>300) next; printf("%s\t%d\n",$0,L);}' |\
sort -t $'\t' -k5,5n |\
head -n 20
ENSG00000223997 14 22907539 22907546 7
ENSG00000237235 14 22907999 22908007 8
ENSG00000236597 14 106331761 106331771 10
ENSG00000228985 14 22918105 22918117 12
ENSG00000262536 HSCHR22_1_CTG2 39345118 39345132 14
ENSG00000268373 12 19571846 19571860 14
ENSG00000227800 14 106360366 106360381 15
ENSG00000232543 14 106369474 106369489 15
ENSG00000233655 14 106379081 106379096 15
ENSG00000227108 14 106366496 106366512 16
ENSG00000236170 14 106385361 106385377 16
ENSG00000237020 14 106357049 106357065 16
ENSG00000237197 14 106375766 106375782 16
ENSG00000225825 14 106347397 106347414 17
ENSG00000228131 14 106376269 106376286 17
ENSG00000227196 14 106350728 106350746 18
ENSG00000211907 14 106346892 106346911 19
ENSG00000211909 14 106349761 106349780 19
ENSG00000211915 14 106359400 106359419 19
ENSG00000211928 14 106378116 106378135 19
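The filtering and sorting step can also be wrapped in a small reusable function, which may be handy when picking fixtures for unit tests. This is only a sketch: the function name `shortest_genes` and its two arguments are made up for illustration, and it assumes tab-separated input in the same column order as the BioMart query above (gene ID, chromosome, start, end).

```shell
#!/usr/bin/env bash
# shortest_genes MAX_SPAN COUNT   (hypothetical helper, not part of any Ensembl or BioMart tool)
# Reads tab-separated rows (gene_id, chromosome, start, end) on stdin,
# appends span = end - start as a fifth column, drops rows whose span
# exceeds MAX_SPAN, and prints the COUNT rows with the smallest span.
shortest_genes() {
    local max_span="${1:-300}" count="${2:-20}"
    awk -F '\t' -v max="$max_span" '
        { span = $4 - $3 }                                # same arithmetic as the one-liner above
        span <= max { printf("%s\t%d\n", $0, span) }      # keep only short genes
    ' | sort -t "$(printf '\t')" -k5,5n -k1,1 | head -n "$count"
}
```

For example, `printf 'ENSG00000223997\t14\t22907539\t22907546\n' | shortest_genes 300 5` should print that row with a fifth column of 7. Ties on span are broken by gene ID so that the output order is stable between runs.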
This is nice. Thank you Pierre!