Question: How To Get Intron Positions Of A RefSeq Protein With Python
Dror (Israel) wrote, 9.7 years ago:

I have a list of RefSeq IDs and I want to get the intron positions relative to the protein sequence. Does anyone have a Python script that grabs the introns from the genomic reference of a RefSeq gene and reports their positions in the protein?


I might have a solution for this. Can you provide some of your RefSeq IDs to test it on?

(comment by Michael Schubert, 9.7 years ago)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9.7 years ago:

UCSC has already computed this table: see refGene.txt.gz and refGene.sql, here.

The table contains the exon start and end positions as comma-separated lists; you then "just have to" reconstruct the protein sequence from the reference sequences (here).

curl -s  "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz" | gunzip -c | head

971 NR_024227   chr19   -   50595745    50595866    50595866    50595866    1   50595745,   50595866,   0   SNAR-A6 unk unk -1,
971 NR_024227   chr19   -   50601082    50601203    50601203    50601203    1   50601082,   50601203,   0   SNAR-A6 unk unk -1,
629 NM_001014809    chr4    -   5822491 5894785 5823486 5894696 14  5822491,5827220,5830215,5837641,5838491,5841248,5843034,5844819,5851118,5853134,5857869,5862752,5868394,5894315,    5823578,5827386,5830395,5837812,5838633,5841405,5843155,5844888,5851199,5853196,5858034,5862937,5868483,5894785,    0   CRMP1   cmpl    cmpl    1,0,0,0,2,1,0,0,0,1,1,2,0,0,
808 NM_001029883    chr2    -   29284557    29297127    29287734    29297127    2   29284557,29293459,  29287933,29297127,  0   C2orf71 cmpl    cmpl    2,0,
705 NM_024329   chr1    +   15736390    15756839    15736467    15755220    4   15736390,15752366,15753645,15755088,    15736775,15752514,15753780,15756839,    0   EFHD2   cmpl    cmpl    0,2,0,0,
768 NM_024328   chr14   +   24025197    24028786    24025966    24028049    2   24025197,24027903,  24026513,24028786,  0   THTPA   cmpl    cmpl    0,1,
1379    NM_024326   chr10   +   104179570   104182893   104180886   104182750   4   104179570,104181110,104181543,104182560,    104180939,104181264,104182049,104182893,    0   FBXL15  cmpl    cmpl    0,2,0,2,
826 NM_138275   chr6    +   31691160    31692850    31691160    31692850    4   31691160,31691415,31692541,31692746,    31691221,31691763,31692621,31692850,    0   C6orf25 cmpl    incmpl  0,1,1,0,
609 NM_138275   chr6_cox_hap2   +   3200777 3202467 3200777 3202467 4   3200777,3201032,3202158,3202363,    3200838,3201380,3202238,3202467,    0   C6orf25 cmpl    incmpl  0,1,1,0,
607 NM_138275   chr6_dbb_hap3   +   2976730 2978420 2976730 2978420 4   2976730,2976985,2978111,2978316,    2976791,2977333,2978191,2978420,    0   C6orf25 cmpl    incmpl  0,1,1,0,
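
To illustrate how the exon columns above could be turned into intron positions on the protein, here is a minimal Python sketch. It assumes the usual refGene column order (strand in column 4, cdsStart/cdsEnd in columns 7 and 8, exonStarts/exonEnds in columns 10 and 11) and 0-based, half-open coordinates. The function name `protein_intron_positions` is made up for this example, and only introns that interrupt the coding sequence are reported (introns lying entirely in the UTRs are ignored).

```python
def protein_intron_positions(refgene_line):
    """Locate the CDS-splitting introns of one refGene row in protein coordinates.

    Returns a list of (complete_codons_before_intron, phase) tuples in
    transcript order; phase is the number of bases of the next codon that
    lie before the intron (0, 1 or 2).
    """
    # Whitespace split works for the pasted rows; the real file is tab-separated.
    fields = refgene_line.split()
    strand = fields[3]
    cds_start, cds_end = int(fields[6]), int(fields[7])
    exon_starts = [int(x) for x in fields[9].split(",") if x]
    exon_ends = [int(x) for x in fields[10].split(",") if x]

    # Keep only the coding part of each exon (assumed 0-based, half-open).
    coding_blocks = []
    for start, end in zip(exon_starts, exon_ends):
        start, end = max(start, cds_start), min(end, cds_end)
        if start < end:
            coding_blocks.append((start, end))

    # Exons are listed in genomic order; flip them for minus-strand genes.
    if strand == "-":
        coding_blocks.reverse()

    positions = []
    coding_bases = 0
    for start, end in coding_blocks[:-1]:  # no intron after the last block
        coding_bases += end - start
        positions.append(divmod(coding_bases, 3))
    return positions


# The EFHD2 row from the output above.
efhd2 = ("705 NM_024329 chr1 + 15736390 15756839 15736467 15755220 4 "
         "15736390,15752366,15753645,15755088, "
         "15736775,15752514,15753780,15756839, 0 EFHD2 cmpl cmpl 0,2,0,0,")
print(protein_intron_positions(efhd2))  # [(102, 2), (152, 0), (197, 0)]
```

The phases computed this way agree with the last column of the row (exonFrames 2, 0, 0 for exons 2 to 4), which is a quick sanity check. Non-coding rows such as the NR_ entries have cdsStart equal to cdsEnd and return an empty list.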

Yes, but this contains only a fraction of the RefSeq data. I need to be able to do this for any organism in RefSeq, such as cnidarians and Trichoplax.

(reply by Dror, 9.7 years ago)
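
For organisms that UCSC does not cover, one possible route is the CDS feature of the RefSeq genomic record in GenBank format, whose location is written as `join(a..b,c..d,...)` with 1-based inclusive coordinates, optionally wrapped in `complement(...)`. The sketch below is not from the thread: it only parses such a location string (fetching the record is left out), the function name `cds_intron_positions` is hypothetical, and it assumes the simple case where `complement` wraps the whole `join`.

```python
import re


def cds_intron_positions(location):
    """Locate introns in protein coordinates from a GenBank CDS location string.

    Returns (complete_codons_before_intron, phase) tuples in transcript order.
    Assumes 1-based inclusive segments and, for minus-strand genes, the form
    complement(join(...)) with segments listed in ascending genomic order.
    """
    # Partial-feature markers (< and >) are tolerated and ignored.
    segments = [(int(a), int(b))
                for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    if location.startswith("complement("):
        segments.reverse()

    positions = []
    coding_bases = 0
    for start, end in segments[:-1]:  # no intron after the last segment
        coding_bases += end - start + 1
        positions.append(divmod(coding_bases, 3))
    return positions


# Made-up coordinates, for illustration only.
print(cds_intron_positions("join(1..308,1001..1148,2001..2135,3001..3132)"))
# [(102, 2), (152, 0), (197, 0)]
```

A location where `complement` appears inside the `join`, or where segments point to another accession, would need extra handling that this sketch does not attempt.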