Parser for Converting UCSC Tables into GFF3 Format
10.4 years ago
ChIP

Hi!

I have the following sample data from the UCSC Table Browser:

#bin    name    chrom    strand    txStart    txEnd    cdsStart    cdsEnd    exonCount    exonStarts    exonEnds    score    name2    cdsStartStat    cdsEndStat    exonFrames
0    NM_032291    chr1    +    66999824    67210768    67000041    67208778    25    66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,    67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768,    0    SGIP1    cmpl    cmpl    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,
1    NM_032785    chr1    -    48998526    50489626    48999844    50489468    14    48998526,49000561,49005313,49052675,49056504,49100164,49119008,49128823,49332862,49511255,49711441,50162984,50317067,50489434,    48999965,49000588,49005410,49052838,49056657,49100276,49119123,49128913,49332902,49511472,49711536,50163109,50317190,50489626,    0    AGBL4    cmpl    cmpl    2,2,1,0,0,2,1,1,0,2,0,1,1,0,
1    NM_018090    chr1    +    16767166    16786584    16767256    16785385    8    16767166,16770126,16774364,16774554,16775587,16778332,16782312,16785336,    16767348,16770227,16774469,16774636,16775696,16778510,16782388,16786584,    0    NECAP2    cmpl    cmpl    0,2,1,1,2,0,1,2,
1    NM_052998    chr1    +    33546713    33585995    33547850    33585783    12    33546713,33546988,33547201,33547778,33549554,33557650,33558882,33560148,33562307,33563667,33583502,33585644,    33546895,33547109,33547413,33547955,33549728,33557823,33559017,33560314,33562470,33563780,33583717,33585995,    0    ADC    cmpl    cmpl    -1,-1,-1,0,0,0,2,2,0,1,0,2,
1    NM_001145278    chr1    +    16767166    16786584    16767256    16785385    8    16767166,16770126,16774364,16774554,16775587,16778332,16782312,16785336,    16767270,16770227,16774469,16774636,16775696,16778510,16782388,16786584,    0    NECAP2    cmpl    cmpl    0,2,1,1,2,0,1,2,
1    NM_001145277    chr1    +    16767166    16786584    16767256    16785491    7    16767166,16770126,16774364,16774554,16775587,16778332,16785336,    16767348,16770227,16774469,16774636,16775696,16778510,16786584,    0    NECAP2    cmpl    cmpl    0,2,1,1,2,0,1,
1    NM_001080397    chr1    +    8384389    8404227    8384389    8404073    8    8384389,8385357,8385877,8390268,8395496,8397875,8399552,8403806,    8384786,8385450,8386102,8390996,8395650,8398052,8399758,8404227,    0    SLC45A1    cmpl    cmpl    0,1,1,1,0,1,1,0,
1    NM_013943    chr1    +    25071759    25170815    25072044    25167428    6    25071759,25124232,25140584,25153500,25166350,25167263,    25072116,25124342,25140710,25153607,25166532,25170815,    0    CLIC4    cmpl    cmpl    0,0,2,2,1,0,
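Reading this export is the first step. Below is a minimal Python sketch that assumes the file is tab-delimited, which is how the Table Browser writes it (the forum rendered the columns above with spaces), and that its first line is the `#bin name chrom ...` header. The function names `read_genepred` and `exon_intervals` are hypothetical. The sketch keys each row by that header, splits the comma-terminated exon lists, and converts UCSC's 0-based, half-open starts to GFF3's 1-based, closed coordinates.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_genepred(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per transcript, keyed by the '#'-prefixed header line."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # "#bin\tname\tchrom..." -> ["bin", "name", "chrom", ...]
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("data line found before the '#' header")
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        yield dict(zip(header, fields))


def exon_intervals(rec: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return exons as 1-based, closed (start, end) pairs in genomic order."""
    # UCSC lists end with a trailing comma ("1,2,3,"), so skip empty items.
    starts = [int(x) for x in rec["exonStarts"].split(",") if x]
    ends = [int(x) for x in rec["exonEnds"].split(",") if x]
    if not (len(starts) == len(ends) == int(rec["exonCount"])):
        raise ValueError(f"exon count mismatch in {rec['name']}")
    # 0-based half-open [s, e) becomes 1-based closed [s + 1, e].
    return [(s + 1, e) for s, e in zip(starts, ends)]
```

For the CLIC4 row, for example, `exon_intervals` returns six pairs, and the first is `(25071760, 25072116)`.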

I would like to convert this into GFF3 format.

Does anybody know how this can be done, or have something I could use (bash, Python, or Perl)?

Thank you
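One way to do it is a small Python converter. This is a minimal sketch, not a tested tool. It assumes the tab-delimited, 16-column layout of the header above (a leading `bin` column, with the gene symbol in `name2`). It writes one `mRNA` per transcript with `exon` and `CDS` children and stores the symbol in a lowercase `gene_name` attribute, because GFF3 reserves only attribute names that start with an uppercase letter. The function name `genepred_to_gff3` and the default `source` value are hypothetical. CDS phases are recomputed from segment lengths instead of being read from `exonFrames`.

```python
from typing import List


def _ints(field: str) -> List[int]:
    # UCSC lists end with a trailing comma ("1,2,3,"), so skip empty items.
    return [int(x) for x in field.split(",") if x]


def genepred_to_gff3(line: str, source: str = "ucsc_refGene") -> List[str]:
    """Convert one genePred row (with leading 'bin' column) to GFF3 lines."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 16:
        raise ValueError(f"expected 16 columns, got {len(f)}")
    name, chrom, strand = f[1], f[2], f[3]
    tx_start, tx_end, cds_start, cds_end = (int(x) for x in f[4:8])
    starts, ends, symbol = _ints(f[9]), _ints(f[10]), f[12]

    def row(ftype: str, start0: int, end: int, phase: str, attrs: str) -> str:
        # genePred is 0-based half-open; GFF3 is 1-based closed.
        return "\t".join([chrom, source, ftype, str(start0 + 1), str(end),
                          ".", strand, phase, attrs])

    out = [row("mRNA", tx_start, tx_end, ".",
               f"ID={name};Name={name};gene_name={symbol}")]
    cds = []
    for i, (s, e) in enumerate(zip(starts, ends), start=1):
        # Exons are numbered in genomic (left-to-right) order here.
        out.append(row("exon", s, e, ".", f"ID={name}.exon{i};Parent={name}"))
        cs, ce = max(s, cds_start), min(e, cds_end)
        if cs < ce:  # never true for non-coding rows (cdsStart == cdsEnd)
            cds.append((cs, ce))

    # GFF3 phase depends on how many CDS bases precede a segment in
    # transcription order, so minus-strand segments are walked right to left.
    idx = list(range(len(cds)))
    order = idx if strand == "+" else idx[::-1]
    phase = [0] * len(cds)
    seen = 0
    for i in order:
        phase[i] = (3 - seen % 3) % 3
        seen += cds[i][1] - cds[i][0]
    for (cs, ce), p in zip(cds, phase):
        out.append(row("CDS", cs, ce, str(p), f"ID={name}.cds;Parent={name}"))
    return out
```

To convert a whole export, print `##gff-version 3` once, skip lines that start with `#`, and write the output of `genepred_to_gff3(line)` for every other line. As a sanity check, the six CDS phases for the CLIC4 row come out as 0, 0, 1, 1, 2, 0. That agrees with its `exonFrames` (0,0,2,2,1,0) under phase = (3 - frame) % 3.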

python genome • 2.7k views

Why don't you just export the UCSC table in GTF format directly?


Because the GTF export is missing one piece of information I need, namely the gene name. That is why I am exporting the table in genePred format first. :)
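The gene name is the key piece here, so a converter can also emit one `gene` feature per symbol and make each transcript its child. That way the three NECAP2 transcripts in the sample end up under a single gene. This is a minimal sketch that assumes each `name2` maps to a single locus (it raises an error otherwise). The `Tx` tuple, the `gene_features` name, and the `gene:` ID prefix are hypothetical choices. Each gene spans the union of its transcripts' `txStart`..`txEnd`.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Tx(NamedTuple):
    name: str    # transcript accession, e.g. NM_018090
    chrom: str
    strand: str
    start0: int  # txStart (0-based)
    end: int     # txEnd
    name2: str   # gene symbol, e.g. NECAP2


def gene_features(txs: Iterable[Tx], source: str = "ucsc_refGene"
                  ) -> Tuple[List[str], Dict[str, str]]:
    """Return GFF3 'gene' lines and a transcript -> parent gene ID map."""
    loci: Dict[str, Tuple[str, str, int, int]] = {}
    parents: Dict[str, str] = {}
    for t in txs:
        if t.name2 in loci:
            chrom, strand, s, e = loci[t.name2]
            # Assumption: one symbol = one locus; refuse to merge otherwise.
            if (chrom, strand) != (t.chrom, t.strand):
                raise ValueError(f"{t.name2} is on more than one chrom/strand")
            loci[t.name2] = (chrom, strand, min(s, t.start0), max(e, t.end))
        else:
            loci[t.name2] = (t.chrom, t.strand, t.start0, t.end)
        parents[t.name] = f"gene:{t.name2}"
    # Genes come out in first-seen order; start + 1 converts to 1-based.
    lines = [
        "\t".join([chrom, source, "gene", str(s + 1), str(e), ".", strand,
                   ".", f"ID=gene:{sym};Name={sym}"])
        for sym, (chrom, strand, s, e) in loci.items()
    ]
    return lines, parents
```

After this, each `mRNA` line gets `Parent=` set to `parents[name]`. For the three NECAP2 rows above, the result is a single gene line spanning 16767167 to 16786584.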
