Problem generating UCSC's refGene format
7.1 years ago
gtho123 ▴ 220

Hi, I have performed a differential expression experiment using RNA-Seq and would like to try chromosome clustering. I found a program, CROC, which seems like it could help, but it requires a list of genes (which I have) and a reference genome in UCSC's refGene format.

Since I work on a plant with no reference refGene file available, I have tried to generate my own following advice on SeqAnswers. Essentially, I used UCSC's gtfToGenePred to create a refGene file from a GTF file.

However, while I can load it into CROC, when I run the clustering program it fails to identify the genes in the reference. I think there must be something wrong with my reference file, as I can see the relevant gene IDs in there. How can I adjust it so that the gene IDs will be picked up?

Here are the first few entries of my generated refGene file:

XLOC_000001    TCONS_00000001    chr1    +    6523    7366    7366    7366    2    6523,7097,    6620,7366,
XLOC_000002    TCONS_00000002    chr1    +    14513    15729    15729    15729    2    14513,15502,    14556,15729,
XLOC_000003    TCONS_00000003    chr1    +    16282    18382    18382    18382    3    16282,17060,18241,    16326,17304,18382,
XLOC_000004    TCONS_00000004    chr1    +    31972    32344    32344    32344    1    31972,    32344,

and I input gene IDs which look like this:

XLOC_000007
XLOC_000287
XLOC_000320
XLOC_000381
XLOC_000394
XLOC_000645
XLOC_000754
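
One quick way to confirm that the IDs really are present, and in which column, is an exact-match scan of the reference. The following is only a minimal sketch: the function names `ids_by_column` and `missing_ids` are made up for illustration, fields are assumed to be whitespace-separated, and it says nothing about how CROC itself looks up genes.

```python
def ids_by_column(reference_lines, gene_ids):
    """Map column index -> set of requested IDs found verbatim in that column."""
    wanted = set(gene_ids)
    hits = {}
    for line in reference_lines:
        for index, field in enumerate(line.split()):
            if field in wanted:
                hits.setdefault(index, set()).add(field)
    return hits


def missing_ids(reference_lines, gene_ids):
    """Return requested IDs that appear in no column of the reference."""
    found = set()
    for ids in ids_by_column(reference_lines, gene_ids).values():
        found |= ids
    return sorted(set(gene_ids) - found)


# Two rows copied from the generated file above, queried with one ID that
# occurs in them and one (XLOC_000007) that is not part of this excerpt.
reference = [
    "XLOC_000001 TCONS_00000001 chr1 + 6523 7366 7366 7366 2 6523,7097, 6620,7366,",
    "XLOC_000002 TCONS_00000002 chr1 + 14513 15729 15729 15729 2 14513,15502, 14556,15729,",
]
query = ["XLOC_000001", "XLOC_000007"]

print(ids_by_column(reference, query))  # column index 0 holds the match
print(missing_ids(reference, query))
```

If every ID turns up in column 0 of the full file, the IDs themselves are not the problem and the column layout becomes the more likely suspect.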

I am not familiar with this file format, so I thought that switching the first two columns around might help. It did not.
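
For comparison, the refGene table that UCSC distributes has 16 columns, with the gene-level name in the `name2` column, whereas the rows shown above have 11. Whether CROC needs the full 16-column layout is an assumption and is not confirmed anywhere in this thread. A minimal sketch that counts the fields per row (the helper name `column_counts` is hypothetical, and fields are assumed to be whitespace-separated):

```python
from collections import Counter

# Column names of UCSC's refGene table, in order (16 fields).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def column_counts(lines):
    """Count how many non-empty, non-comment rows have each number of fields."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        counts[len(line.split())] += 1
    return counts


# Two rows copied from the generated file above.
sample = [
    "XLOC_000001 TCONS_00000001 chr1 + 6523 7366 7366 7366 2 6523,7097, 6620,7366,",
    "XLOC_000004 TCONS_00000004 chr1 + 31972 32344 32344 32344 1 31972, 32344,",
]

for n_fields, n_rows in sorted(column_counts(sample).items()):
    print(f"{n_rows} row(s) with {n_fields} fields; UCSC refGene has {len(REFGENE_COLUMNS)}")
```

Running the same count over the whole file would also reveal any rows with an inconsistent number of fields.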

Any help is greatly appreciated.

software error refgene file format • 1.5k views
Do your other results have the 'chr' prefix? E.g. are they '1' instead of 'chr1'?

They all have the 'chr' prefix.

These are all the values in that column:

chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
scaffold0001
scaffold0002
scaffold0003
scaffold0004
.....
scaffold2177
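
A listing like the one above can be produced, and checked for names lacking the 'chr' prefix, with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, fields are whitespace-separated, and the sequence name sits in the third column (index 2), as in the rows posted in the question.

```python
def sequence_names(reference_lines, chrom_column=2):
    """Return distinct sequence names in first-seen order."""
    names = []
    seen = set()
    for line in reference_lines:
        fields = line.split()
        if len(fields) <= chrom_column:
            continue  # skip blank or truncated rows
        name = fields[chrom_column]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


def without_chr_prefix(names):
    """Return the names that do not start with 'chr'."""
    return [name for name in names if not name.startswith("chr")]


# Shortened, made-up rows for illustration only (not real annotation data).
rows = [
    "GENE_A TX_A chr1 + 100 200",
    "GENE_B TX_B chr1 + 300 400",
    "GENE_C TX_C scaffold0001 - 10 90",
]

names = sequence_names(rows)
print(names)
print(without_chr_prefix(names))
```

The scaffold names are reported by the second function simply because they do not begin with 'chr'; whether CROC accepts such names is not something this thread establishes.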