Question

PennCNV: Finding Overlapping Genes

0

13.5 years ago

romsen ▴ 70

Hello,

I'm trying to find the genes that overlap my CNV calls. I downloaded the gene annotations for hg18 (Mar. 2006, NCBI Build 36) from UCSC:

[knownGene.txt.gz]
[kgXref.txt.gz]

and did the same for the refGene annotation, as explained on the PennCNV website.

But when I run the 'scan_region.pl' command, the following error occurs:

    C:\penncnv>scan_region.pl sample.rawcnv hg18_refGene.txt -refgene -reflink hg18_refLink.txt > sample.cnv.rg18
    Error: invalid record in template-location-file hg18_refGene.txt (expecting 16 or 10 tab-delimited fields in refGene file): <1410,2804,5917067  N525,1506,525,15824069132,140691R_02,,  873     7974,   215506,5254,2,1,,       218281,,
    238422,,        23-1,6,525,05784525,1392,       6913282406918345,,      87372251586LIS995,      37974586,1,-    85544155       8,   0 CEP68,2,30,15,2061314048,88390,33,0,21066480,21066480,21066480488390,33,,,291384439717  -8,,2106335,883909781,,2913XR1   9717    4695,210664805392OC1924750493576081593549121593> 
    at C:\penncnv\scan_region.pl line 540 main::scanUCSCGene('sample.rawcnv', 'hg18_refGene.txt', 0, 'refgene', undef, undef) called at C:\penncnv\scan_region.pl line 108

Something seems to be broken in the annotation file. How can I avoid or fix this? I'm a biologist, not a computer scientist, so please be kind ;)

Thank you

genes cnv • 2.9k views

updated 10.0 years ago by Biostar 20 • written 13.5 years ago by romsen ▴ 70

0

Can you show what hg18_refGene.txt looks like?

13.5 years ago by Niek De Klein ★ 2.6k

0

It's a tab-delimited text file. When I open it in Excel there are 16 columns, but from line 900 onward the format is broken. So I think I have found the problem: I suspect something went wrong when extracting the .gz archive.

Update: Yes, it was an extraction problem with PowerArchiver. Extracting the archive with WinRAR works!

13.5 years ago by romsen ▴ 70

1

You should post that as an answer and then accept it, in case someone else has the same problem.

13.5 years ago by Niek De Klein ★ 2.6k

score 1 · Answer 1 · 2012-05-03

1

13.5 years ago

romsen ▴ 70

It's a tab-delimited text file. When I open it in Excel there are 16 columns, but from line 900 onward the format is broken. So I think I have found the problem: I suspect something went wrong when extracting the .gz archive.

Update: Yes, it was an extraction problem with PowerArchiver. Extracting the archive with WinRAR works!
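
For anyone who hits the same error and wants to check the file without opening it in Excel, here is a minimal Python sketch. It is not part of PennCNV and not what I actually used (WinRAR fixed it for me); it just decompresses the archive with Python's standard `gzip` module and reports lines whose number of tab-delimited fields is not 16 or 10, the two counts named in the error message. The function names `gunzip` and `find_bad_lines` and the input name `refGene.txt.gz` are made up for illustration, and skipping empty lines is my own assumption.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dst):
    """Decompress a .gz file to dst using the standard gzip module."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def find_bad_lines(path, allowed=(16, 10)):
    """Return (line_number, field_count) for every non-empty line whose
    tab-delimited field count is not in `allowed`."""
    bad = []
    # errors="replace" keeps the scan going even if the file contains
    # bytes that are not valid UTF-8 (likely in a corrupted extraction).
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # assumption: empty lines are ignored here
            n_fields = len(line.split("\t"))
            if n_fields not in allowed:
                bad.append((lineno, n_fields))
    return bad


if __name__ == "__main__":
    archive = Path("refGene.txt.gz")  # hypothetical name of the downloaded file
    if archive.exists():
        gunzip(archive, "hg18_refGene.txt")
        for lineno, n_fields in find_bad_lines("hg18_refGene.txt")[:10]:
            print(f"line {lineno}: {n_fields} fields")
```

If the list comes back empty, every line has 16 or 10 fields, so the field-count error above should not be triggered by that file. If lines are reported, re-extracting or re-downloading the archive is probably the first thing to try.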

13.5 years ago by romsen ▴ 70