cummeRbund error in loading cuffdiff gene file
9.7 years ago
Adrian Pelin ★ 2.6k

Hello,

I am trying to load my cuffdiff genes tracking file into cummeRbund and I get an error:

> refCuffdiff="/shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/diff-cross-solo"
> gtfFilePath="/shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/merged/merged.gtf"

> genomePath="/shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/Gloin1_AssemblyScaffolds_Repeatmasked.fa"
> cuff <- readCufflinks(dir=refCuffdiff,rebuild=T,gtfFile=gtfFilePath,genome=genomePath)
Creating database /shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/diff-cross-solo/cuffData.db
Reading Run Info File /shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/diff-cross-solo/run.info
Writing runInfo Table
Reading Read Group Info  /shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/diff-cross-solo/read_groups.info
Writing replicates Table
Reading GTF file
Writing GTF features to 'features' table...
Reading /shared/AdrianP/home112013/glomus/RNA-Meiosis/JGI_annot/diff-cross-solo/genes.fpkm_tracking
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 17 elements
>

I did not edit genes.fpkm_tracking; here is the output of head:

tracking_id    class_code    nearest_ref_id    gene_id    gene_short_name    tss_id    locus    length    coverage    Crossing_FPKM    Crossing_conf_lo    Crossing_conf_hi    Crossing_status    Solo_FPKM    Solo_conf_lo    Solo_conf_hi    Solo_status
XLOC_000001    -    -    XLOC_000001    fgenesh1_kg.1_#_2_#_remain_c6340,fgenesh1_kg.1_#_3_#_step3_c2667,fgenesh1_pg.1_#_1,gm1.1_g    TSS1,TSS2,TSS3    scaffold_1:622-2570    -    -    21.9001    0    51.4671    OK    19.3357    0    51.4078    OK
XLOC_000002    -    -    XLOC_000002    gm1.12_g    TSS4    scaffold_10:10-512    -    -    0    0    0    OK    0    0    0    OK
XLOC_000003    -    -    XLOC_000003    gm1.154_g    TSS5    scaffold_100:720-1182    -    -    0.0767179    0    0.39657    OK    0    0    0    OK
XLOC_000004    -    -    XLOC_000004    gm1.153_g    TSS6    scaffold_100:21-396    -    -    0    0    0    OK    0    0    0    OK
XLOC_000005    -    -    XLOC_000005    fgenesh1_pg.1000_#_1,gm1.1557_g    TSS7,TSS8    scaffold_1000:0-2489    -    -    2.43942    0    8.08898    OK    2.15353    0    6.67501    OK
XLOC_000006    -    -    XLOC_000006    gm1.1558_g    TSS9    scaffold_1000:3054-3550    -    -    2.65628    0    8.96984    OK    5.06294    0    16.4288    OK
XLOC_000007    -    -    XLOC_000007    MIX1_1_43,e_gw1.10000.1.1,gm1.14515_g    TSS10    scaffold_10000:64-1868    -    -    0    0    0    OK    0.0292999    0    0.34366    OK
XLOC_000008    -    -    XLOC_000008    gm1.14516_g    TSS11,TSS12    scaffold_10001:164-1237    -    -    0    0    0    OK    0    0    0    OK
XLOC_000009    -    -    XLOC_000009    CE19_2291,CE23_3268,CE26_2185,CE29_2185,CE31_910,CE32_2410,fgenesh1_kg.10002_#_1_#_remain_c4528,fgenesh1_kg.10002_#_2_#_ACTTGA_L001_R1_(paired)_contig_254,fgenesh1_kg.10002_#_3_#_step3_c2559,fgenesh1_pg.10002_#_1    TSS13,TSS14,TSS15,TSS16,TSS17    scaffold_10002:1191-5743    -    -    205.934    0    427.865    OK    137.679    0    294.519    OK
XLOC_000010    -    -    XLOC_000010    fgenesh1_kg.10002_#_4_#_ACTTGA_L001_R1_(paired)_contig_4434    TSS18,TSS19,TSS20    scaffold_10002:1191-5743    -    -    89.4953    0    389.998    OK    125.531    0    422.435    OK
XLOC_000011    -    -    XLOC_000011    CE37_98,gm1.14519_g    TSS21,TSS22,TSS23    scaffold_10003:56-3724    -    -    3.44577    0    9.32412    OK    4.41548    0    12.7158    OK
XLOC_000012    -    -    XLOC_000012    gm1.14520_g    TSS24    scaffold_10004:23-239    -    -    0.726845    0    1.9669    OK    1.33937    0    3.53072    OK
XLOC_000013    -    -    XLOC_000013    fgenesh1_pg.10005_#_1,gm1.14521_g    TSS25    scaffold_10005:192-1657    -    -    0    0    0    OK    0    0    0    OK
XLOC_000014    -    -    XLOC_000014    gm1.14522_g    TSS26    scaffold_10006:251-777    -    -    0    0    0    OK    0    0    0    OK
XLOC_000015    -    -    XLOC_000015    gm1.14523_g    TSS27    scaffold_10007:900-1003    -    -    0    0    0    OK    0    0    0    OK
XLOC_000016    -    -    XLOC_000016    gm1.14524_g    TSS28    scaffold_10008:25-990    -    -    0    0    0    OK    0    0    0    OK
XLOC_000017    -    -    XLOC_000017    fgenesh1_pg.10009_#_1,gm1.14525_g,gm1.14526_g    TSS29,TSS30,TSS31    scaffold_10009:1648-3030    -    -    0    0    0    OK    0    0    0    OK
XLOC_000018    -    -    XLOC_000018    CE52_327,fgenesh1_kg.1001_#_1_#_ACTTGA_L001_R1_(paired)_contig_2142,fgenesh1_kg.1001_#_2_#_remain_c11121    TSS32,TSS33,TSS34,TSS35    scaffold_1001:464-2744    -    -    50.3672    0    150.486    OK    23.201    0    63.7491    OK
XLOC_000019    -    -    XLOC_000019    CE56_391,fgenesh1_kg.10010_#_1_#_step3_c842,gm1.14527_g    TSS36,TSS37    scaffold_10010:1116-2040    -    -    752.921    0    2536.24    OK    1132.62    0    5135    OK

Does anyone have any idea what went wrong?

Adrian

cufflinks cummeRbund R • 3.4k views

Can you show the code you used to read in the cuffdiff output?


I have modified my post to show that.


It does not seem that there's anything wrong with genes.fpkm_tracking... maybe the scan() function is not opening the right file?

Have you tried making symlinks into a single directory (or putting all input files into one directory) and then running readCufflinks() in that directory?


I tried the same commands on a different experiment, with R invoked from the same directory, and the genes tracking file loaded without errors.

Could scan be parsing the file incorrectly?


Are you sure your genes.fpkm_tracking is tab-delimited? I think it should be tab-delimited for the program to correctly read the file.
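
One way to check the delimiter from the shell is to count tab-separated fields on every line: if all lines report the same count as the header, the file is consistently tab-delimited. The sketch below runs on a made-up three-column sample (the file name `sample.tsv` and its contents are hypothetical, not the poster's data), with commas inside one column to show that they do not affect the count.

```shell
# Hypothetical stand-in for genes.fpkm_tracking: 3 tab-separated columns,
# with commas inside the second column.
tmpdir=$(mktemp -d)
printf 'id\tname\tfpkm\nX1\tg1,g2\t0.5\nX2\tg3\t0\n' > "$tmpdir/sample.tsv"

# Print "<fields per line>:<number of lines>" for each distinct field count.
field_counts=$(awk -F'\t' '{ print NF }' "$tmpdir/sample.tsv" \
  | sort -n | uniq -c | awk '{ print $2 ":" $1 }')
echo "$field_counts"
```

A single output line means every row has the same number of fields; several lines would point to rows that split differently from the header.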


The default output is tab-delimited, and what I posted is also tab-delimited. Maybe it parses some columns incorrectly? The gene_short_name column has many commas.

9.7 years ago
Adrian Pelin ★ 2.6k

Alright, this is resolved now!

A quick email from Loyal A. Goff, a cummeRbund developer, hinted at the root of the problem:

Hi Adrian,

Without being able to reproduce the error first hand, my best guess at this point is the '#' characters in your gtf file.  I think others have had issues in the past with '#' because R considers everything after this to be a comment.  Is there any way that these characters can be removed from the reference gtf and cuffdiff could be rerun?

LoyK
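
To see why a '#' could lead to a "did not have N elements" error, one can roughly emulate a reader that discards everything from the first '#' to the end of the line (R's read.table does this by default, since its comment.char argument defaults to "#") and then count the tab-separated fields that remain. This is only an imitation of comment stripping on a hypothetical sample file, not cummeRbund's actual loading code.

```shell
# Hypothetical sample: header plus two rows; the first row has '#' in column 2.
tmpdir=$(mktemp -d)
printf 'id\tname\tfpkm\nX1\tkg.1_#_2\t0.5\nX2\tgm1.12_g\t0\n' > "$tmpdir/sample.tsv"

# Drop everything from the first '#' onward, then report
# "<line number>:<field count>" for each line.
fields_after_strip=$(sed 's/#.*//' "$tmpdir/sample.tsv" \
  | awk -F'\t' '{ print NR ":" NF }' | paste -sd' ' -)
echo "$fields_after_strip"
```

Any line whose count is lower than the header's was cut short at a '#', which is consistent with the first data row in the question being rejected.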

Column 5 (gene_short_name) of genes.fpkm_tracking contains '#' characters, and so does the GTF file. A quick sed fixed the issue.
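
The exact sed command was not posted. As a sketch of one possible approach, the snippet below replaces every '#' with '_' in a hypothetical one-line GTF sample and writes the result to a new file; the replacement character, the file names, and the sample record are all assumptions for illustration. In line with the email above, the cleaned GTF would then be used to rerun cuffdiff.

```shell
# Hypothetical one-record GTF with '#' inside the gene_id attribute.
tmpdir=$(mktemp -d)
printf 'scaffold_1\tJGI\texon\t622\t2570\t.\t+\t.\tgene_id "fgenesh1_kg.1_#_2_#_remain_c6340";\n' \
  > "$tmpdir/sample.gtf"

# Replace every '#' with '_' and write to a new file, leaving the original untouched.
sed 's/#/_/g' "$tmpdir/sample.gtf" > "$tmpdir/sample.nohash.gtf"

# Count lines that still contain '#'; grep exits non-zero when the count is 0.
remaining=$(grep -c '#' "$tmpdir/sample.nohash.gtf" || true)
echo "lines still containing '#': $remaining"
```

Note that this pattern also rewrites any header lines that begin with '#', so a file with such lines would need a more selective expression.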

Adrian
