Error occurred when processing GFF file, need more than 1 value to unpack
7.1 years ago

I am trying to create a count matrix using htseq-count. My command is below:

htseq-count -m union -s yes -t exon --idattr=Parent data1.sam GENE.gff > data1.counts

Here is a sample of my GENE.gff

##gff-version 3
# File name: 
# Organism: Candida albicans SC5314
# Genome version: A22-s07-m01-r20
# Date created: Sun Feb 19 07:02:19 2017
# Created by: The Candida Genome Database (http://www.candidagenome.org/)
# Contact Email: candida-curator AT lists DOT stanford DOT edu
# Funding: NIDCR at US NIH, grant number 1-R01-DE015873-01
#
Ca22chr1A_C_albicans_SC5314 CGD chromosome  1   3188341 .   .   .   ID=Ca22chr1A_C_albicans_SC5314;Name=Ca22chr1A_C_albicans_SC5314
Ca22chr1B_C_albicans_SC5314 CGD chromosome  1   3188396 .   .   .   ID=Ca22chr1B_C_albicans_SC5314;Name=Ca22chr1B_C_albicans_SC5314
Ca22chr2A_C_albicans_SC5314 CGD chromosome  1   2231883 .   .   .   ID=Ca22chr2A_C_albicans_SC5314;Name=Ca22chr2A_C_albicans_SC5314
Ca22chr2B_C_albicans_SC5314 CGD chromosome  1   2231750 .   .   .   ID=Ca22chr2B_C_albicans_SC5314;Name=Ca22chr2B_C_albicans_SC5314
Ca22chr3A_C_albicans_SC5314 CGD chromosome  1   1799298 .   .   .   ID=Ca22chr3A_C_albicans_SC5314;Name=Ca22chr3A_C_albicans_SC5314
Ca22chr3B_C_albicans_SC5314 CGD chromosome  1   1799271 .   .   .   ID=Ca22chr3B_C_albicans_SC5314;Name=Ca22chr3B_C_albicans_SC5314
Ca22chr4A_C_albicans_SC5314 CGD chromosome  1   1603259 .   .   .   ID=Ca22chr4A_C_albicans_SC5314;Name=Ca22chr4A_C_albicans_SC5314
Ca22chr4B_C_albicans_SC5314 CGD chromosome  1   1603311 .   .   .   ID=Ca22chr4B_C_albicans_SC5314;Name=Ca22chr4B_C_albicans_SC5314
Ca22chr5A_C_albicans_SC5314 CGD chromosome  1   1190869 .   .   .   ID=Ca22chr5A_C_albicans_SC5314;Name=Ca22chr5A_C_albicans_SC5314
Ca22chr5B_C_albicans_SC5314 CGD chromosome  1   1190991 .   .   .   ID=Ca22chr5B_C_albicans_SC5314;Name=Ca22chr5B_C_albicans_SC5314
Ca22chr6A_C_albicans_SC5314 CGD chromosome  1   1033292 .   .   .   ID=Ca22chr6A_C_albicans_SC5314;Name=Ca22chr6A_C_albicans_SC5314
Ca22chr6B_C_albicans_SC5314 CGD chromosome  1   1033212 .   .   .   ID=Ca22chr6B_C_albicans_SC5314;Name=Ca22chr6B_C_albicans_SC5314
Ca22chr7A_C_albicans_SC5314 CGD chromosome  1   949580  .   .   .   ID=Ca22chr7A_C_albicans_SC5314;Name=Ca22chr7A_C_albicans_SC5314
Ca22chr7B_C_albicans_SC5314 CGD chromosome  1   949611  .   .   .   ID=Ca22chr7B_C_albicans_SC5314;Name=Ca22chr7B_C_albicans_SC5314
Ca22chrM_C_albicans_SC5314  CGD chromosome  1   40420   .   .   .   ID=Ca22chrM_C_albicans_SC5314;Name=Ca22chrM_C_albicans_SC5314
Ca22chrRA_C_albicans_SC5314 CGD chromosome  1   2286237 .   .   .   ID=Ca22chrRA_C_albicans_SC5314;Name=Ca22chrRA_C_albicans_SC5314
Ca22chrRB_C_albicans_SC5314 CGD chromosome  1   2285697 .   .   .   ID=Ca22chrRB_C_albicans_SC5314;Name=Ca22chrRB_C_albicans_SC5314
Ca22chr1A_C_albicans_SC5314 CGD gene    4059    4397    .   +   .   ID=C1_00010W_A;Name=C1_00010W_A;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious;Alias=C1_00010W,C1_00010W_B,CaO19.11880,CaO19.13534,CaO19.4402,CaO19.6115,IPF21113.1,IPF27828.1,orf19.13534,orf19.6115
Ca22chr1A_C_albicans_SC5314 CGD mRNA    4059    4397    .   +   .   ID=C1_00010W_A-T;Parent=C1_00010W_A;Name=C1_00010W_A;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious;Alias=C1_00010W,C1_00010W_B,CaO19.11880,CaO19.13534,CaO19.4402,CaO19.6115,IPF21113.1,IPF27828.1,orf19.13534,orf19.6115
Ca22chr1A_C_albicans_SC5314 CGD exon    4059    4397    .   +   .   ID=C1_00010W_A-T-E1;Parent=C1_00010W_A-T

Here is a sample of my .sam file:

D00743:137:CAAWDANXX:1:1101:1129:76971  256 Ca22chr5A_C_albicans_SC5314 148153  3   51M *   0   0   CNGCTGGTTCAGTAGGTAAAACCACCATTGAACTATAATCAGGGTCAGGCA B#<<BFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFF AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:1T49   YT:Z:UU NH:i:2  CC:Z:Ca22chr5B_C_albicans_SC5314    CP:i:148147 HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:76971  0   Ca22chr5B_C_albicans_SC5314 148147  3   51M *   0   0   CNGCTGGTTCAGTAGGTAAAACCACCATTGAACTATAATCAGGGTCAGGCA B#<<BFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFF AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:1T49   YT:Z:UU NH:i:2  HI:i:1
D00743:137:CAAWDANXX:1:1101:1129:87845  272 Ca22chr7A_C_albicans_SC5314 613287  3   51M *   0   0   GTTGGTTGGTCTAAGGATTTTAATAGCAACATCAACAACACATGGTTTCNC F<B/FFFFFFFFFFFFFFFFFFFFF<FFBBFFBFFBFFBFBFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49A1   YT:Z:UU NH:i:2  CC:Z:Ca22chr7B_C_albicans_SC5314    CP:i:613307 HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:87845  16  Ca22chr7B_C_albicans_SC5314 613307  3   51M *   0   0   GTTGGTTGGTCTAAGGATTTTAATAGCAACATCAACAACACATGGTTTCNC F<B/FFFFFFFFFFFFFFFFFFFFF<FFBBFFBFFBFFBFBFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49A1   YT:Z:UU NH:i:2  HI:i:1
D00743:137:CAAWDANXX:1:1101:1129:90895  0   Ca22chr3B_C_albicans_SC5314 1786020 3   51M *   0   0   GNCGGCCAAAGCTTCGATTTGGTGCAAGATCATTGGTCTGTTACCGAACTC B#<<BFFFFFFFFFFF/FFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:-6 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0A0G49 YT:Z:UU NH:i:2  HI:i:1
D00743:137:CAAWDANXX:1:1101:1129:90895  256 Ca22chr3A_C_albicans_SC5314 1786049 3   51M *   0   0   GNCGGCCAAAGCTTCGATTTGGTGCAAGATCATTGGTCTGTTACCGAACTC B#<<BFFFFFFFFFFF/FFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:-6 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0A0G49 YT:Z:UU NH:i:2  CC:Z:Ca22chr3B_C_albicans_SC5314    CP:i:1786020    HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:91175  16  Ca22chrRA_C_albicans_SC5314 1494150 3   51M *   0   0   AAAAGCTGTATGTATTGACCATGTTTATATTTACTACTAATTAAATGTCNA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49C1   YT:Z:UU NH:i:2  CC:Z:Ca22chrRB_C_albicans_SC5314    CP:i:1494053    HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:91175  272 Ca22chrRB_C_albicans_SC5314 1494053 3   51M *   0   0   AAAAGCTGTATGTATTGACCATGTTTATATTTACTACTAATTAAATGTCNA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49C1   YT:Z:UU NH:i:2  HI:i:1
D00743:137:CAAWDANXX:1:1101:1130:83249  272 Ca22chr2A_C_albicans_SC5314 1454537 3   51M *   0   0   TAATTTAGTGTTTGGGTCATCGGATTTTCTCAATTTCGATATAGGATTGNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49G1   YT:Z:UU NH:i:2  CC:Z:Ca22chr2B_C_albicans_SC5314    CP:i:1454552    HI:i:0
D00743:137:CAAWDANXX:1:1101:1130:83249  16  Ca22chr2B_C_albicans_SC5314 1454552 3   51M *   0   0   TAATTTAGTGTTTGGGTCATCGGATTTTCTCAATTTCGATATAGGATTGNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49G1   YT:Z:UU NH:i:2  HI:i:1
D00743:137:CAAWDANXX:1:1101:1130:86865  16  Ca22chr3B_C_albicans_SC5314 177846  3   51M *   0   0   ATCCATTGGCAAGATCTAACTTGTCGGAATTCACCGGTGACTCACACTTNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49C1   YT:Z:UU NH:i:2  HI:i:1
D00743:137:CAAWDANXX:1:1101:1130:86865  272 Ca22chr3A_C_albicans_SC5314 177843  3   51M *   0   0   ATCCATTGGCAAGATCTAACTTGTCGGAATTCACCGGTGACTCACACTTNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:49C1   YT:Z:UU NH:i:2  CC:Z:Ca22chr3B_C_albicans_SC5314    CP:i:177846 HI:i:0

However, I am getting the following error message:

[bam_sort_core] merging from 14 files... Error occured when processing GFF file (line 53257 of file C_albicans.gff need more than 1 value to unpack [Exception type: ValueError, raised in ___init___.py:207]

I am really new to this and am totally lost as to what is wrong. Any advice would be appreciated!

RNA-Seq sequence htseq • 2.8k views

So what does line 53257 of C_albicans.gff look like?
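To answer that kind of question, it can help to print the reported line together with the number of tab-separated fields it contains, since a GFF feature line is expected to have nine columns. The following is a minimal sketch; the helper name `describe_line` is made up for illustration, and it assumes the line number in the error message is 1-based.

```python
from itertools import islice


def describe_line(lines, line_number):
    """Return (text, field_count) for the 1-based line_number of an iterable of lines."""
    # islice skips ahead without loading the whole file into memory.
    line = next(islice(lines, line_number - 1, line_number), None)
    if line is None:
        raise IndexError(f"input has fewer than {line_number} lines")
    text = line.rstrip("\r\n")
    # A GFF feature line should split into 9 tab-separated columns.
    return text, len(text.split("\t"))
```

With a real file this could be called as `describe_line(open("C_albicans.gff"), 53257)`. A field count of 1 would mean the line contains no tabs at all. If the reported number turns out to be off by one, checking the neighbouring lines the same way is worthwhile.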


Can you upload the GENE.gff file to http://www.tinyupload.com and post the download link?

7.1 years ago

Line 53257:

Ca22chrRB_C_albicans_SC5314 CGD exon    2284837 2285658 .   -   .   ID=CR_10860C_B-T-E1;Parent=CR_10860C_B-T
Ca22chrRB_C_albicans_SC5314 CGD CDS 2284837 2285658 .   -   0   ID=CR_10860C_B-P;Parent=CR_10860C_B-T;orf_classification=Verified;parent_feature_type=ORF
##FASTA
>Ca22chr1A_C_albicans_SC5314 (3188341 nucleotides)
GAGTCACGCCAATCACAAATTCCTTTGAAAAACTTGATTCGACCACATTCACAAGTTTGA
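One possible explanation, not confirmed in this thread: the exon and CDS lines are well formed, but they are followed by a `##FASTA` directive and an embedded sequence. In GFF3, everything after `##FASTA` is FASTA, not annotation, and a line such as `>Ca22chr1A_C_albicans_SC5314 (3188341 nucleotides)` contains no tabs, which would fit "need more than 1 value to unpack" if the parser tries to split it into columns. Assuming that is the cause, a sketch like the one below keeps only the annotation part of the file. The function name `annotation_lines` and the output file name used afterwards are hypothetical.

```python
def annotation_lines(lines):
    """Yield GFF lines up to, but not including, the embedded FASTA section."""
    for line in lines:
        # The ##FASTA directive marks the start of the sequence section.
        # A bare ">" header is treated the same way, in case the directive is missing.
        if line.strip() == "##FASTA" or line.startswith(">"):
            break
        yield line
```

A trimmed copy could then be written with something like `open("GENE.noseq.gff", "w").writelines(annotation_lines(open("GENE.gff")))` and passed to htseq-count in place of the original file. Whether this resolves the error depends on the HTSeq version in use, so it is worth testing rather than assuming.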

Please use ADD REPLY to respond to earlier comments, so that the thread remains logically structured and easy to follow. I have now moved your reply, but as you can see, the result is not optimal.

So which line is it? The exon or CDS line? Both look normal to me...
