I am trying to create a count matrix using HTSeq (htseq-count). My command is below:
htseq-count -m union -s yes -t exon --idattr=Parent data1.sam GENE.gff > data1.counts
Here is a sample of my GENE.gff:
##gff-version 3
# File name:
# Organism: Candida albicans SC5314
# Genome version: A22-s07-m01-r20
# Date created: Sun Feb 19 07:02:19 2017
# Created by: The Candida Genome Database (http://www.candidagenome.org/)
# Contact Email: candida-curator AT lists DOT stanford DOT edu
# Funding: NIDCR at US NIH, grant number 1-R01-DE015873-01
#
Ca22chr1A_C_albicans_SC5314 CGD chromosome 1 3188341 . . . ID=Ca22chr1A_C_albicans_SC5314;Name=Ca22chr1A_C_albicans_SC5314
Ca22chr1B_C_albicans_SC5314 CGD chromosome 1 3188396 . . . ID=Ca22chr1B_C_albicans_SC5314;Name=Ca22chr1B_C_albicans_SC5314
Ca22chr2A_C_albicans_SC5314 CGD chromosome 1 2231883 . . . ID=Ca22chr2A_C_albicans_SC5314;Name=Ca22chr2A_C_albicans_SC5314
Ca22chr2B_C_albicans_SC5314 CGD chromosome 1 2231750 . . . ID=Ca22chr2B_C_albicans_SC5314;Name=Ca22chr2B_C_albicans_SC5314
Ca22chr3A_C_albicans_SC5314 CGD chromosome 1 1799298 . . . ID=Ca22chr3A_C_albicans_SC5314;Name=Ca22chr3A_C_albicans_SC5314
Ca22chr3B_C_albicans_SC5314 CGD chromosome 1 1799271 . . . ID=Ca22chr3B_C_albicans_SC5314;Name=Ca22chr3B_C_albicans_SC5314
Ca22chr4A_C_albicans_SC5314 CGD chromosome 1 1603259 . . . ID=Ca22chr4A_C_albicans_SC5314;Name=Ca22chr4A_C_albicans_SC5314
Ca22chr4B_C_albicans_SC5314 CGD chromosome 1 1603311 . . . ID=Ca22chr4B_C_albicans_SC5314;Name=Ca22chr4B_C_albicans_SC5314
Ca22chr5A_C_albicans_SC5314 CGD chromosome 1 1190869 . . . ID=Ca22chr5A_C_albicans_SC5314;Name=Ca22chr5A_C_albicans_SC5314
Ca22chr5B_C_albicans_SC5314 CGD chromosome 1 1190991 . . . ID=Ca22chr5B_C_albicans_SC5314;Name=Ca22chr5B_C_albicans_SC5314
Ca22chr6A_C_albicans_SC5314 CGD chromosome 1 1033292 . . . ID=Ca22chr6A_C_albicans_SC5314;Name=Ca22chr6A_C_albicans_SC5314
Ca22chr6B_C_albicans_SC5314 CGD chromosome 1 1033212 . . . ID=Ca22chr6B_C_albicans_SC5314;Name=Ca22chr6B_C_albicans_SC5314
Ca22chr7A_C_albicans_SC5314 CGD chromosome 1 949580 . . . ID=Ca22chr7A_C_albicans_SC5314;Name=Ca22chr7A_C_albicans_SC5314
Ca22chr7B_C_albicans_SC5314 CGD chromosome 1 949611 . . . ID=Ca22chr7B_C_albicans_SC5314;Name=Ca22chr7B_C_albicans_SC5314
Ca22chrM_C_albicans_SC5314 CGD chromosome 1 40420 . . . ID=Ca22chrM_C_albicans_SC5314;Name=Ca22chrM_C_albicans_SC5314
Ca22chrRA_C_albicans_SC5314 CGD chromosome 1 2286237 . . . ID=Ca22chrRA_C_albicans_SC5314;Name=Ca22chrRA_C_albicans_SC5314
Ca22chrRB_C_albicans_SC5314 CGD chromosome 1 2285697 . . . ID=Ca22chrRB_C_albicans_SC5314;Name=Ca22chrRB_C_albicans_SC5314
Ca22chr1A_C_albicans_SC5314 CGD gene 4059 4397 . + . ID=C1_00010W_A;Name=C1_00010W_A;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious;Alias=C1_00010W,C1_00010W_B,CaO19.11880,CaO19.13534,CaO19.4402,CaO19.6115,IPF21113.1,IPF27828.1,orf19.13534,orf19.6115
Ca22chr1A_C_albicans_SC5314 CGD mRNA 4059 4397 . + . ID=C1_00010W_A-T;Parent=C1_00010W_A;Name=C1_00010W_A;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious;Alias=C1_00010W,C1_00010W_B,CaO19.11880,CaO19.13534,CaO19.4402,CaO19.6115,IPF21113.1,IPF27828.1,orf19.13534,orf19.6115
Ca22chr1A_C_albicans_SC5314 CGD exon 4059 4397 . + . ID=C1_00010W_A-T-E1;Parent=C1_00010W_A-T
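For reference, a GFF3 feature line has nine tab-separated columns, and column 9 holds ;-separated key=value attributes. The short Python sketch below (parse_gff3_line is a hypothetical helper, not HTSeq's own parser) splits the exon line from the sample, with its tabs restored, and shows the value that -t exon --idattr=Parent would use to group that exon: the transcript ID C1_00010W_A-T rather than the gene ID C1_00010W_A.

```python
# Minimal sketch (not HTSeq's actual code): split one GFF3 feature line into its
# nine tab-separated columns and parse the key=value attributes in column 9.

def parse_gff3_line(line):
    """Return (columns, attributes) for a single GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = {}
    for field in cols[8].split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        attrs[key] = value
    return cols, attrs


# The exon line from the sample above, with the tabs restored
exon_line = "\t".join([
    "Ca22chr1A_C_albicans_SC5314", "CGD", "exon", "4059", "4397",
    ".", "+", ".", "ID=C1_00010W_A-T-E1;Parent=C1_00010W_A-T",
])

cols, attrs = parse_gff3_line(exon_line)
print(cols[2], attrs["Parent"])  # exon C1_00010W_A-T
```

With this command, the resulting counts table would therefore be keyed by transcript IDs such as C1_00010W_A-T.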
Here is a sample of my .sam file:
D00743:137:CAAWDANXX:1:1101:1129:76971 256 Ca22chr5A_C_albicans_SC5314 148153 3 51M * 0 0 CNGCTGGTTCAGTAGGTAAAACCACCATTGAACTATAATCAGGGTCAGGCA B#<<BFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFF AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:1T49 YT:Z:UU NH:i:2 CC:Z:Ca22chr5B_C_albicans_SC5314 CP:i:148147 HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:76971 0 Ca22chr5B_C_albicans_SC5314 148147 3 51M * 0 0 CNGCTGGTTCAGTAGGTAAAACCACCATTGAACTATAATCAGGGTCAGGCA B#<<BFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFF AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:1T49 YT:Z:UU NH:i:2 HI:i:1
D00743:137:CAAWDANXX:1:1101:1129:87845 272 Ca22chr7A_C_albicans_SC5314 613287 3 51M * 0 0 GTTGGTTGGTCTAAGGATTTTAATAGCAACATCAACAACACATGGTTTCNC F<B/FFFFFFFFFFFFFFFFFFFFF<FFBBFFBFFBFFBFBFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49A1 YT:Z:UU NH:i:2 CC:Z:Ca22chr7B_C_albicans_SC5314 CP:i:613307 HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:87845 16 Ca22chr7B_C_albicans_SC5314 613307 3 51M * 0 0 GTTGGTTGGTCTAAGGATTTTAATAGCAACATCAACAACACATGGTTTCNC F<B/FFFFFFFFFFFFFFFFFFFFF<FFBBFFBFFBFFBFBFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49A1 YT:Z:UU NH:i:2 HI:i:1
D00743:137:CAAWDANXX:1:1101:1129:90895 0 Ca22chr3B_C_albicans_SC5314 1786020 3 51M * 0 0 GNCGGCCAAAGCTTCGATTTGGTGCAAGATCATTGGTCTGTTACCGAACTC B#<<BFFFFFFFFFFF/FFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:-6 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0A0G49 YT:Z:UU NH:i:2 HI:i:1
D00743:137:CAAWDANXX:1:1101:1129:90895 256 Ca22chr3A_C_albicans_SC5314 1786049 3 51M * 0 0 GNCGGCCAAAGCTTCGATTTGGTGCAAGATCATTGGTCTGTTACCGAACTC B#<<BFFFFFFFFFFF/FFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:-6 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0A0G49 YT:Z:UU NH:i:2 CC:Z:Ca22chr3B_C_albicans_SC5314 CP:i:1786020 HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:91175 16 Ca22chrRA_C_albicans_SC5314 1494150 3 51M * 0 0 AAAAGCTGTATGTATTGACCATGTTTATATTTACTACTAATTAAATGTCNA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49C1 YT:Z:UU NH:i:2 CC:Z:Ca22chrRB_C_albicans_SC5314 CP:i:1494053 HI:i:0
D00743:137:CAAWDANXX:1:1101:1129:91175 272 Ca22chrRB_C_albicans_SC5314 1494053 3 51M * 0 0 AAAAGCTGTATGTATTGACCATGTTTATATTTACTACTAATTAAATGTCNA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49C1 YT:Z:UU NH:i:2 HI:i:1
D00743:137:CAAWDANXX:1:1101:1130:83249 272 Ca22chr2A_C_albicans_SC5314 1454537 3 51M * 0 0 TAATTTAGTGTTTGGGTCATCGGATTTTCTCAATTTCGATATAGGATTGNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49G1 YT:Z:UU NH:i:2 CC:Z:Ca22chr2B_C_albicans_SC5314 CP:i:1454552 HI:i:0
D00743:137:CAAWDANXX:1:1101:1130:83249 16 Ca22chr2B_C_albicans_SC5314 1454552 3 51M * 0 0 TAATTTAGTGTTTGGGTCATCGGATTTTCTCAATTTCGATATAGGATTGNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49G1 YT:Z:UU NH:i:2 HI:i:1
D00743:137:CAAWDANXX:1:1101:1130:86865 16 Ca22chr3B_C_albicans_SC5314 177846 3 51M * 0 0 ATCCATTGGCAAGATCTAACTTGTCGGAATTCACCGGTGACTCACACTTNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49C1 YT:Z:UU NH:i:2 HI:i:1
D00743:137:CAAWDANXX:1:1101:1130:86865 272 Ca22chr3A_C_albicans_SC5314 177843 3 51M * 0 0 ATCCATTGGCAAGATCTAACTTGTCGGAATTCACCGGTGACTCACACTTNC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB<<#B AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49C1 YT:Z:UU NH:i:2 CC:Z:Ca22chr3B_C_albicans_SC5314 CP:i:177846 HI:i:0
However, I am getting the following error message:
[bam_sort_core] merging from 14 files...
Error occured when processing GFF file (line 53257 of file C_albicans.gff):
  need more than 1 value to unpack
  [Exception type: ValueError, raised in __init__.py:207]
I am really new to this and am totally lost as to what is wrong. Any advice would be appreciated!
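"need more than 1 value to unpack" is the ValueError that Python 2 raises when a one-element list is unpacked into several variables. Assuming HTSeq's GFF reader splits each non-comment line on tabs into nine fields (which is my understanding of how it works), a line containing no tab at all would trigger exactly this error. The sketch below reproduces that failure mode; split_gff_columns is a hypothetical stand-in, not HTSeq's code, and under Python 3 the same error is worded differently ("not enough values to unpack ...").

```python
# Sketch of the failure mode, assuming the parser unpacks each non-comment
# line into nine tab-separated fields.

def split_gff_columns(line):
    """Unpack one GFF line into its nine tab-separated columns (hypothetical helper)."""
    (seqname, source, feature, start, end,
     score, strand, frame, attributes) = line.rstrip("\n").split("\t", 8)
    return seqname, feature, int(start), int(end), attributes


# A well-formed feature line unpacks cleanly
good = "chr1\tCGD\texon\t4059\t4397\t.\t+\t.\tID=e1;Parent=t1"
print(split_gff_columns(good))

# A line with no tabs (for example a FASTA header) yields a one-element list,
# so unpacking it into nine names raises ValueError
bad = ">Ca22chr1A_C_albicans_SC5314"
try:
    split_gff_columns(bad)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

One plausible culprit, not confirmed from the snippet shown here, is a non-GFF section appended to the file, such as a ##FASTA block of genome sequence whose lines contain no tabs.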
So what does line 53257 of C_albicans.gff look like?
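To answer that, a small script can list every non-comment line that does not split into nine tab-separated columns, together with its line number. This is a sketch under the assumption that the file should be plain GFF3 throughout; find_bad_lines and the script name find_bad_gff_lines.py are hypothetical.

```python
import sys


def find_bad_lines(lines):
    """Return a list of (line_number, text) for lines without 9 tab-separated columns.

    Blank lines and lines starting with '#' (comments and ## directives)
    are skipped, since GFF parsers normally ignore them.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text or text.startswith("#"):
            continue
        if len(text.split("\t")) < 9:
            bad.append((number, text))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_bad_gff_lines.py C_albicans.gff
    with open(sys.argv[1]) as handle:
        problems = find_bad_lines(handle)
    for number, text in problems[:10]:  # the first few are usually enough
        print(f"line {number}: {text[:80]!r}")
    print(f"{len(problems)} line(s) without 9 tab-separated columns")
```

Run it as python find_bad_gff_lines.py C_albicans.gff. If only one line is of interest, sed -n '53257p' C_albicans.gff prints line 53257 on its own.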
Can you upload the GENE.gff file to http://www.tinyupload.com and post the download link?