Error with HTseq RNAseq read count
4.7 years ago
Bioinfonext ▴ 460

Hi,

I am getting an error while running htseq-count. Could you please suggest what the issue is? I am using this command:

$ htseq-count Oryza_indica.ASM465v1.44.chr.gff3 Leaf_T1_F_R10_S1_L001.sam >count.txt
Error occured when processing GFF file (line 1 of file Leaf_T1_F_R10_S1_L001.sam):
  need more than 3 values to unpack
  [Exception type: ValueError, raised in __init__.py:210]

Thanks, Yogesh

RNA-Seq HTseq • 2.0k views
4.7 years ago
shawn.w.foley ★ 1.3k

From the manual:

htseq-count [options] <alignment_files> <gff_file>

You have your gff3 and sam files in the wrong order.
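
As a rough sketch of the corrected call: the alignment file goes first and the annotation file second. The helper `htseq_order_ok` below is hypothetical (it is not part of HTSeq) and only looks at file extensions; the file names are the ones from the question.

```shell
# Hypothetical helper: returns 0 when the alignment file comes first and the
# annotation file second, judged only by file extension.
htseq_order_ok() {
  case "$1" in
    *.sam|*.bam) ;;
    *) echo "first argument should be the SAM/BAM file: $1" >&2; return 1 ;;
  esac
  case "$2" in
    *.gff|*.gff3|*.gtf) ;;
    *) echo "second argument should be the GFF/GTF file: $2" >&2; return 1 ;;
  esac
}

sam=Leaf_T1_F_R10_S1_L001.sam
gff=Oryza_indica.ASM465v1.44.chr.gff3

# Run htseq-count only when the order looks right, the tool is installed,
# and both files exist; otherwise do nothing.
if htseq_order_ok "$sam" "$gff" \
   && command -v htseq-count >/dev/null 2>&1 \
   && [ -f "$sam" ] && [ -f "$gff" ]; then
  htseq-count "$sam" "$gff" > count.txt
fi
```

With the arguments swapped, as in the original command, the helper prints a message to stderr and the count step is skipped.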


Hi

I am still getting an error:

Error occured when processing GFF file (line 23 of file Oryza_indica.ASM465v1.44.chr.gff3):
  Feature transcript:BGIOSGA002568-TA does not contain a 'gene_id' attribute
  [Exception type: ValueError, raised in count.py:77]

The GFF file looks like this. I downloaded it from Ensembl (https://plants.ensembl.org/info/website/ftp/index.html), but for genome indexing I used the GTF file. Should I convert this GFF to GTF and then use it?

$ vi Oryza_indica.ASM465v1.44.chr.gff3

#!genome-build-accession GCA_000004655.2
#!genebuild-last-updated 2010-07

1       Beijing Genomics Institute      chromosome      1       47283185        .       .       .       ID=chromosome:1;Alias=CM000126.1
###
1       bgi     gene    18113   20165   .       +       .       ID=gene:BGIOSGA002568;biotype=protein_coding;gene_id=BGIOSGA002568;logic_name=genemodel_riceindica_bgi
1       bgi     mRNA    18113   20165   .       +       .       ID=transcript:BGIOSGA002568-TA;Parent=gene:BGIOSGA002568;biotype=protein_coding;transcript_id=BGIOSGA002568-TA
1       bgi     exon    18113   19150   .       +       .       Parent=transcript:BGIOSGA002568-TA;Name=BGIOSGA002568-TA.1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=BGIOSGA002568-TA.1;rank=1
1       bgi     CDS     18113   19150   .       +       0       ID=CDS:BGIOSGA002568-PA;Parent=transcript:BGIOSGA002568-TA;protein_id=BGIOSGA002568-PA
1       bgi     exon    19344   20165   .       +       .       Parent=transcript:BGIOSGA002568-TA;Name=BGIOSGA002568-TA.2;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=BGIOSGA002568-TA.2;rank=2
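
In the excerpt above, only the gene line carries a `gene_id` attribute; the mRNA and exon lines do not, which is consistent with the error message. A minimal sketch for checking this per feature type is shown below. The function name `gene_id_report` and the three-line sample file are made up for illustration (the sample is a shortened copy of the lines above, tab-separated as in a real GFF3).

```shell
# Hypothetical sample: shortened copies of the gene, mRNA and exon lines above.
sample=$(mktemp)
printf '1\tbgi\tgene\t18113\t20165\t.\t+\t.\tID=gene:BGIOSGA002568;biotype=protein_coding;gene_id=BGIOSGA002568\n' > "$sample"
printf '1\tbgi\tmRNA\t18113\t20165\t.\t+\t.\tID=transcript:BGIOSGA002568-TA;Parent=gene:BGIOSGA002568;transcript_id=BGIOSGA002568-TA\n' >> "$sample"
printf '1\tbgi\texon\t18113\t19150\t.\t+\t.\tParent=transcript:BGIOSGA002568-TA;exon_id=BGIOSGA002568-TA.1;rank=1\n' >> "$sample"

# For each feature type (column 3), print: type, lines with gene_id, total lines.
gene_id_report() {
  awk -F'\t' '!/^#/ && NF >= 9 {
      total[$3]++
      if ($9 ~ /(^|;)gene_id=/) hit[$3]++
  }
  END { for (t in total) printf "%s\t%d\t%d\n", t, hit[t] + 0, total[t] }' "$1" | sort
}

gene_id_report "$sample"
rm -f "$sample"
```

On this sample the exon row reports 0 of 1 lines with `gene_id`, while the gene row reports 1 of 1. Running the same function on the full GFF3 and on the GTF should show whether the exon features in each file carry the attribute.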

But when I use the GTF file, it does not show an error. So can I use the GTF file instead of the GFF3?

htseq-count Leaf_T1_F_R10_S1_L001.sam Oryza_indica.ASM465v1.44.chr.gtf  -a 10  >count.txt
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
485285 GFF lines processed.
Warning: Read A00652:16:H7GYCDRXX:1:1101:13503:2613 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
300000 SAM alignment record pairs processed.
400000 SAM alignment record pairs processed.
500000 SAM alignment record pairs processed.

Yes, you can use the GTF instead of the GFF3. I would recommend using whichever file you used for alignment, for the sake of consistency.


Thanks a lot for your help Shawn!
