Query on htseq count
3.4 years ago

I am trying to run htseq-count for an RNA-seq analysis of Solanum tuberosum, and I used the following command:

htseq-count --format bam --order pos -s no -a 10 -t exon -i gene_name --idattr gene_id SO_8612_L1.bam GCF_000226075.1_SolTub_3.0_genomic.gff > L1_htseq_count.tsv

and I am getting this error message:

[E::idx_find_and_load] Could not retrieve index file for 'SO_8612_L1.bam'
Error processing GFF file (line 12 of file GCF_000226075.1_SolTub_3.0_genomic.gff):
  Feature exon-XM_015312074.1-1 does not contain a 'gene_id' attribute
  [Exception type: ValueError, raised in features.py:326]

How can I resolve this? Also, can a GFF file be used instead of a GTF file, and what is the standard GTF file for Solanum tuberosum?

rna-seq • 1.1k views

Well, the error message already tells you a few things:

  • You apparently did not index your BAM file.
  • Your GFF file is not suited to the command line you used. Every line of the feature type you count on (exon, as selected with -t) needs a gene_id attribute, and apparently at least one does not have it. Either fix your GFF file or use a different attribute for gene counting.

Yes, you can use a GFF file instead of a GTF file.
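To see which attribute could be used instead, it can help to list the attribute keys that actually occur on the exon lines. The following is only a minimal sketch using the Python standard library; the function names `attribute_keys` and `report` are made up for illustration, and it assumes a tab-separated GFF3 file whose ninth column holds semicolon-separated key=value pairs.

```python
from collections import Counter


def attribute_keys(gff_lines, feature_type="exon"):
    """Count how often each attribute key occurs on features of one type.

    Assumes GFF3-style lines: 9 tab-separated columns, with column 9
    holding semicolon-separated key=value pairs.
    """
    counts = Counter()
    n_features = 0
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        n_features += 1
        for pair in fields[8].split(";"):
            if "=" in pair:
                counts[pair.split("=", 1)[0].strip()] += 1
    return n_features, counts


def report(path, feature_type="exon"):
    """Print every attribute key and the number of features carrying it."""
    with open(path) as handle:
        total, counts = attribute_keys(handle, feature_type)
    print(f"{total} {feature_type} features")
    for key, n in counts.most_common():
        print(f"{key}\t{n}")
```

Calling `report("GCF_000226075.1_SolTub_3.0_genomic.gff")` would print one line per key. A key whose count equals the number of exon features is a candidate for the --idattr option of htseq-count.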


Thank you. How do I fix the GFF file and make it suitable for this analysis?


Add the gene_id tag with the correct value to each line, or get a correct GFF file from somewhere, or adjust your command line if the file already contains a more suitable attribute (perhaps it is called geneid, gene or Parent). Whichever you choose, the attribute must be present on all of the lines being counted.

An alternative is to have a look at featureCounts instead of htseq-count. It is certainly faster and might be more lenient about GFF formatting.
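The first option (adding a gene_id tag to each line) could be sketched roughly as follows. This is an illustration, not a tested fix for this particular annotation: the helper names `add_gene_id` and `fix_file` are hypothetical, and the default source attribute `gene` is an assumption that should be checked against the actual file before use.

```python
def add_gene_id(line, source_key="gene"):
    """Append gene_id=<value of source_key> to a GFF3 line that lacks gene_id.

    source_key="gene" is an assumption about the annotation; lines without
    that attribute, comments and malformed lines are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line
    # Parse column 9 into a dict of key -> value.
    attrs = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value
    if "gene_id" not in attrs and source_key in attrs:
        fields[8] = fields[8].rstrip(";") + ";gene_id=" + attrs[source_key]
    return "\t".join(fields) + "\n"


def fix_file(in_path, out_path, source_key="gene"):
    """Write a copy of the GFF file with gene_id added where possible."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_gene_id(line, source_key))
```

The original file is left untouched and the result goes to a new path, so the two can be compared before rerunning htseq-count. Lines that lack the source attribute stay without gene_id, so they would still need to be handled separately.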


OK, I will look into featureCounts. Also, is it better to use a GTF or a GFF file when running StringTie?


Hello lavanyac790!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/15023/query-on-htseq-count

This is typically not recommended as it runs the risk of annoying people in both communities.


Sorry for the repeated post. I will not do it again.

3.4 years ago

If you are going to do gene counting with a separate program, I'd use RSEM, because it is much smarter than featureCounts or htseq-count at dealing with ambiguous reads.


Thank you, I will check that out.
