3.4 years ago
lavanyac790
I am trying to run htseq-count for an RNA-seq analysis of Solanum tuberosum, and I used the following command:
htseq-count --format bam --order pos -s no -a 10 -t exon -i gene_name --idattr gene_id SO_8612_L1.bam GCF_000226075.1_SolTub_3.0_genomic.gff > L1_htseq_count.tsv
and I'm getting this error message:
[E::idx_find_and_load] Could not retrieve index file for 'SO_8612_L1.bam'
Error processing GFF file (line 12 of file GCF_000226075.1_SolTub_3.0_genomic.gff):
Feature exon-XM_015312074.1-1 does not contain a 'gene_id' attribute
[Exception type: ValueError, raised in features.py:326]
How do I resolve this? Also, can a GFF file be used instead of a GTF file? What is the standard GTF file used for Solanum tuberosum?
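Well, the error message already tells you a few things: the index file for the BAM could not be found, and the exon features in the GFF do not carry a 'gene_id' attribute.
Yes, you can use a GFF instead of a GTF file.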
Thank you. How do I fix the GFF file and make it suitable for this analysis?
Add the gene_id tag to each line with the correct value, get a correct GFF file from somewhere, or adjust your command line if the file has a more suitable attribute/tag (perhaps it is called geneid, gene, or parent ...). You need a common attribute that is present in all lines.
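To find out which attribute is actually shared by all exon lines, something like the following might help. It is only a minimal sketch: the function names `attribute_keys_by_feature` and `shared_keys` are made up for illustration, and it assumes a tab-separated GFF3 file with `key=value` pairs separated by `;` in the ninth column.

```python
import sys
from collections import Counter


def attribute_keys_by_feature(gff_lines, feature_type="exon"):
    """Count, per attribute key, how many features of one type carry it.

    Assumes GFF3-style attributes (key=value;key=value) in column 9.
    Returns (number of matching features, Counter of attribute keys).
    """
    key_counts = Counter()
    n_features = 0
    for line in gff_lines:
        # Skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        n_features += 1
        # Use a set so a key repeated on one line is counted only once
        keys = {f.split("=", 1)[0].strip() for f in cols[8].split(";") if "=" in f}
        key_counts.update(keys)
    return n_features, key_counts


def shared_keys(n_features, key_counts):
    """Return the attribute keys present on every counted feature."""
    return sorted(k for k, c in key_counts.items() if c == n_features)


if __name__ == "__main__":
    # Usage: python this_script.py annotation.gff
    with open(sys.argv[1]) as handle:
        n, counts = attribute_keys_by_feature(handle, feature_type="exon")
    print(f"{n} exon features")
    print("keys on every exon:", shared_keys(n, counts))
```

Any key reported as present on every exon is a candidate value for `--idattr`; whether it identifies genes (rather than transcripts or individual exons) still needs to be checked by eye in the file.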
An alternative is perhaps to have a look at featureCounts instead of htseq-count; it is certainly faster and might be more lenient about GFF formatting.
OK, I will look into featureCounts. Also, is it better to use a GTF or a GFF file when running StringTie?
Hello lavanyac790!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/15023/query-on-htseq-count
This is typically not recommended as it runs the risk of annoying people in both communities.
Sorry for the repeated post. I will not do it again.