**I am using HTSeq to do counts on rnaseq mapped onto a bacterial genome but it is not finding any features. I will share the HTSeq output, my sam file (generated with bbmap), the gff file, and the HTSeq command. It feels like there must be something obvious that I am missing but I haven't been able to figure out what. Thanks for your help.
HTSeq output:
...
gene991 0
gene992 0
gene993 0
gene994 0
gene995 0
gene996 0
gene997 0
gene998 0
gene999 0
no_feature 26940
ambiguous 0
too_low_aQual 0
not_aligned 76674
alignment_not_unique 0
This is a bit of my bbmap generated sam file:
@HD VN:1.4 SO:unsorted
@SQ SN:NC_009523.1 Roseiflexus sp. RS-1, complete genome LN:5801598
@PG ID:BBMap PN:BBMap VN:37.77 CL:java -Djava.library.path=/home/bbmap/jni/ -ea -Xmx49292m align2.BBMap build=1 overwrite=true fastareadlen=500 in=/home/final.contigs_March5.fa ref=/home/Rosei_NCBI_genome.fasta fastareadlen=500 ambig=best printunmappedcount=t outu=combined.input_Rosei_unmapped.fasta.gz out=Rosei_mapped.sam bamscript=bs.sh
qnqme flag rname pos mapq cigar mrnm/rnext mpos/pnext isize/tlen seq
k123_2375 flag=0 multi=11.3491 len=1329_part_2 4 * 0 0 * * 0 0 CTGGCAATGCTTGGTTTGATAGGTCTACCACTCCTGGTGCGCCGCAGCCCGGTGTTGGGCGGCGCTTTGATTGTGGCAGCAGTCACATACATCGGCTACAACGGGTTACTGAGTGACTGGCATGGAAGCGGTGCATTTGGCGTGCGTCGCCTGACCAGCCTGGCCGCCTGGTATGCGCTTGGTCTTGCGGCTCTGGCAGAAGCGTTGATCCGTCGCAGGCATACGTTGACAATGCTGATCGGTATGGTTGCCGGGAGCTGGTGGATGCTGATGCTGCTCGTTCGCAAAGGTGATCTAGGCGACACCAGCGCAGGCGCTCTTGAAATGCTGAGCATTGCCGATCTCTACCTGGGACGCACTGCGCTTCCAGCCTTGCAGTTGGGGGATTTTCTTAGAAGTGGTTTTGTCTGGGATACGGTGACGACGGCGCCGCTTGAATTCTCGGTGAAAGTGCTTGGTTATTTTGCGATTCTGAGCGGACTGGCGCTGTGGGCGTCCTG *
k123_2375 flag=0 multi=11.3491 len=1329_part_3 4 * 0 0 * * 0 0 GAAACTCGCTGCTGGAAGAGCGCCAGAGCGCGCAGCGCCAGGCGTCGGGACGACGATTACCGAATCAATGCACGTGCGTGATTGATGTGCGTCTCAAGATGCGCGTCCCCAAACTGCCTGGGTGCGCAGGCGTCGCGCCCGCACACGACGCTGCGATCCATCACGGAAGCGAGGGCATCTTGCCTGCCGCCACTCCACGGGAGCGGGACGCCTTCGCGCCGTGGCAACCAGTGAACGAACACCAGTGTGTCGCGCAACGTGTTCTGGCTGATGGGATTGACCCCGCCCGTGCGGGCTGAAGCCCTCCCTGAGCGTGTGGAAGCCCCTGC *
k123_2377 flag=1 multi=5.0000 len=429 4 * 0 0 * * 0 0 ATTGCCGAAGTGCGTCCGCGCAGCAGCCTCCCTGACAGCGACGCGCTCGACTTCGGCGACGCGATTGTGATGCCCGCGCTCGTCAACGGGCACTGCCATCTGGAATACACCGCGCTACGCGGGCGGAACGACCGCGCGCCGTTTTTCGAGTGGATTCGCGCGTTGGTCGCCCTGAAGGCGCAGTGCCCACCTGAGCTGTGGCTGCCGTCCGCGCTGCTGGGCGCGGCGGAGCTGATTGCCAGCGGCGTCGGCTTCGTGTCGGACAACACCGACGCAGGCGTGAGCGTAGAGGCGTTGGCGCGGGCAGGGCTGCGCGGGCGCGTCTACCAAGAGGTGTTCGGGATTGAGCCTGAACCCGACGATGCGACCATCCTGCGCGAGCTGACCGCCAAGCTGGAGGCGCACCGACAGACGCTGCAGCGCTACGAG *
k123_2378 flag=0 multi=172.6574 len=339 4 * 0 0 * * 0 0 CCCGGAAATAGTATCTGAGACTACTTTTCTTGGTCTTCTCTCCCCGGCTTGCGGAGGCAAAACGCTCCGCCCCTTTTGGGGAACCTTTTGTGGAAGAGGAAGAACCTAGGCGCCTTGGGCCGCCCCCTGTCCACCCATCTCCCCAACTAAGGCTCTTGATGGAAATCCAGGGATCCCTGCCAAGCCAGACTGGGGGATCCCTCTCCGGCCAGATTGTCCTGATACATTTACGCTGAACACTGCTGAACATCACTGTTCAACTATGCAGCGATAGCATTCTGTTTGTTTCCATTCGCGTTAGCGGGACGGAGTCCAAACAGAATGGCGGTATCCCGCAAA *
k123_2387 flag=1 multi=3.0000 len=301 4 * 0 0 * * 0 0 CAGGTGGACGCGGTGGCGTTCCGCCAGCGCGGTGACTTCATGGTAGGCACGGATCTTTTCGCCCTTGTCGGGCGGGTTGGGGACGCAGTGGGACAGGTACAGCAGCTCCACTGCGCCGCCTCAGCTCTTGGCTGCGTCGCCGCTCTTGGCTGTGGACTCTGCCTTGGATCCGCTGTCCGACGATTTCGTCTCCCGGCTCTTGCCGTTGCTTCCGCTGCTGCCGGAACGGCCGTAGTCGGTGACGTACCACCCCGTGCCCTTGAACTGGAGCGCGGGAACCGAGAGCAGCCGCCGCACCGCG *
k123_2389 flag=1 multi=11.0000 len=500 4 * 0 0 * * 0 0 CACATAGTTCGCCACCGTGTCCGCCAACTGGCGGAAGGCGACGATGTCGAAGTAATCCACCTGCTCTTCTGCGCCGTTCAGGCGGCTGGGGTGGCGGTGCTGCTGCAGGTCGCTGTACCGTGCGGGACGGCGGGGGCGGCGCGGGGGCGGCGGCAGCCGCTGCGGGCATATCTGCGCCCGCTTCAGGCTCGCCCTCGCGCAGTCGCTCCAGAAACCGTACCGTGTCGGCGACGACTTCGTATGCCTTGCGACGGTTGCCCTCACGGTCGGTGTAGCTGCGCGTCTGCAGTTTGCCCTCCACCAGCACCAGCCTGCCTTTGGTCACATAGTTCGCCACCGTGTCCGCCAACTGGCGGAAGGCGACGATGTCGAAGTAATCCACCTGCTCTTCTGCGCCTGGGCGCACGGGACGGTTGACGGCAATCGCGAAGCGCACCACCGAAGTGCCATCTGGGGTTGCCCGCAGTTCGGGGTCGCGCGTCAGACGCCCTACCAAGATG *
k123_2391 flag=1 multi=4.0000 len=416 4 * 0 0 * * 0 0 CAGAACGGCGACGTCGGGCAGTTGTTGGGCAAACGTGCCGTCGTAGGAGTAGGCGATACGGTCCTCTATGCTGGTGAGGACTTTGGACGGACCCAGAACTTCCTGCAATCGGGCAATCCAGACTGGCTCGCTCATACGGGACTCCTCTGATTTTCACCACGAAGACACAAAGACACGAAGAATCAGCAATTGTTAATTTTTTAATTGTTAATTTAACTTCGTGCCTTCGTGTCTTTGTGGTTTTTTACTTCATCGCCTCCAGCACAATTTCGGCGATGTCTTTGACCCGCAGCCGGAGGCGCTCGGTGCGGGCGGCATTGAAGAGGGTGCGCTCGCACTGCTGGCAGGCGGAGACGACGACCTGGGCGCCGGTCTCCGCGGCCTGGCGGATGCGCCGGGCCGCGACGGCCTGGCTG *
k123_2395 flag=0 multi=48482.4661 len=241 16 NC_009523.1 Roseiflexus sp. RS-1, complete genome 1416779 3 241M * 0 0 GCGCGTGGAGGCCCGCACCGGTGTGGATTGCAAACCGCTCGGATGACCTGTGGGTAGGGGTGAAATGCCAATCGAAAGCGGAGATAGCTGGTTCTCCCCGAAATGCATTGAGGTGCAACCCGCGGAAGGGGCGCGCGGAGGTAGAGCGCTGGTGTGGTGCGGGGGCTTCACCGCCTGCCAAACGACGCCAAACTGCGAATGCCGCGCGGTCGCCCGCGGAGTGAGACGTGCGGCGCAAACG * XT:A:R NM:i:4 AM:i:3
k123_2399 flag=1 multi=6.4322 len=757 4 * 0 0 * * 0 0 CTCCACTACGCCTTTGAGCGCGTAGAAGTCGGCAGGTTGGGGCTTGCTTGCCCAGTGCGGCGCACTAAGCAGCCCCGTCATCAGGATCCCCAAATGCGTCCACTCGTGGTAGTTGCCCTGTTCGTCTTGGGCGAACACGCGCCCGATTTCGAACAGGTGCAGGTCGCGGCGGTTGCGACGGCGGTTGTGGTCGGCGGCGCGGATGAGGCTGGACATCAGCGAGTGGCGCAGGCAGCTGAGGTCTTCGCTCATCGGGTTCTGTAGCGATATGGGCTGGTACATCAGCTCGCCGACGCCCTCGCCGCCGCTTTGCAGCGGAGATGGGGCTTCCAGCGTGTGCCCGACGATTTCCTGCAGCCCCGCGCTGAGGAGGATGGTGCGCACGCGCTCGGCGAACGCGCCTTCGGGGCTGTCCTTGCCCTGCGTGGTGTAGCCTGCGGGCGGGACGGCGGGGATGCGCTCGTAGCCCAGAATGCGCCCCAGCTCCTCGATGAGGTCTT *
k123_2399 flag=1 multi=6.4322 len=757_part_2 4 * 0 0 * * 0 0 CCTCGCGCTGCAGGTCGGCGCGGTAGGGGGGAACCTGCACGCGGAAGCCGCCCTCTATCGGCTCTGGGCGCAGATTCAGTCCTCCAGCCACGCGCCGCCCGCCTCGCGGATGGTCTGCTCCAACGCGGCGTAAGGGACGCTCTGCGCGACCACAATCGCCGACCGCTGCCATCGCGGCGCCCAACGCGCGGGAGCCCAAAGTCTCGCCGACCGCTTTGGTGAACGCCTGCCAGCCTTCAGCGTCCCAGATGCCAAAG *
k123_2402 flag=1 multi=3.0000 len=346 4 * 0 0 * * 0 0 CCAAAGACATAATCAGGCAGAAGCTCAAAAAGCTTTTTCTTTATTTCGTTTTCCATTTCATATCTCACTTCTTAATTTTTTTGATAGTTTTTTTATTGCTTGGAAATAATTTGCTTTAAGTCCCCCAACACTTGTTCCTAAAAGTTTTGAAATTTCTTCATAGCTTAAATTATCAAAATATCGTAGTGCAAATGTTTCACGTTGTTTTTCTGGTAATTTCGTAAGTTCCAAAAGAAACCTTCTTTCCAACTCGTTTCCTAAATAATATTGTTCAGGACTAAAATCTTTTTGTTTTTTAATTACGTCATTATTATCGTCGATTCTCAAAAAACTCATAAGTTTTTTC *
k123_2418 flag=0 multi=12738.1129 len=247 4 * 0 0 * * 0 0 CCCCTACCACCCGGGCCCGGCGTACCGGACCCGGATCCACAGCTTCGGTAGGCGGTTTGATCGCCAATCATTTTCGGCGCGACATCGCTCGATGAGTCAGCTGTTACGCACTGTTTAAATGATAGCTGCCTCTAAGCCAACATCCTCACTGTCTAAGCGACGTCACATCCTTTCCACTGAACCGCCTTGAGGCACCTTAGCTGGTGGTCTGGGCTGTTGCCCTCTCGGCCACGGACCTTATCGCTCG *
k123_2419 flag=0 multi=43053.0189 len=229 4 * 0 0 * * 0 0 ATACACCAGAAGGGTGGTATTTCACTGGCGGCTCCACCCTGACCGGAGCCAGGGCTTCTCAGCCTCCCACCTATGCTACGCATCCAATGCACGCGGCCAATACCAAGTTGTAGTGAAGGTTCAAGGGGTCTTTTCGTCCCGTCGCGGGTAACCGGCATCTTCACCGGTACTACAATTTCACCGAGCTCGCGGCCGAGACAGTGCCCAAGTCGTTATGCCATTCGTGCAG *
k123_2420 flag=1 multi=6.0000 len=992 4 * 0 0 * * 0 0 AACTCTCTCAGATGAAGGGATTCTTTGCCGCAGGAGACAAGATCATGCTAATGGTGGAAAGGCACGAAATGCACTCGCACGACATACCCAGGATCGTTTCACTGTTGAGGGAATACGGGATAGACGTTGCTCAGATTCTGATAGGCGATTCTTTTCAGAACGATTTGAGTTCCATGAGCAGGATGAAGCTGCTCGAGGAGCGAGAAACGAAATCCGGCACCAAGGTTGTGAAAAAGCACGTGAGATCGGGTCAGGCTGTGGTACATTCGGGAGACATCGTGGTTCTGGGCAACGTTCATGCAGGCGCAGAGTTGCTCGCGGGTGGATCCATCGTCGTTTTCGGCAACGTCAAGGGTACACTCAGAGCCGGTTTGAACGAGGGAGACAGCGCGATCATCGCCGCTCTGAGCATGCAACCGACATTGATACAGATTTCTGGATATACGCTCAGAGAAGTGTCGAACTACGAAGAACCAGTCATAGTTCAAGTGAAAAATGGT *
k123_2420 flag=1 multi=6.0000 len=992_part_2 4 * 0 0 * * 0 0 AAGATCGTTGTGGAAAAAGCCAAAGCTTGATAGGTGTACTGGTCTCCTCATCACAAGCCATCGTTGGAGGAGTCTTAGGAGCGGCAGAGGTGAACGTTTATCATAGCCAAGGATGGTCATAGAATACTGGTGGCATGGGTCGAAACCCCTGTCGTGTTTGGTTGGTATGACTTCTTTTTTGCTCAAAACGTTGTTTGTTGGTAACCAACTTGAATCAACTGGTTTTTGGAGGTAACATGATAGAGATCGAAGTAGTCGAGCTTTGTCTTGAGTCATTGAGGTCAGGTTGTTTTCCGGTCGGTTCGGTAATTACAAATGCTGAAGGACAGATAATCTCGAAGGGTAGAAACACTATTCACGAATCAGAGCCTAAAGGTTATCCAGTCTTTGGTAACAGCATTGCTCATGCTGAATTGAATGCACTAGTTTCTATGAAAAGAATCCCAGACGATTTAGACGTCTCTGATTACGTTATTTATACTTCTATGGAAC *
k123_2422 flag=0 multi=51274.5645 len=247 0 NC_009523.1 Roseiflexus sp. RS-1, complete genome 1418144 3 247M * 0 0 CGCTCCGCGCCTGCGTAGGATAGGTGGGAGCCTGAGAACGCGCCCTTGCGGGGGCGCGGGAGGCGCCGGTGAAATACCACTCTGGTGCGGCGTGCGCACTCACCCGTGGACAACGGGGACCGTCGCTGGCGGGTAGTTTGTCTGGGGCGGACGCCTCCTAAAGAGTAACGGAGGCGCGCAACGGCGCCCTCAGGCGGGATGGCAATCCGCCGTGGAGTGCAAAGGCAGAAGGGCGCTTGACTGCAAG * XT:A:R NM:i:9 AM:i:3
k123_2425 flag=0 multi=50392.5161 len=247 4 * 0 0 * * 0 0 ATGCAAAAGGCACGCGGTCACACCCTCTGGCCGAGACCAGAGGAAGCTCCCACTGTTTTGTAGGCATCTGGTTTCAGGTTCTATTTCACTCCCTTCTTAAGGGTGCTTTTCACCTTTCCCTCATGGTACTTGTGCACTATCGGTCATCAGCGAGTATTTAGCCTTACCAGATGGTTCTGGCAGATTCACACGGGACCTCACGTACCCCGTGCTACTCAGGAACACCCACGCGCCGCCTTGGATTTCG *
and this a bit of the gff file that I am using:
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM1666v1
#!genome-build-accession NCBI_Assembly:GCF_000016665.1
#!annotation-date 05/15/2017 13:12:56
#!annotation-source NCBI
##sequence-region NC_009523.1 1 5801598
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=357808
NC_009523.1 RefSeq region 1 5801598 . + . ID=id0;Dbxref=taxon:357808;Is_circular=true;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=RS-1
NC_009523.1 RefSeq gene 415 1860 . + . ID=gene0;Name=ROSERS_RS00005;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00005;old_locus_tag=RoseRS_0001
NC_009523.1 Protein Homology CDS 415 1860 . + 0 ID=cds0;Parent=gene0;Dbxref=Genbank:WP_011954804.1;Name=WP_011954804.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011954804.1;product=chromosomal replication initiator protein DnaA;protein_id=WP_011954804.1;transl_table=11
NC_009523.1 RefSeq gene 2021 3196 . + . ID=gene1;Name=ROSERS_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00010;old_locus_tag=RoseRS_0002
NC_009523.1 Protein Homology CDS 2021 3196 . + 0 ID=cds1;Parent=gene1;Dbxref=Genbank:WP_011954805.1;Name=WP_011954805.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_012122338.1;product=class II aldolase;protein_id=WP_011954805.1;transl_table=11
NC_009523.1 RefSeq gene 3243 3956 . + . ID=gene2;Name=ROSERS_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00015;old_locus_tag=RoseRS_0003
NC_009523.1 Protein Homology CDS 3243 3956 . + 0 ID=cds2;Parent=gene2;Dbxref=Genbank:WP_011954806.1;Name=WP_011954806.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011954806.1;product=NAD(P)-dependent oxidoreductase;protein_id=WP_011954806.1;transl_table=11
NC_009523.1 RefSeq gene 4002 4865 . - . ID=gene3;Name=ROSERS_RS00020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00020;old_locus_tag=RoseRS_0004
NC_009523.1 Protein Homology CDS 4002 4865 . - 0 ID=cds3;Parent=gene3;Dbxref=Genbank:WP_011954807.1;Name=WP_011954807.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011954807.1;product=hypothetical protein;protein_id=WP_011954807.1;transl_table=11
NC_009523.1 RefSeq gene 5453 6856 . - . ID=gene4;Name=ROSERS_RS00025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00025;old_locus_tag=RoseRS_0005
NC_009523.1 Protein Homology CDS 5453 6856 . - 0 ID=cds4;Parent=gene4;Dbxref=Genbank:WP_011954808.1;Name=WP_011954808.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011954808.1;product=NADP oxidoreductase;protein_id=WP_011954808.1;transl_table=11
NC_009523.1 RefSeq gene 7035 7766 . - . ID=gene5;Name=ROSERS_RS00030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00030;old_locus_tag=RoseRS_0006
NC_009523.1 Protein Homology CDS 7035 7766 . - 0 ID=cds5;Parent=gene5;Dbxref=Genbank:WP_011954809.1;Name=WP_011954809.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011954809.1;product=hypothetical protein;protein_id=WP_011954809.1;transl_table=11
NC_009523.1 RefSeq gene 7768 8280 . - . ID=gene6;Name=ROSERS_RS00035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00035;old_locus_tag=RoseRS_0007
NC_009523.1 Protein Homology CDS 7768 8280 . - 0 ID=cds6;Parent=gene6;Dbxref=Genbank:WP_041332586.1;Name=WP_041332586.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_012122701.1;product=DUF3090 domain-containing protein;protein_id=WP_041332586.1;transl_table=11
NC_009523.1 RefSeq gene 8314 9039 . - . ID=gene7;Name=ROSERS_RS00040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00040;old_locus_tag=RoseRS_0008
NC_009523.1 Protein Homology CDS 8314 9039 . - 0 ID=cds7;Parent=gene7;Dbxref=Genbank:WP_011954811.1;Name=WP_011954811.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011954811.1;product=phosphoglycerate mutase;protein_id=WP_011954811.1;transl_table=11
NC_009523.1 RefSeq gene 9360 10886 . + . ID=gene8;Name=ROSERS_RS00045;gbkey=Gene;gene_biotype=protein_coding;locus_tag=ROSERS_RS00045;old_locus_tag=RoseRS_0009
and here is my HTSeq command:
python -m HTSeq.scripts.count -i ID -t gene mappedfile.sam annotationfile.gff >outfile.txt
I have played with the -i and -t designations (a bit blindly) changing them to CDS, but it has not changed the outcome.
Thanks!
You realize that the tiny fragment of bam you showed shows all but one of those reads to be unaligned? I tried blasting a couple against nr, and got nothing.
Thanks, I've also tried blasting and did get hits that made sense: the unaligned to other organisms in my system, and the aligned to the organism that I am using as ref genome, Roseiflexus. I am using a metatranscriptome and mapping it to a reference genome to learn the expression of just that one organism. I was assuming that the sam file reflected that, reads from other organisms that are unaligned, and reads from 'my' organism that are aligned..