Question

Problem with htseq-count: counts of 0?

0

Entering edit mode

3.4 years ago

sbitterw ▴ 10

Hello and thank you for your time and consideration,

I've been using htseq-count for gene counting. It has worked before using a different genome. However, currently, it is failing to count genes. Instead it creates gene count files full of 0 values. Is it because the fasta header and first column of the gff file are not the same (e.g., Sc0000000.g1.t1 vs. Sc0000000)? Please find examples of the fasta alignment file, gff file, and the code I ran below. Thank you again for your help and time.

Fasta alignment file:

'>Sc0000000.g1.t1
atggcggctttagctggcatacgagttcttgagtttgctggattagctccttgcccttttgcaggaatgatattggcag
attttggagcagttgtaacccgcatagatagagtgaaaacgctgtctttcatcggcgtagatccacttgcgcgaggcaaaaaatctatttgcattgat
ctaaagaatcctcgagggaaggagatcgtacgaaagctgtgcctgaagtcggatgtgttgatagagccgtttcgaccaggggtgatggaaacactcgg
cctaggacccgaaatattattaaaggataatccaagactgatttatgctcgattaacaggttatggacagacaggaacacactcccagagggcaggcc
atgacatcaattacattgcgctgagtggattattatcaaagctaggccgcaaggatgaaaaaccaacacctcccataaatctgttagcagactttgct
gggggaggtttgacctgcgtgcttggagtattgttggccttgtttgagagaagtaaatctggaaaaggccaagttgttgacgtttccatggtcgaagg
atcagcatatatcggctcatttatttataaaacactgagtggtcttgaatttgcatggccaaaccccaaggagcgtggcaccaacctattggataccg
gggcgccattttatgatacatacaaaacatcggatggtaaatttatggctgttggtgcacttgaacccaagttttataaacaatttctgaaagggttg
gatttggaagggaaacttccacaccaggcagacttatatcagtggcccaaagcaaaggaagagatagcaagcatattctcaacaaagactcaatcaga
ttggtgcaagatctttgagaatcttgatgcttgcgtggaaccagtgcttaccacagacgaagcacctcttcatcctcataaccaagagagaaatgcat
tttctttcagtcatggggcttatgaacccacccctgcaccaaaactttcaagaactcctgcagactgcctgccaaaacctctccctgggacaggagaa
cacacaatggaaattttacaacatactgggtattcacaatctgatattcaagaacttattgatgagggtgttgttgaatgcccaaacatgaagtcctc
actttga'

GFF file:

Sc0000000   AUGUSTUS    transcript  4615    9517    .   +   .   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 4615    4948    .   +   0   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 5690    5916    1   +   2   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 8068    8266    1   +   0   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 9114    9517    1   +   2   transcript_id "g1.t1"; gene_id "g1";

Code I ran:

htseq-count -f auto -i gene_id --type CDS -m intersection-nonempty -n 12 6B.bam ../reference/Mcap_reference_new.gff3 '>' ../genecounts/{.}.counts.txt

htseq-count transcriptomics genome fasta • 722 views

ADD COMMENT • link 22 months ago by sbitterw ▴ 10

score 2 · Accepted Answer · 2020-12-11

2

Entering edit mode

3.4 years ago

lieven.sterck 15k

shortest possible answer: Yes :) (at least with 99% certainty)

your fasta file does seem to contain gene/CDS sequences, no? You need to make sure to use the genomic fasta file when you also use the corresponding GFF file (it will not make sense to use the transcript fasta file in combination with the genomic GFF file).

if you only have a fasta file with transcripts in, you can consider to use something like Salmon for expression quantification on transcript level.