Problem with htseq-count: counts of 0?
1
0
Entering edit mode
3.4 years ago
sbitterw ▴ 10

Hello and thank you for your time and consideration,

I've been using htseq-count for gene counting. It has worked before using a different genome. However, currently, it is failing to count genes. Instead it creates gene count files full of 0 values. Is it because the fasta header and first column of the gff file are not the same (e.g., Sc0000000.g1.t1 vs. Sc0000000)? Please find examples of the fasta alignment file, gff file, and the code I ran below. Thank you again for your help and time.

Fasta alignment file:

'>Sc0000000.g1.t1
atggcggctttagctggcatacgagttcttgagtttgctggattagctccttgcccttttgcaggaatgatattggcag
attttggagcagttgtaacccgcatagatagagtgaaaacgctgtctttcatcggcgtagatccacttgcgcgaggcaaaaaatctatttgcattgat
ctaaagaatcctcgagggaaggagatcgtacgaaagctgtgcctgaagtcggatgtgttgatagagccgtttcgaccaggggtgatggaaacactcgg
cctaggacccgaaatattattaaaggataatccaagactgatttatgctcgattaacaggttatggacagacaggaacacactcccagagggcaggcc
atgacatcaattacattgcgctgagtggattattatcaaagctaggccgcaaggatgaaaaaccaacacctcccataaatctgttagcagactttgct
gggggaggtttgacctgcgtgcttggagtattgttggccttgtttgagagaagtaaatctggaaaaggccaagttgttgacgtttccatggtcgaagg
atcagcatatatcggctcatttatttataaaacactgagtggtcttgaatttgcatggccaaaccccaaggagcgtggcaccaacctattggataccg
gggcgccattttatgatacatacaaaacatcggatggtaaatttatggctgttggtgcacttgaacccaagttttataaacaatttctgaaagggttg
gatttggaagggaaacttccacaccaggcagacttatatcagtggcccaaagcaaaggaagagatagcaagcatattctcaacaaagactcaatcaga
ttggtgcaagatctttgagaatcttgatgcttgcgtggaaccagtgcttaccacagacgaagcacctcttcatcctcataaccaagagagaaatgcat
tttctttcagtcatggggcttatgaacccacccctgcaccaaaactttcaagaactcctgcagactgcctgccaaaacctctccctgggacaggagaa
cacacaatggaaattttacaacatactgggtattcacaatctgatattcaagaacttattgatgagggtgttgttgaatgcccaaacatgaagtcctc
actttga'

GFF file:

Sc0000000   AUGUSTUS    transcript  4615    9517    .   +   .   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 4615    4948    .   +   0   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 5690    5916    1   +   2   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 8068    8266    1   +   0   transcript_id "g1.t1"; gene_id "g1";

Sc0000000   AUGUSTUS    CDS 9114    9517    1   +   2   transcript_id "g1.t1"; gene_id "g1";

Code I ran:

htseq-count -f auto -i gene_id --type CDS -m intersection-nonempty -n 12 6B.bam ../reference/Mcap_reference_new.gff3 '>' ../genecounts/{.}.counts.txt
htseq-count transcriptomics genome fasta • 722 views
ADD COMMENT
2
Entering edit mode
3.4 years ago

shortest possible answer: Yes :) (at least with 99% certainty)

your fasta file does seem to contain gene/CDS sequences, no? You need to make sure to use the genomic fasta file when you also use the corresponding GFF file (it will not make sense to use the transcript fasta file in combination with the genomic GFF file).

if you only have a fasta file with transcripts in, you can consider to use something like Salmon for expression quantification on transcript level.

ADD COMMENT
1
Entering edit mode

Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 2255 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6