I recently tried to map Ribo-seq reads to annotated portions of the transcriptome using htseq-count, and everyone of my sequences mapped to 0 features. I don't know what's going wrong!
Any help would be hugely appreciated! I have been struggling with this for a couple of days and am running out of ideas.
What I've done so far, briefly: retrieved FASTQ, trimmed reads, filtered out rRNA, prepared RSEM reference, calculated expression with RSEM, sorted BAM, used htseq-count on BAM and Gencode GFF3/GTF.
Htseq-count says all reads map to 0 features.
A little more detail:
1. Retrieved SRR FASTQ from SRA using fastq-dump.
2. PolyA adapters trimmed, with cutadapt (new file: same # of reads, now variable length, 25-40 bases).
3. Built Bowtie2 index for each of 4 rRNA sequences.
[28S:NR_003287.2, 18S:NR_003286.2, 5.8S:NR_003285.2, 5S:NR_023363.1]
4. Removed rRNA from FASTQ by Bowtie2 alignment (new file: fewer reads; no rRNA contamination according to fastqc).
5. Used rsem-prepare-reference on gencode.v30.transcripts.fa (retrieved from top of Fasta files section at https://www.gencodegenes.org/human/).
6. Used rsem-calculate-expression on the FASTQ reads with the RSEM reference prepared above (outputs multiple files, one of which is a BAM).
7. Sorted the BAM with samtools sort.
8. Used htseq-count on the sorted BAM produced above and gencode.v30.annotation.gff3 (retrieved from top of GTF/GFF3 files section at https://www.gencodegenes.org/human/). (Also tried with the GTF file.)
9. htseq returned a column of ENSG codes (which I don't quite understand) and a column of integer values (all 0). At the bottom of the output, it gave the following summary:
Thanks so much for any help!