Hello all! I've been analysing some ribosome profiling data and can't seem to keep irrelevant RNAs (such as rRNA) out of StringTie's output. I believe I am following best practices, as described below, and I am working with the Ensembl yeast (S. cerevisiae) genome assembly R64-1-1.
- Use Cutadapt to remove the first nucleotide of each read, exclude reads < 25 nt, remove the adapter sequence, and discard any reads without an adapter sequence
- Generate a "filter" FASTA file by going to Ensembl and downloading the cDNA sequences of all non-protein-coding transcripts
- Use Bowtie2 (very-sensitive setting) to map reads against "filter" fasta file and output all unaligned reads into a new fastq file
- Use HISAT2 to map filtered reads to genome and convert output SAM into BAM using Samtools
- Run StringTie on the HISAT2 output to generate a GTF file with FPKM values for mapped transcripts
- Check the output GTF file: it still has FPKM values for tRNAs, rRNAs, snoRNAs, etc.
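For concreteness, here is a minimal sketch of how the steps above might look as command lines. It is written as a small Python helper that only builds the argument lists and runs nothing. The file names, index names, and adapter sequence are all placeholders. The coordinate sort with `samtools sort` and the `-G` annotation option for StringTie are assumptions and may not match the actual run.

```python
def build_pipeline_commands(raw_fastq, adapter, filter_index,
                            genome_index, annotation_gtf, prefix="sample"):
    """Return the pipeline as a list of argument lists (nothing is executed).

    All paths and names here are hypothetical placeholders.
    """
    trimmed = f"{prefix}.trimmed.fastq"
    filtered = f"{prefix}.filtered.fastq"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    out_gtf = f"{prefix}.stringtie.gtf"
    return [
        # -u 1: drop the first base; -a: 3' adapter; -m 25: minimum length;
        # --discard-untrimmed: drop reads in which no adapter was found
        ["cutadapt", "-u", "1", "-a", adapter, "-m", "25",
         "--discard-untrimmed", "-o", trimmed, raw_fastq],
        # Align to the "filter" index and keep only reads that fail to align
        ["bowtie2", "--very-sensitive", "-x", filter_index, "-U", trimmed,
         "--un", filtered, "-S", "/dev/null"],
        # Map the surviving reads to the genome
        ["hisat2", "-x", genome_index, "-U", filtered, "-S", sam],
        # Assumption: coordinate-sort while converting SAM to BAM
        ["samtools", "sort", "-o", bam, sam],
        # Assumption: reference annotation supplied with -G
        ["stringtie", bam, "-G", annotation_gtf, "-o", out_gtf],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order. Printing them first makes it easy to compare against what was actually run.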
Am I missing something or did I do something wrong?
Note: I have not yet checked the BAM file to see whether reads still map to the filtered transcripts. If the BAM file turns out to be clean but I still get this result, where does the problem lie?
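One way the BAM check could be done is sketched below, under some assumptions. The helper turns the non-coding `gene` records of an Ensembl-style GTF into BED intervals, which can then be used to count overlapping alignments. It assumes the GTF carries `gene_id` and `gene_biotype` attributes and that chromosome names match those in the BAM. The function name and the `gene_id|biotype` label format are made up for illustration.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def noncoding_genes_to_bed(gtf_lines, coding_biotype="protein_coding"):
    """Return BED lines for every 'gene' record whose biotype is not coding."""
    bed = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        biotype = attrs.get("gene_biotype")
        if biotype is None or biotype == coding_biotype:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open
        start0 = int(fields[3]) - 1
        name = f"{attrs.get('gene_id', '.')}|{biotype}"
        bed.append("\t".join([fields[0], str(start0), fields[4],
                              name, "0", fields[6]]))
    return bed
```

After writing the returned lines to a file such as `noncoding.bed` (a placeholder name), `samtools view -c -L noncoding.bed sample.sorted.bam` should report how many alignments overlap those regions. A count well above zero would suggest reads are still reaching those loci despite the filtering step.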