Question: (Closed) Removed rRNAs, tRNAs, snoRNAs, etc. still have FPKM values
kyusikkim wrote (2.3 years ago):

Hello all! I've been analysing some ribosome profiling data and can't seem to exclude non-relevant RNAs (such as rRNAs) from StringTie's analysis. I believe I am following best practices, as described below, and am working with the Ensembl yeast (S. cerevisiae) genome assembly R64-1-1.

  1. Use Cutadapt to remove the first base of each read, exclude reads shorter than 25 nt, trim the adapter sequence, and discard any reads without an adapter sequence
  2. Generate a "filter" FASTA file by downloading the cDNA sequences of all non-protein-coding transcripts from Ensembl
  3. Use Bowtie2 (--very-sensitive preset) to map reads against the "filter" FASTA file and write all unaligned reads to a new FASTQ file
  4. Use HISAT2 to map the filtered reads to the genome and convert the output SAM to BAM using Samtools
  5. Run StringTie on the HISAT2 output to generate a GTF file with FPKMs for the mapped transcripts
  6. Check the output GTF file: it still has FPKM values for tRNAs, rRNAs, snoRNAs, etc.
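For reference, steps 1 to 5 can be written out as concrete command lines. The sketch below only builds the argument lists (`run_pipeline` executes them in order); the file names, index names, the placeholder adapter sequence and the use of a reference annotation via StringTie's `-G` option are assumptions, not details from the original post. `samtools sort` is used for the SAM-to-BAM step because StringTie expects coordinate-sorted input.

```python
import subprocess
from typing import List

def build_commands(raw_fastq: str, adapter: str, filter_fasta: str,
                   genome_fasta: str, annotation_gtf: str) -> List[List[str]]:
    """Return steps 1-5 as argument lists; all output file names are hypothetical."""
    return [
        # Step 1: cut the first base (-u 1), trim the 3' adapter (-a), drop reads
        # without an adapter (--discard-untrimmed) and reads shorter than 25 nt (-m 25).
        ["cutadapt", "-u", "1", "-a", adapter, "--discard-untrimmed", "-m", "25",
         "-o", "trimmed.fastq", raw_fastq],
        # Steps 2-3: index the non-coding "filter" FASTA, then keep only the reads
        # that do NOT align to it (--un). The alignments themselves are thrown away
        # (/dev/null assumes a Unix-like system).
        ["bowtie2-build", filter_fasta, "filter_index"],
        ["bowtie2", "--very-sensitive", "-x", "filter_index", "-U", "trimmed.fastq",
         "--un", "filtered.fastq", "-S", "/dev/null"],
        # Step 4: align the surviving reads to the genome, then write a
        # coordinate-sorted BAM (samtools sort reads SAM input directly).
        ["hisat2-build", genome_fasta, "genome_index"],
        ["hisat2", "-x", "genome_index", "-U", "filtered.fastq", "-S", "aligned.sam"],
        ["samtools", "sort", "-o", "aligned.sorted.bam", "aligned.sam"],
        # Step 5: assemble and quantify, guided by the reference annotation (-G).
        ["stringtie", "aligned.sorted.bam", "-G", annotation_gtf, "-o", "stringtie.gtf"],
    ]

def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # "ADAPTER_SEQ" is a placeholder; substitute the library's real 3' adapter.
    for cmd in build_commands("reads.fastq", "ADAPTER_SEQ", "ncrna.fa",
                              "genome.fa", "genes.gtf"):
        print(" ".join(cmd))
```

Building the commands separately from running them makes it easy to print and review the exact invocations before spending hours on alignment.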

Am I missing something or did I do something wrong?

I have not yet checked the BAM file to see whether reads still map to the filtered transcripts. If the BAM file turns out to be clean but I still get this result, where does the problem lie?
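One way to do that check, sketched in plain Python on SAM text (for example the output of `samtools view -h`): count the mapped reads whose aligned span overlaps each filtered locus. The loci and the example records are hypothetical; in practice the coordinates would come from the tRNA/rRNA/snoRNA entries of the annotation GTF. For a sorted and indexed BAM, `samtools view -c aligned.sorted.bam <chrom>:<start>-<end>` gives a comparable per-region count.

```python
import re
from typing import Dict, Iterable, List, Tuple

Locus = Tuple[str, int, int]  # (chrom, start, end), 1-based and inclusive

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def ref_span(pos: int, cigar: str) -> Tuple[int, int]:
    """Return the 1-based inclusive reference interval covered by an alignment."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1

def count_overlaps(sam_lines: Iterable[str], loci: List[Locus]) -> Dict[Locus, int]:
    """Count mapped alignments overlapping each locus (secondary alignments included)."""
    counts = {locus: 0 for locus in loci}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        # The span includes introns (N), so a spliced read counts if its gap overlaps.
        start, end = ref_span(pos, cigar)
        for locus in loci:
            l_chrom, l_start, l_end = locus
            if chrom == l_chrom and start <= l_end and end >= l_start:
                counts[locus] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical records and loci, for illustration only.
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tI\t100\t60\t30M\t*\t0\t0\t" + "A" * 30 + "\t" + "I" * 30,
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r3\t16\tI\t500\t1\t10M100N20M\t*\t0\t0\t" + "C" * 30 + "\t" + "I" * 30,
    ]
    print(count_overlaps(sam, [("I", 120, 200), ("I", 400, 450), ("I", 600, 700)]))
    # {('I', 120, 200): 1, ('I', 400, 450): 0, ('I', 600, 700): 1}
```

Any non-zero count at a filtered locus means some reads got past the Bowtie2 filter and were placed there by HISAT2.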


It turned out the reads were of low quality, so I filtered them out based on their mapping quality (MAPQ) score!

(Reply by kyusikkim, 2.3 years ago)
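Filtering on mapping quality keeps only the alignments whose MAPQ (SAM column 5) reaches a chosen threshold; `samtools view -b -q <INT>` does this directly on a BAM. A minimal pure-Python sketch of the same idea on SAM lines follows. The threshold of 10 and the example records are assumptions for illustration, not the values used in this thread.

```python
from typing import Iterable, Iterator

def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int) -> Iterator[str]:
    """Yield SAM header lines plus alignments whose MAPQ (column 5) is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        mapq = int(line.split("\t")[4])
        # MAPQ 255 means "not available" in the SAM spec and would pass this test.
        if mapq >= min_mapq:
            yield line

if __name__ == "__main__":
    # Hypothetical records: a high-MAPQ hit, a low-MAPQ secondary hit, an unmapped read.
    records = [
        "@HD\tVN:1.6",
        "a\t0\tI\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "b\t256\tI\t50\t1\t4M\t*\t0\t0\tACGT\tIIII",
        "c\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    for kept in filter_by_mapq(records, min_mapq=10):  # threshold is an assumption
        print(kept)
```

Because the filter runs after alignment, it removes ambiguous or poor alignments (such as multi-mapped reads on repeated rRNA/tRNA copies) before StringTie sees them, without touching the upstream filtering steps.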
Carlo Yague (Belgium) wrote (2.3 years ago):

> I have not yet checked the bam file to see if there are reads still mapping to the filtered transcripts.

You should definitely check that first. If reads map to those filtered locations, it means that HISAT2 aligned some reads that Bowtie2 failed to catch in the filtering step. This is the most likely explanation, and there are several reasons that could cause it:

  • "readthrough" reads: transcription sometimes fails to terminate properly, generating readthrough transcripts. Reads derived from these transcripts cannot map well to the cDNA sequence because they span both the gene and its downstream region.
  • tRNA splicing: some tRNAs contain introns, and reads from intron-containing pre-tRNAs wouldn't map to the cDNA sequences if those don't include the intron sequence (perhaps they do, I don't know). Conversely, if the annotation contains the full-length pre-tRNA sequence, reads from mature (spliced) tRNAs cannot be mapped by Bowtie2.
  • Bowtie2/HISAT2 settings: these are different aligners run with different settings, so a read rejected by one may still be aligned by the other.
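The first two explanations share one mechanism: in end-to-end mode (which Bowtie2's `--very-sensitive` preset uses), a read that extends well beyond the filter sequence cannot be aligned to it, so it passes the filter and is later placed on the genome by HISAT2. The toy sketch below models alignment as an exact substring match on made-up sequences; real aligners tolerate some mismatches and gaps, and HISAT2's spliced alignment is not modelled. It only shows which read types escape which kind of filter sequence.

```python
def aligns_end_to_end(read: str, reference: str) -> bool:
    """Toy stand-in for end-to-end alignment: exact, ungapped, mismatch-free."""
    return read in reference

def escapes_filter(read: str, filter_seq: str) -> bool:
    """A read that cannot be aligned to the filter sequence is passed on to HISAT2."""
    return not aligns_end_to_end(read, filter_seq)

# Hypothetical toy sequences, not real yeast loci.
EXON1, INTRON, EXON2 = "GCGGATTTAGCTCAG", "TTATCCGAAT", "TTGGGAGAGCGCCAG"
DOWNSTREAM = "AAATTTCCCGGG"

PRE_TRNA = EXON1 + INTRON + EXON2  # unspliced precursor (the genomic locus)
MATURE = EXON1 + EXON2             # intron removed (a "mature" cDNA entry)

reads = {
    "mature_junction": EXON1[-8:] + EXON2[:8],          # spans the exon-exon junction
    "pre_trna_intron": EXON1[-5:] + INTRON + EXON2[:5],  # contains intron sequence
    "readthrough": EXON2[-8:] + DOWNSTREAM,              # runs past the annotated 3' end
}

if __name__ == "__main__":
    for name, read in reads.items():
        print(f"{name:16s} escapes mature filter: {escapes_filter(read, MATURE)}, "
              f"escapes pre-tRNA filter: {escapes_filter(read, PRE_TRNA)}")
```

Whichever form the filter FASTA uses, some read type slips through: junction reads escape a pre-tRNA filter, intron-containing reads escape a mature filter, and readthrough reads escape both, yet all of them still match the genome.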