Question: All my reads fall in intergenic regions?
Asked 19 months ago by debitboro (Belgium):

Hi biostars,

I've aligned small RNA-seq reads (22-50 nt) from a total RNA-seq experiment using Bowtie. Almost 90% of the reads were multi-mapped. I then counted the reads (GTF file from Ensembl) using mmquant, which is designed for counting when the rate of multi-mapped reads is high; htseq-count and featureCounts do not count multi-mapped reads by default, which is why I used mmquant. From the resulting count matrix, I used a shell script to sum the reads per biotype class (protein_coding, lincRNA, rRNA, ...). About 80% of the alignments fall in intergenic regions (lincRNA), and only 6% of my reads correspond to protein_coding!
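The per-biotype summation described above could be sketched roughly as follows. This is only an illustration in Python, not the shell script used in the post: the function names (`gene_biotypes`, `counts_per_biotype`, `fractions`) are hypothetical, the GTF is assumed to carry Ensembl-style `gene_id` and `gene_biotype` attributes on `gene` records, and the count matrix is assumed to have already been reduced to one total per gene ID. Merged-feature rows that mmquant may emit for ambiguous reads are not handled here.

```python
import re
from collections import defaultdict

def gene_biotypes(gtf_lines):
    """Map gene_id -> gene_biotype using the 'gene' records of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        gene_id = re.search(r'gene_id "([^"]+)"', fields[8])
        biotype = re.search(r'gene_biotype "([^"]+)"', fields[8])
        if gene_id and biotype:
            biotypes[gene_id.group(1)] = biotype.group(1)
    return biotypes

def counts_per_biotype(gene_counts, biotypes):
    """Sum per-gene counts into per-biotype totals."""
    totals = defaultdict(int)
    for gene_id, count in gene_counts.items():
        # Genes missing from the annotation are grouped as "unknown".
        totals[biotypes.get(gene_id, "unknown")] += count
    return dict(totals)

def fractions(totals):
    """Convert per-biotype totals into fractions of all counted reads."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {biotype: n / grand_total for biotype, n in totals.items()}
```

A summary like this only covers reads assigned to annotated features. lincRNA is itself an annotated Ensembl biotype, so reads counted there are not the same thing as reads that overlap no annotation at all.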

Can I continue downstream analysis with such results?

Any ideas?

Modified 19 months ago by Friederike; written 19 months ago by debitboro

Did I understand correctly that you have sequenced small RNAs such as miRNA and expect protein coding genes?

Reply written 19 months ago by WouterDeCoster

It is a total RNA-seq experiment. The sequencing was done on degraded RNA samples (single-end) with a particular library preparation protocol, which is why I got very short RNA-seq reads. We are not targeting any particular class of RNAs.

Reply written 19 months ago by debitboro

If it's total RNA, I would expect you to have >80% rRNA.

Reply written 19 months ago by Fabio Marroni

"80% rRNA"

Even if rRNAs were removed during the experiment with an rRNA depletion kit?

Reply modified 19 months ago by RamRS; written 19 months ago by debitboro

You did not include that critical piece of information in the original post. If that is true (and if the depletion worked as expected), it is unclear why you have 90% multi-mapped reads (per featureCounts/htseq-count?).

Reply modified 19 months ago; written 19 months ago by genomax

Since my read lengths range from 22 to 50 nt, I think a high rate of multi-mapped reads is to be expected. A very short read of 25 nt will align to more locations on the genome than a longer read. Am I right?
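The intuition that shorter reads align to more places can be illustrated with a small sketch. It is a toy model only: `multi_mapping_fraction` is a hypothetical helper, the "genome" is a made-up string, and only exact matches on the forward strand are considered, which is much simpler than what Bowtie actually does.

```python
from collections import Counter

def multi_mapping_fraction(genome, k):
    """Fraction of k-length windows whose sequence occurs more than once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    # A window is "multi-mapping" if its sequence appears at 2+ positions.
    repeated = sum(1 for kmer in kmers if counts[kmer] > 1)
    return repeated / len(kmers)

# Toy sequence containing a repeated "ACGT" motif (made-up data).
toy_genome = "ACGTAACGTC"
for k in (2, 4, 5):
    print(k, round(multi_mapping_fraction(toy_genome, k), 3))
```

On this toy sequence the fraction of non-unique windows shrinks as the window length grows. On a real genome, how strongly read length matters depends on the repeat content of the regions the reads come from.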

Reply written 19 months ago by debitboro

No, in that case no. Sorry, I forgot that option.

Reply written 19 months ago by Fabio Marroni

Just to confirm: you are expecting to get small RNA reads from a total RNA-seq dataset only because you are aligning with Bowtie v.1?

Reply modified 19 months ago; written 19 months ago by genomax
Answer written 19 months ago by Friederike (United States):

I strongly recommend you do stringent quality controls with established tools such as QoRTs or RSeQC.

Things that often go wrong with RNA-seq and that you may want to look out for:

  • DNA contamination (many reads mapping to non-annotated loci)
  • lack of library diversity, i.e., you started with very few viable RNA-seq molecules, ended up amplifying those and then sequencing the same sequences over and over again.
  • rRNA contamination -- your initial statement about the abundance of multiply aligned reads sounds like this may actually be the case for your data
  • 3' bias -- with highly degraded RNA, this is often seen
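As a rough first look at the library-diversity point in the list above, the share of reads that are exact duplicates of another read can be estimated directly from the read sequences. This is a minimal sketch, not a substitute for QoRTs or RSeQC; `duplication_rate` and `top_sequences` are hypothetical helper names, and the reads are assumed to be already loaded as plain sequence strings.

```python
from collections import Counter

def duplication_rate(sequences):
    """Return 1 - (distinct sequences / total sequences)."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total

def top_sequences(sequences, n=3):
    """Return the n most frequent sequences with their read counts."""
    return Counter(sequences).most_common(n)
```

A value close to 1 would mean that a handful of sequences dominate the library. Looking up the most frequent sequences (for example against rRNA references) can then help tell apart low diversity from rRNA carry-over.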

Why are you using Bowtie instead of an established splice-aware aligner such as STAR? Is there a reason for you to expect mostly multiply aligning reads? I.e., did you enrich for repetitive regions?

"Can I continue downstream analysis with such results?"

That depends on the questions you're interested in and the analyses you have in mind.

Modified 19 months ago; written 19 months ago by Friederike