HELP with NGS miRNA mapping: after clipping the adapter sequence, most reads are removed
8.8 years ago
illinois.ks ▴ 210

Hello,

Previously, I asked a question about mapping miRNAs ("miRNA mapping rate is very low.. (less than 0.03%)").

Thank you David! :) Finally I could successfully map my miRNA reads.

But this time I have another set of samples with the same design: 3 controls vs. 3 treated.

I followed exactly the same logic (since they were generated on the same machine):

  1. Remove adapter sequence
  2. Remove index sequence
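The two steps above can be sketched roughly as follows. This is only a minimal illustration, not the tool actually used in this thread: the function names `clip_adapter` and `trim_index` and the `min_overlap` value are assumptions, and matching is exact, whereas dedicated trimming tools also tolerate mismatches. The part that matters for 36 bp reads is that the adapter is usually truncated at the 3' end, so a prefix of the adapter has to count as a match.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC"  # adapter quoted in the question


def clip_adapter(seq, adapter=ADAPTER, min_overlap=5):
    """Remove a 3' adapter (full, or truncated by the read end) from seq.

    Exact matching only; min_overlap is an assumed threshold.
    """
    for start in range(len(seq) - min_overlap + 1):
        tail = seq[start:]
        # The rest of the read must look like the beginning of the adapter.
        if adapter.startswith(tail[:len(adapter)]):
            return seq[:start]
    return seq  # no adapter found, read is left untouched


def trim_index(seq, n5=4, n3=4):
    """Drop the first n5 and last n3 bases (the index bases in this design)."""
    return seq[n5:len(seq) - n3] if len(seq) > n5 + n3 else ""


if __name__ == "__main__":
    # Made-up 36 bp read: 4 nt index + 22 nt insert + 4 nt index + adapter start
    insert = "ACGT" + "TAGCTTATCAGACTGATGTTGA" + "TTGC"
    read = (insert + ADAPTER)[:36]
    print(trim_index(clip_adapter(read)))
```

A read that starts directly with the adapter (an adapter dimer) comes back as an empty string from `clip_adapter`, which is the kind of read that disappears after a minimum-length filter.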

But this time I noticed that after removing the adapter sequence (TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC), the file sizes are reduced dramatically, which means most of the reads are removed.

For example, here are the FASTQ file sizes of the original files:

c1 (265M), c2 (428M), c3 (248M), a1 (268M), a2 (344M), a3 (443M)

After removing the adapter sequence:

c1 (132M, okay), c2 (15M, weird), c3 (208M, okay), a1 (153M, okay), a2 (15M, weird), a3 (18M, weird)

When I looked at the FASTQ files of the affected samples (c2, a2, a3), many of the reads consist mostly of adapter sequence. I am not sure why; maybe the experiments were bad? I have no details about the experiments. I guess this is the reason those files are mostly chopped.

For further analysis, I removed the index sequences (the first 4 and last 4 bases in my case) and then ran bowtie2.

Here are the bowtie2 results:

                        c1        c2       c3        a1        a2
Mapping rate            55.88%    7.75%    70.06%    68.14%    27.48%
Total number of reads   1196717   135524   1841729   1367558   134217

I am wondering whether I can proceed with this analysis. I heard that the mapping rate should usually be around 50-80%. For c2 and a2 it is much lower than that, and the total number of reads in those samples is also very low.

I need some comments on this. Is it a problem with the experiments, or something else? Can I analyze this data further?

miRNA mapping RNA-seq • 2.6k views

Can you please provide two more things: 1) the number of reads before and after clipping, and 2) the read length distribution?

It should be enough if you just send them for one sample, e.g. c2.


Thank you, David.

Yes, I checked the read length distribution for c2 (number of reads, then read length).

Before clipping the adapter sequence:

3463306 36

After clipping the adapter sequence:

   1049 15
   1613 16
   2055 17
   2051 18
   6626 19
   5612 20
   9208 21
   4478 22
   8396 23
   5625 24
   6632 25
   8194 26
   4760 27
   7259 28
   6845 29
   6794 30
   8149 31
  40178 36

=================================

So originally I had about 3.5 million reads of 36 bp.

After clipping, only 135524 reads remain (the sum of all the counts above).
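The numbers above can be checked with a few lines of arithmetic. The sketch below only re-uses the counts posted for c2; the variable names are made up for illustration, and reading the 36 bp class as "reads in which no adapter was found" is an assumption about how the clipping step behaved.

```python
# Read-length distribution for c2 after adapter clipping, copied from the
# listing above (read length -> number of reads).
after_clipping = {
    15: 1049, 16: 1613, 17: 2055, 18: 2051, 19: 6626, 20: 5612,
    21: 9208, 22: 4478, 23: 8396, 24: 5625, 25: 6632, 26: 8194,
    27: 4760, 28: 7259, 29: 6845, 30: 6794, 31: 8149, 36: 40178,
}
reads_before = 3463306  # all 36 bp, before clipping

reads_after = sum(after_clipping.values())
retained_pct = 100 * reads_after / reads_before
# Assumption: reads still at 36 bp are reads where no adapter was detected.
untrimmed_pct = 100 * after_clipping[36] / reads_after

print(f"reads after clipping: {reads_after}")
print(f"retained: {retained_pct:.1f}% of the input reads")
print(f"still 36 bp: {untrimmed_pct:.1f}% of the retained reads")
```

The sum reproduces the 135524 reads reported by bowtie2 for c2, which is only about 3.9% of the input; the rest was discarded at the clipping step.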

====================================

FYI: since I also need to clip the first 4 and last 4 index bases from these reads, my read length distribution should be shifted down by 4 bp.


I concluded that something went wrong with this experiment, although I don't know what, since there is nothing left to fix on the analysis side. ;(

We decided to run the experiments again!


As a sanity check, you can run FastQC and look at the adapter content plot; it should match the reduction in the number of sequences.
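A rough cross-check of the same thing can also be done by hand. The sketch below is not FastQC and does not reproduce its plot; it simply counts how many reads in a 4-line FASTQ contain the start of the adapter. The function name `adapter_fraction`, the 12 nt prefix length, and the file name in the comment are assumptions for illustration.

```python
from itertools import islice

# First 12 nt of the adapter quoted in the question (prefix length is an assumption).
ADAPTER_PREFIX = "TGGAATTCTCGG"


def adapter_fraction(lines, adapter_prefix=ADAPTER_PREFIX):
    """Fraction of reads in a 4-line FASTQ whose sequence contains adapter_prefix."""
    total = hits = 0
    # The sequence is the 2nd line of every 4-line FASTQ record.
    for seq in islice(lines, 1, None, 4):
        total += 1
        if adapter_prefix in seq:
            hits += 1
    return hits / total if total else 0.0


# Usage with a hypothetical file name:
# with open("c2.fastq") as fh:
#     print(f"{adapter_fraction(fh):.1%} of reads contain the adapter start")
```

Reads where fewer than 12 adapter bases fit before the read ends are not counted here, so this gives a lower bound on adapter contamination.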


I will check with FastQC too. Thanks!
