Question

the normal range of the percent of the singleton in UPARSE program

10.4 years ago

hua.peng1314 ▴ 100

Hi all. I am working with some 16S rRNA data from the MiSeq platform. The reads are 2×300 bp.

I merged them directly with FLASH using the parameter -M 220, and then used UPARSE to filter out the low-quality reads with the parameter -maxee 1.
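For readers unfamiliar with expected-error filtering, here is a minimal sketch of what a threshold such as -maxee 1 measures, assuming the usual definition (the sum over bases of the error probability 10^(-Q/10)) and Phred+33 quality encoding. The function names and the toy quality strings are made up for illustration and are not part of UPARSE.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_maxee(quality_string, max_ee=1.0):
    """True if the read's expected errors do not exceed max_ee."""
    return expected_errors(quality_string) <= max_ee


# Toy qualities (invented, not taken from the post):
good = "G" * 200          # 'G' encodes Q38 in Phred+33
tailed = good + "#" * 20  # '#' encodes Q2 in Phred+33

print(round(expected_errors(good), 3))
print(round(expected_errors(tailed), 3))
print(passes_maxee(good), passes_maxee(tailed))
```

Each Q2 base contributes about 0.63 expected errors, so even a short Q2 tail pushes a read over a threshold of 1.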

Here is the quality of one of the samples:

read1:

#Base     Mean    Median   Mean    Median
215-219   33.45   38       20.24   22.8
220-224   33.98   38       19.33   21.6
225-229   33.37   38       18.73   21.2
230-234   32.64   37.4     16.15   12.8
235-239   33.15   38       13.53   3.2
240-244   32.93   37.8     12.03   2
245-249   31.70   37       11.13   2
250-254   31.45   36.6     10.12   2
255-259   30.92   36.6     8.31    2
260-264   29.69   35.2     7.47    2
265-269   30.21   36.8     6.55    2
270-274   28.26   34.8     5.78    2
275-279   29.06   36.4     5.12    2
280-284   27.60   34.4     4.37    2
285-289   26.12   31.4     3.86    2
290-294   25.17   31       3.14    2
295-299   24.63   32.2     2.76    2
300-301   21.48   28       2.39    2

read2:

#Base     Mean    Median   Lower Quartile   Upper Quartile   10th Percentile   90th Percentile
210-214   21.20   24.2     2      35      2   37.6
215-219   20.24   22.8     2      34.6    2   37.2
220-224   19.33   21.6     2      34      2   37
225-229   18.73   21.2     2      34.6    2   37.2
230-234   16.15   12.8     2      31.4    2   36.8
235-239   13.53   3.2      2      27      2   35
240-244   12.03   2        2      24.2    2   33
245-249   11.13   2        2      22.4    2   33
250-254   10.12   2        2      20.2    2   32.4
255-259   8.31    2        2      12      2   28.8
260-264   7.47    2        2      4.2     2   27.6
265-269   6.55    2        2      2       2   25.6
270-274   5.78    2        2      2       2   22.8
275-279   5.12    2        2      2       2   19.2
280-284   4.37    2        2      2       2   8.2
285-289   3.86    2        2      2       2   2
290-294   3.14    2        2      2       2   2
295-299   2.76    2        2      2       2   2
300-301   2.39    2        2      2       2   2

In the end I got 3125518 reads. After dereplication, 2557464 unique sequences were retained, including 2409776 singletons (a singleton is a read whose sequence is present exactly once, i.e. is unique among the reads). Is that too many? After all, they have been amplified several times.
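To put those counts in proportion, the sketch below shows how singletons can be counted after dereplication and then works out the percentages from the numbers quoted above. The helper name and the toy reads are hypothetical; only the three large counts come from the post.

```python
from collections import Counter


def singleton_stats(sequences):
    """Return (total reads, unique sequences, singletons) for a list of reads."""
    counts = Counter(sequences)
    uniques = len(counts)
    singletons = sum(1 for n in counts.values() if n == 1)
    return len(sequences), uniques, singletons


# Toy reads (invented) to show the counting itself.
reads = ["ACGT", "ACGT", "ACGA", "TTGA", "ACGT"]
total, uniques, singletons = singleton_stats(reads)
print(total, uniques, singletons)

# Counts quoted in the post.
total_reads = 3125518
unique_seqs = 2557464
singleton_seqs = 2409776

pct_of_uniques = 100 * singleton_seqs / unique_seqs
pct_of_reads = 100 * singleton_seqs / total_reads
print(f"{pct_of_uniques:.1f}% of unique sequences are singletons")
print(f"{pct_of_reads:.1f}% of all reads are singletons")
```

With the posted counts this comes to roughly 94% of the unique sequences and roughly 77% of all reads.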

I followed this pipeline (http://drive5.com/usearch/manual/uparse_cmds.html) to continue processing. In the end I have 1321653 reads in 3142 OTUs.

Did I do something wrong while processing the data, or is this percentage normal?

Thanks for any response.

singleton UPARSE 16S • 2.2k views

updated 2.9 years ago by Ram 45k • written 10.4 years ago by hua.peng1314 ▴ 100

Sorry, I made a mistake.

The header of the quality tables should look like this:

#Base   Mean    Median  Lower Quartile  Upper Quartile  10th Percentile 90th Percentile

updated 2.9 years ago by Ram 45k • written 10.4 years ago by hua.peng1314 ▴ 100

The correct quality table for read1 is:

200-204   23.87   28.2   9.8   36.8   2   38
205-209   23.65   29.2   8.8   36.4   2   38
210-214   21.20   24.2   2     35     2   37.6
215-219   20.24   22.8   2     34.6   2   37.2
220-224   19.33   21.6   2     34     2   37
225-229   18.73   21.2   2     34.6   2   37.2
230-234   16.15   12.8   2     31.4   2   36.8
235-239   13.53   3.2    2     27     2   35
240-244   12.03   2      2     24.2   2   33
245-249   11.13   2      2     22.4   2   33
250-254   10.12   2      2     20.2   2   32.4
255-259   8.31    2      2     12     2   28.8
260-264   7.47    2      2     4.2    2   27.6
265-269   6.55    2      2     2      2   25.6
270-274   5.78    2      2     2      2   22.8
275-279   5.12    2      2     2      2   19.2
280-284   4.37    2      2     2      2   8.2
285-289   3.86    2      2     2      2   2
290-294   3.14    2      2     2      2   2
295-299   2.76    2      2     2      2   2
300-301   2.39    2      2     2      2   2
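One way to read a table like this is to look for the first window whose median quality falls below some cutoff and treat the base before it as a candidate truncation point. The sketch below does that for the first rows of the corrected table. The cutoff of 20 is an arbitrary assumption, and the function name is hypothetical.

```python
# (window_start, window_end, median_quality), copied from the corrected table.
windows = [
    (200, 204, 28.2),
    (205, 209, 29.2),
    (210, 214, 24.2),
    (215, 219, 22.8),
    (220, 224, 21.6),
    (225, 229, 21.2),
    (230, 234, 12.8),
    (235, 239, 3.2),
    (240, 244, 2.0),
]


def suggest_truncation(windows, min_median=20.0):
    """Last base before the first window whose median is below min_median.

    Returns None if no window drops below the cutoff.
    """
    for start, _end, median in windows:
        if median < min_median:
            return start - 1
    return None


print(suggest_truncation(windows))
```

Whether read pairs truncated this way still overlap enough to merge depends on the amplicon length, which the post does not give.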

updated 2.9 years ago by Ram 45k • written 10.4 years ago by hua.peng1314 ▴ 100

Ram · Answer 1 · 2015-06-17

10.4 years ago

5heikki 11k

You should trim your reads before clustering. It looks like you are probably getting so many unique sequences because the quality of the basecalls is total garbage (i.e. probably random) towards the end.
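As a rough illustration of per-read trimming, the sketch below cuts a read at the first base whose quality falls below a threshold, assuming Phred+33 encoding. It is not the trimming method of any particular tool. The function name, the default threshold, and the toy read are all invented for the example.

```python
def truncate_at_low_quality(sequence, quality_string, min_q=3, phred_offset=33):
    """Cut the read just before the first base with quality below min_q."""
    for i, ch in enumerate(quality_string):
        if ord(ch) - phred_offset < min_q:
            return sequence[:i], quality_string[:i]
    return sequence, quality_string


# Toy read (invented): six Q38 bases followed by a Q2 tail ('#' is Q2).
seq = "ACGTACGTAC"
qual = "GGGGGG####"

trimmed_seq, trimmed_qual = truncate_at_low_quality(seq, qual)
print(trimmed_seq, trimmed_qual)
```

Trimming both reads like this before merging removes the near-random tail bases, so they no longer turn otherwise identical reads into distinct sequences.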

updated 2.9 years ago by Ram 45k • written 10.4 years ago by 5heikki 11k