Question: Definition Of "To Clip Bases"
6.7 years ago, notdurrett wrote:

Hi all,

In the Oases paper (http://bioinformatics.oxfordjournals.org/content/28/8/1086.full), the following paragraph appears:

To reduce the amount of erroneous bases, both paired-end datasets were processed by (i) removing Ns from both ends, (ii) clipping bases with a Sanger quality ≤10 and (iii) removing reads with more than six bases with Sanger quality ≤10 after steps (i) and (ii), leading to a total of 30 940 088 and 64 441 708 reads for human and mouse, respectively.

I'm a bit confused as to what (ii) means. Any insight?

Thanks in advance!

6.7 years ago, Arun (Germany) wrote:

Suppose you have a FASTQ read like this:

Read: AACACAATATAGAGAGACCAGGGGACCATGGTATATGGAGT
Qual: ###IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII##

Here '#' encodes a Sanger quality of 2 (≤10) and 'I' a quality of 40, so the read has low-quality bases at the beginning and at the end. These bases are not reliable, so they are clipped, giving:

Read: ACAATATAGAGAGACCAGGGGACCATGGTATATGGA
Qual: IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

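In this example the read had 5 bad bases, all at the ends, and all of them were clipped. Step (iii) is then applied to the clipped read: if more than six of its remaining bases had quality ≤10, the whole read would be removed.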
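For anyone who wants to try this, here is a minimal Python sketch of end clipping applied to the example read. It assumes Sanger (Phred+33) encoding and reads "clipping" as trimming inwards from both ends; the function name `clip_ends` is made up for illustration and is not taken from the Oases paper or any particular tool.

```python
PHRED_OFFSET = 33  # Sanger / Phred+33 encoding (assumption)

def clip_ends(seq, qual, threshold=10):
    """Trim bases with quality <= threshold from both ends of a read."""
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    start, end = 0, len(seq)
    # Walk inwards from the 5' end while bases are low quality.
    while start < end and scores[start] <= threshold:
        start += 1
    # Walk inwards from the 3' end while bases are low quality.
    while end > start and scores[end - 1] <= threshold:
        end -= 1
    return seq[start:end], qual[start:end]

read = "AACACAATATAGAGAGACCAGGGGACCATGGTATATGGAGT"
qual = "###" + "I" * 36 + "##"  # '#' = Q2, 'I' = Q40

clipped_seq, clipped_qual = clip_ends(read, qual)
print(clipped_seq)
print(clipped_qual)
```

Only the ends are touched; a low-quality base in the middle of the read would stay in place.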


Thanks for the answer and example!

(reply written 6.7 years ago by notdurrett)
6.7 years ago, Dan Gaston (Canada) wrote:

Clipping bases in this case just means trimming (or masking), that is, removing base calls whose quality is below a threshold. Low-quality calls typically occur at the 5' and 3' ends of reads, so you trim inwards from each end until the low-quality calls are gone.
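As a rough illustration, the three steps from the quoted paragraph could be chained as in the Python sketch below. This is one reading of the text, not the authors' actual pipeline: the paper excerpt does not name a tool, so Phred+33 encoding, trimming from the ends only, and the names `trim_ends` and `process_read` are all assumptions.

```python
PHRED_OFFSET = 33  # Sanger / Phred+33 encoding (assumption)

def trim_ends(seq, qual, is_bad):
    """Drop positions from both ends for as long as is_bad(base, score) holds."""
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    start, end = 0, len(seq)
    while start < end and is_bad(seq[start], scores[start]):
        start += 1
    while end > start and is_bad(seq[end - 1], scores[end - 1]):
        end -= 1
    return seq[start:end], qual[start:end]

def process_read(seq, qual, min_q=10, max_low=6):
    """Return the processed (seq, qual), or None if the read is discarded."""
    # (i) remove Ns from both ends
    seq, qual = trim_ends(seq, qual, lambda base, q: base in "Nn")
    # (ii) clip bases with quality <= min_q from both ends
    seq, qual = trim_ends(seq, qual, lambda base, q: q <= min_q)
    # (iii) discard the read if more than max_low low-quality bases remain
    low = sum(1 for c in qual if ord(c) - PHRED_OFFSET <= min_q)
    if low > max_low:
        return None
    return seq, qual

# One low-quality base left inside the read: kept.
kept = process_read("NNACGTACGTAC", "##IIIIII#III")
# Seven low-quality bases inside the read: discarded.
dropped = process_read("ACGT" * 4, "I#" * 7 + "II")
print(kept, dropped)
```

The same helper serves steps (i) and (ii) because both trim inwards from the ends and differ only in what counts as a bad position.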


Thanks for the answer!

(reply written 6.7 years ago by notdurrett)