Question

Definition Of "To Clip Bases"

11.9 years ago

notdurrett ▴ 10

Hi all,

In the Oases paper (http://bioinformatics.oxfordjournals.org/content/28/8/1086.full), the following paragraph occurs:

To reduce the amount of erroneous bases, both paired-end datasets were processed by (i) removing Ns from both ends, (ii) clipping bases with a Sanger quality ≤10 and (iii) removing reads with more than six bases with Sanger quality ≤10 after steps (i) and (ii), leading to a total of 30 940 088 and 64 441 708 reads for human and mouse, respectively.

I'm a bit confused as to what (ii) means. Any insight?

Thanks in advance!

rna-seq • 3.2k views

score 4 · Answer 1 · 2012-05-28

11.9 years ago

Arun 2.4k

Suppose you have a FASTQ read like this:

Read: AACACAATATAGAGAGACCAGGGGACCATGGTATATGGAGT
Qual: ###IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII##

This read has low-quality bases at the beginning and at the end: '#' encodes a Sanger (Phred+33) quality of 2, which is ≤10. These bases are not reliable, so they are clipped, giving the clipped sequence:

Read: ACAATATAGAGAGACCAGGGGACCATGGTATATGGA
Qual: IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

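Here, five low-quality bases were clipped from the ends. Per step (iii) of the quoted paragraph, the whole read is then removed if more than six bases with quality ≤10 still remain after clipping.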
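To make the example concrete, here is a minimal Python sketch of steps (ii) and (iii). It assumes Sanger (Phred+33) encoding and that "clipping" means trimming inward from both ends only; the function names clip_ends and keep_read are made up for illustration and are not taken from the Oases paper or any tool.

```python
SANGER_OFFSET = 33  # Sanger/Phred+33: quality = ord(char) - 33


def phred_scores(qual):
    """Decode a Sanger quality string into a list of integer scores."""
    return [ord(c) - SANGER_OFFSET for c in qual]


def clip_ends(seq, qual, threshold=10):
    """Step (ii), as assumed here: trim bases with quality <= threshold
    inward from both ends, stopping at the first good base on each side."""
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] <= threshold:
        start += 1
    while end > start and scores[end - 1] <= threshold:
        end -= 1
    return seq[start:end], qual[start:end]


def keep_read(qual, threshold=10, max_low=6):
    """Step (iii): keep the read unless MORE than max_low bases
    with quality <= threshold remain after clipping."""
    low = sum(s <= threshold for s in phred_scores(qual))
    return low <= max_low


# The read from the example above: 3 bad bases, 36 good ones, 2 bad bases.
seq = "AACACAATATAGAGAGACCAGGGGACCATGGTATATGGAGT"
qual = "###" + "I" * 36 + "##"

clipped_seq, clipped_qual = clip_ends(seq, qual)
print(clipped_seq)              # ACAATATAGAGAGACCAGGGGACCATGGTATATGGA
print(keep_read(clipped_qual))  # True: no low-quality bases remain
```

Note that the filter in keep_read counts low-quality bases that are left inside the read after the ends have been clipped, not the bases that were clipped off.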

Thanks for the answer and example!

ADD REPLY • link 11.9 years ago by notdurrett ▴ 10

score 1 · Answer 2 · 2012-05-28

11.9 years ago

DG 7.3k

Clipping bases in this case just means trimming (or masking), i.e. removing base calls with quality below a threshold. Low-quality calls typically occur at the 5' and 3' ends of reads, so you trim inwards from each end to remove them.
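For contrast with trimming, here is a rough Python sketch of what masking could look like. It assumes that masking means replacing each low-quality base call with N while keeping the read length unchanged, and that qualities are Sanger (Phred+33) encoded; mask_low_quality is a hypothetical helper, not a function from any particular tool.

```python
def mask_low_quality(seq, qual, threshold=10, offset=33):
    """Replace every base whose quality is <= threshold with 'N'.

    Unlike trimming, the read keeps its original length, and
    low-quality calls are masked wherever they occur in the read.
    """
    return "".join(
        "N" if ord(q) - offset <= threshold else base
        for base, q in zip(seq, qual)
    )


# '#' is quality 2 and 'I' is quality 40 in Sanger encoding.
print(mask_low_quality("ACGTAC", "II#II#"))  # ACNTAN
```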