Definition Of "To Clip Bases"
11.9 years ago
notdurrett ▴ 10

Hi all,

In the Oases paper (http://bioinformatics.oxfordjournals.org/content/28/8/1086.full), the following paragraph occurs:

To reduce the amount of erroneous bases, both paired-end datasets were processed by (i) removing Ns from both ends, (ii) clipping bases with a Sanger quality ≤10 and (iii) removing reads with more than six bases with Sanger quality ≤10 after steps (i) and (ii), leading to a total of 30 940 088 and 64 441 708 reads for human and mouse, respectively.

I'm a bit confused as to what (ii) means. Any insight?

Thanks in advance!

rna-seq • 3.2k views
11.9 years ago
Arun 2.4k

Suppose you have a FASTQ read like this:

Read: AACACAATATAGAGAGACCAGGGGACCATGGTATATGGAGT
Qual: ###IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII##

This read has low-quality bases at the beginning and at the end: '#' encodes a Sanger (Phred+33) quality of 2, which is <= 10. Those base calls are not reliable, so they are clipped, giving the sequence:

Read: ACAATATAGAGAGACCAGGGGACCATGGTATATGGA
Qual: IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

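Here the read had 5 low-quality bases, all at the ends, so none remain after clipping. Step (iii) in the quoted paragraph then applies to what is left: if more than six bases with quality <= 10 still remain after clipping (i.e. inside the read), the whole read is removed.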
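In case code is easier to follow, here is a minimal Python sketch of the three steps applied to the example read. It is not the pipeline the Oases authors used; the function names (`phred_scores`, `clip_read`, `keep_read`) are made up for illustration, Sanger/Phred+33 encoding is assumed, and steps (i) and (ii) are folded into a single pass for brevity, which may differ slightly from running them one after the other.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Phred+33 quality encoding


def phred_scores(qual):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def clip_read(seq, qual, threshold=10):
    """Trim N calls and bases with quality <= threshold from both ends."""
    scores = phred_scores(qual)

    def is_bad(i):
        return seq[i] == "N" or scores[i] <= threshold

    start, end = 0, len(seq)
    while start < end and is_bad(start):
        start += 1  # walk inwards from the 5' end
    while end > start and is_bad(end - 1):
        end -= 1    # walk inwards from the 3' end
    return seq[start:end], qual[start:end]


def keep_read(qual, threshold=10, max_low=6):
    """Step (iii): keep the read unless more than max_low low-quality bases remain."""
    low = sum(1 for q in phred_scores(qual) if q <= threshold)
    return low <= max_low


seq = "AACACAATATAGAGAGACCAGGGGACCATGGTATATGGAGT"
qual = "#" * 3 + "I" * 36 + "#" * 2  # 3 bad, 36 good, 2 bad

clipped_seq, clipped_qual = clip_read(seq, qual)
print(clipped_seq)
print(keep_read(clipped_qual))
```

Only the ends are trimmed; low-quality bases in the middle of the read stay in place and are what `keep_read` counts.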

Thanks for the answer and example!

11.9 years ago
DG 7.3k

Clipping bases in this case just means trimming (or masking) them, i.e. removing base calls with quality at or below a threshold. Low-quality calls typically occur at the 5' and 3' ends of reads, so you trim inwards from each end until you reach a base above the threshold.
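Trimming and masking are not quite the same operation, so a small sketch of the masking variant may help. This assumes masking means replacing each low-quality base call with N while keeping the read length unchanged, and again assumes Phred+33 encoding; `mask_low_quality` is a hypothetical helper, not a function from any particular tool.

```python
def mask_low_quality(seq, qual, threshold=10, offset=33):
    """Replace every base whose Phred quality is <= threshold with 'N'."""
    return "".join(
        "N" if ord(q) - offset <= threshold else base
        for base, q in zip(seq, qual)
    )


masked = mask_low_quality("ACGTACGT", "II#II#II")
print(masked)  # ACNTANGT
```

Unlike trimming, this also handles low-quality calls in the middle of a read, at the cost of introducing Ns.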

Thanks for the answer!
