Trimmomatic usage based on input data
1
0
Entering edit mode
2.1 years ago
ja4123 ▴ 20

Hello!

My question is how to improve read quality depending on the input data. That is, what kind of filtering and trimming should be used for sequencing reads from RNA-seq, scRNA-seq, ChIP-seq, RIP-seq, CNV, or SNP experiments? Or does the data type not matter, and the choice depends on other factors?

Thanks in advance for your reply.

Trimmomatic filtering • 945 views
2.1 years ago
GenoMax 142k

You are not really "improving read quality". By trimming, you are simply throwing away data that is adapter sequence (or that falls below a quality threshold you decided to use). This raises the average Q-score of the data that remains.

In most applications, as long as you are aligning to a good reference, aligners should be able to tackle "bad" data (adapters or bases with poor quality). That said, time spent scanning/trimming the data once ensures that you have clean data for any downstream application.

The only time you would want to be strict is if you were doing any de novo work. There you will want to make sure to remove any extraneous sequence and perhaps data with quality scores below Q20 or so.
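
To make the quality-threshold idea concrete, here is a minimal Python sketch of a sliding-window quality trim. It is similar in spirit to a sliding-window trimming step but is not Trimmomatic's actual implementation. The function names, the window size of 4, the Q20 cutoff, and the Phred+33 quality encoding are all assumptions chosen for illustration.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=4, min_avg_q=20):
    """Cut the read at the first window whose mean quality is below min_avg_q.

    Scans from the 5' end. Reads shorter than the window are returned
    unchanged (a simplification made for this sketch).
    """
    scores = phred33_scores(quality_string)
    for start in range(len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_avg_q:
            # Keep everything before the first failing window.
            return sequence[:start], quality_string[:start]
    return sequence, quality_string


if __name__ == "__main__":
    # Hypothetical read: six high-quality bases followed by four poor ones.
    seq, qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(seq, qual)
```

For de novo work you might raise the cutoff or also drop reads that end up too short after trimming; for alignment to a good reference a lenient setting is usually enough.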


Thanks for the reply. What do you mean by "adapter sequence"?


Or "Q-score"?


One needs to prepare "libraries" (collections of nucleic acid fragments that come from the original sample) for most of the methods you mentioned above. This generally requires adding long oligos to the ends of these fragments (so the "library" fragments can bind the flowcell surface to be sequenced). The sequence of these oligos (adapters) is known from the kit used, or it can be inferred by sampling the data.
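
As a rough illustration of what removing a known adapter means, the following Python sketch clips an adapter from the 3' end of a read using exact matching only. Real trimmers also tolerate mismatches, which this sketch does not. The function name and the `min_overlap` parameter are hypothetical, and the adapter string in the example is just the start of a commonly seen Illumina adapter, used here as an assumed input; use the sequence given for your own kit.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a known adapter from the 3' end of a read (exact matches only)."""
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the beginning of the adapter fits before the read ends.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter found: return the read untouched.
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"      # assumed example adapter prefix
    insert = "ACGTACGTTTGA"        # hypothetical insert sequence
    print(trim_3prime_adapter(insert + adapter, adapter))
    print(trim_3prime_adapter(insert + adapter[:5], adapter))
```

The `min_overlap` floor is a trade-off: a very short overlap will clip real sequence that matches the adapter start by chance, while a long one leaves short adapter remnants in place.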

Q-scores refer to Phred quality scores, which estimate the probability that a particular basecall is incorrect (LINK). Each base has an associated Q-score.
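
The relationship between a Phred score Q and the error probability P is Q = -10 * log10(P), so Q20 corresponds to a 1 in 100 chance of a wrong basecall and Q30 to 1 in 1000. A small Python sketch of the conversion follows; the function names are made up for illustration, and the decoding step assumes the Phred+33 encoding used in current FASTQ files.

```python
import math


def phred_to_error_prob(q):
    """Probability that a basecall with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Per-base Q-scores from a FASTQ quality line (Phred+33 assumed)."""
    return [ord(ch) - 33 for ch in quality_string]


if __name__ == "__main__":
    for q in decode_phred33("I5#"):
        print(q, phred_to_error_prob(q))
```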


So generally speaking, I need to remove the known adapters, but in some cases this is not necessary?


Many aligners will exclude parts of a read that do not align with the reference when they write alignments out in SAM format. The excluded bases stay in the record but are marked as not being part of the alignment. This is termed "soft clipping". In the case of Illumina sequencing, one expects adapter sequence to be present only at the 3'-end of the read (because of the way the sequencing works) and only when the read length is longer than the insert. So in theory you could skip trimming and let the aligner "soft clip" the non-aligned part of the read.
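
If you want to check how much an aligner soft clipped, the information is in the CIGAR field of each SAM record as `S` operations. Here is a minimal Python sketch that counts them; the function names are hypothetical, and it works on a bare CIGAR string rather than a full SAM parser.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts.

    Leading/trailing refer to the sequence as stored in the SAM record,
    so for a reverse-strand alignment the original 3' end of the read
    is the leading side.
    """
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return 0, 0  # e.g. "*" for an unaligned read
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


if __name__ == "__main__":
    print(soft_clipped_bases("90M10S"))
```

Consistently large clips on the 3' side of many reads are a hint that adapter sequence is present and that trimming beforehand may be worthwhile.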


Okay, thanks for the explanation
