Question: fastq reads cleaning
blueskypie (United States) wrote:

The best practices for short variant discovery from DNA sequencing make no mention of read cleaning (removing adapters, low-quality reads, short reads, etc.). I wonder whether that is because such a step is not necessary, or because the reads are assumed to be already cleaned.

If it's not necessary, is it because the bad reads will be filtered out after the mapping step?

modified 10 weeks ago • written 10 weeks ago by blueskypie

Thanks for the comment, @JC! Even if those bad reads are kept, won't they be filtered out after mapping?

Is there any study demonstrating a difference in variant calling accuracy with and without the cleaning step?

Here I'd like to distinguish QC from cleaning: QC only inspects the data, whereas cleaning changes the raw data.

written 10 weeks ago by blueskypie

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

See my additional comment below @JC's answer.

modified 10 weeks ago • written 10 weeks ago by genomax

JC (Mexico) wrote:

Any good bioinformatician will run QC before doing anything else and perform cleaning if needed. I think the step is omitted for that reason, and because recent Illumina protocols return cleaned reads. I don't remember the last time I had to clean Illumina data; it was probably back with the HiSeq 2000.

written 10 weeks ago by JC

That is only true if your sequencing facility pre-processes or trims the reads (generally noticeable when not all reads have the same length as the number of cycles sequenced). I believe that if the sequencing provider uses BaseSpace for processing, adapter trimming may be done by default.
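The read-length check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not part of any tool named in this thread: the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequence lines), optionally gzipped.

```python
from collections import Counter
import gzip


def read_length_counts(fastq_path):
    """Count how many reads have each length in a 4-line-per-record FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1:
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def looks_trimmed(counts):
    """Heuristic only: more than one distinct read length hints at prior trimming."""
    return len(counts) > 1


# Example usage (the file name is a placeholder):
# counts = read_length_counts("sample_R1.fastq.gz")
# print(sorted(counts.items()), looks_trimmed(counts))
```

A single read length does not prove the data are untouched (adapters could have been masked rather than removed), so treat the result as a hint alongside a regular QC report.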

Most aligners will soft-clip the parts of a read that do not map, which may be one of the reasons people tend to skip scanning/trimming of data. Scanning/trimming is a must, however, if you plan to do any de novo analysis at any point in the data life cycle.
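To see how much an aligner soft-clipped, one can inspect the CIGAR string of each alignment in the SAM/BAM output, where soft clips appear as `S` operations at either end. The following is a small sketch with a hypothetical helper name, using only the standard library:

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a SAM CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips; ignore them here.
    ops = [(length, op) for length, op in ops if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


print(soft_clipped_bases("80M20S"))  # 20 bases clipped at the end of the read
```

Clipped bases are hidden from the alignment but still present in the record, which is why this mechanism helps mapping-based workflows and not de novo ones.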

modified 10 weeks ago • written 10 weeks ago by genomax

Thanks @genomax for the insight! I'm looking into the fastp tool, but I am a little worried that it may be excessive in its cleaning and correction, since it not only removes parts of the reads but also changes the remaining bases.

modified 10 weeks ago • written 10 weeks ago by blueskypie

since it not only removes parts of the reads but also changes the remaining bases

A scan/trim program should match and delete only bases that match a set of reference sequences (adapters) that you provide, within parameters that you control (e.g. mismatches allowed, initial match length). No data should be changed otherwise. The corresponding Q-score characters are also removed to keep the fastq record intact. If you choose Q-score based trimming, that too only deletes bases; it does not change any bases that remain.
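As a rough illustration of the behaviour described above, here is a minimal adapter-trimming sketch. It is not how any particular trimmer is implemented: the function names, the default parameters, and the example adapter prefix `AGATCGGAAGAGC` are assumptions chosen for the example. The point is that bases are only deleted, never altered, and that the quality string is cut at the same position.

```python
def find_adapter(seq, adapter, min_overlap=5, max_mismatches=1):
    """Return the leftmost position where the adapter (or a prefix of it that
    runs off the 3' end of the read) matches within max_mismatches, else None."""
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(adapter), len(seq) - start)
        window = seq[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatches:
            return start
    return None


def trim_record(seq, qual, adapter, **match_options):
    """Delete the adapter and everything after it from both sequence and quality."""
    position = find_adapter(seq, adapter, **match_options)
    if position is None:
        return seq, qual  # no adapter found: the record is returned unchanged
    return seq[:position], qual[:position]


# Example: 8 genomic bases followed by the first 10 bases of the adapter.
seq, qual = trim_record("ACGTACGTAGATCGGAAG", "IIIIIIIIFFFFFFFFFF", "AGATCGGAAGAGC")
print(seq, qual)
```

Real trimmers use more careful alignment and error-rate rules, but the surviving bases and their Q-scores are left exactly as sequenced.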

modified 10 weeks ago • written 10 weeks ago by genomax

From what I read briefly, it seems that fastp, for paired-end sequencing, will change a base to the corresponding base in the other read of the pair if that base has a much higher quality score.
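The rule described in this comment can be sketched as follows. This is only an illustration of the idea, not fastp's actual algorithm: the function name and the `min_quality_gap` threshold are hypothetical, and the two inputs are assumed to cover the same positions already (read 2 reverse-complemented and aligned to read 1).

```python
def correct_overlap(bases1, quals1, bases2, quals2, min_quality_gap=15):
    """Within an overlapped region, replace a mismatching base with its mate's
    base when the mate's Phred quality is higher by at least min_quality_gap.

    bases1/bases2: equal-length strings covering the same positions.
    quals1/quals2: lists of integer Phred scores, one per base.
    """
    fixed1, fixed2 = list(bases1), list(bases2)
    for i, (base1, base2) in enumerate(zip(bases1, bases2)):
        if base1 == base2:
            continue  # the reads agree: nothing to change
        if quals1[i] - quals2[i] >= min_quality_gap:
            fixed2[i] = base1
        elif quals2[i] - quals1[i] >= min_quality_gap:
            fixed1[i] = base2
        # otherwise the disagreement is left untouched
    return "".join(fixed1), "".join(fixed2)


# The third base disagrees and read 2 has a much lower quality there.
print(correct_overlap("ACGT", [38, 38, 38, 38], "ACTT", [38, 38, 10, 38]))
```

Bases outside the overlap have no mate to compare against, so a rule of this kind cannot touch them.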

written 10 weeks ago by blueskypie

Paired-end reads overlap only when one deliberately designs them to (e.g. 16S sequencing) or when an insert happens to be shorter than the number of cycles sequenced. Even then, a dedicated read-merging program will need to be used to do the merging (not sure if fastp does read merging as an option).

Does this apply in your case? Otherwise there should be no overlap between R1/R2.
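Whether R1 and R2 overlap is simple arithmetic on the fragment (insert) length and the read lengths. The sketch below uses a hypothetical function name and assumes "cycles sequenced" means the R1 and R2 cycle counts taken together, so that overlap starts once the fragment is shorter than their sum:

```python
def expected_overlap(fragment_length, read1_cycles, read2_cycles):
    """Number of fragment bases covered by both reads of a pair (0 if none)."""
    # A read cannot cover more of the fragment than the fragment contains;
    # any extra cycles read through into adapter sequence instead.
    covered1 = min(read1_cycles, fragment_length)
    covered2 = min(read2_cycles, fragment_length)
    return max(0, covered1 + covered2 - fragment_length)


for fragment in (400, 200, 100):
    print(fragment, expected_overlap(fragment, 150, 150))
```

With 2 x 150 cycles, a 400 bp fragment gives no overlap, a 200 bp fragment gives 100 overlapping bases, and a 100 bp fragment is covered entirely by both reads, with the remaining cycles running into adapter.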

modified 10 weeks ago • written 10 weeks ago by genomax