Question: Software For Quality Filtering Of 454 Data Sets
10.0 years ago, Pawel Szczesny (Poland) wrote:

454 technology produces a number of errors in the reads, mostly (but not only) related to homopolymeric runs. It requires some degree of quality filtering, that is, removing reads that contain false information. Filtering is often based on quite simple measures, such as the number of consecutive low-quality bases and read length. Are there any approaches to quality filtering other than the ones implemented in the PyroNoise/AmpliconNoise packages?
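To make the "simple measures" concrete, here is a minimal Python sketch of a filter based on read length and the longest run of consecutive low-quality bases. The function names and all thresholds (`min_length`, `max_length`, `min_quality`, `max_low_run`) are illustrative assumptions, not values taken from any particular package.

```python
from typing import List


def longest_low_quality_run(quals: List[int], threshold: int) -> int:
    """Return the length of the longest run of consecutive scores below threshold."""
    longest = current = 0
    for q in quals:
        if q < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def passes_filter(
    seq: str,
    quals: List[int],
    min_length: int = 200,   # assumed cutoff, tune per data set
    max_length: int = 600,   # assumed cutoff, tune per data set
    min_quality: int = 20,   # scores below this count as "low quality"
    max_low_run: int = 3,    # longest tolerated run of low-quality bases
) -> bool:
    """Keep a read only if its length and its low-quality runs are acceptable."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    if not (min_length <= len(seq) <= max_length):
        return False
    return longest_low_quality_run(quals, min_quality) <= max_low_run
```

A read is rejected as soon as either criterion fails, so the thresholds directly control how much data is discarded.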

read quality genomics filter • 3.6k views
modified 9.7 years ago by Casbon • written 10.0 years ago by Pawel Szczesny

Most "denoising" methods come at the cost of losing information or data. When it is possible to process the raw data, it is better to work with that. 454 reads are not that difficult or different to process. I do not see much need for denoising, and few people are doing it.

written 10.0 years ago by lh3

When using unfiltered data one risks an over-prediction of microbial diversity in the metagenomic samples. See "The 'rare biosphere': a reality check".

modified 15 months ago by _r_am • written 10.0 years ago by Pawel Szczesny

I actually think the word "denoising" is a little misleading. All PyroNoise does is "model" sequencing errors, and because its primary goal is clustering, its way of "denoising" by clustering is the right thing to do. Nonetheless, if you do not model sequencing errors in your application but simply rely on an independent denoising procedure, you will probably make a compromise. My overall advice is: explicitly model sequencing errors in your application, but do not rely on a 3rd-party "denoiser" that is not built for your application.

written 10.0 years ago by lh3

Are you asking this for amplicon (PCR product sequencing) or shotgun reads?

written 10.0 years ago by lexnederbragt

lh3, I see. Yes, denoising is indeed misleading, as I see people use it in a quite different context. I will re-edit the question in a minute.

written 10.0 years ago by Pawel Szczesny

fixlex, mostly for amplicon-based reads.

written 10.0 years ago by Pawel Szczesny
10.0 years ago, Istvan Albert (University Park, USA) wrote:

The mothur package has a number of methods for 454-based read filtering.

Take a look at the trim.seqs command.
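For readers who want to see what sequence-based read trimming checks of this kind look like, the following is a rough Python sketch of three common criteria: minimum length, maximum homopolymer length, and number of ambiguous bases. This is not mothur's code and does not call mothur; the function names and default thresholds are assumptions chosen for illustration only.

```python
import re


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    longest = 0
    # (.)\1* matches one character followed by any number of repeats of it.
    for match in re.finditer(r"(.)\1*", seq.upper()):
        longest = max(longest, len(match.group(0)))
    return longest


def count_ambiguous(seq: str) -> int:
    """Count characters that are not one of A, C, G, T (for example N)."""
    return sum(1 for base in seq.upper() if base not in "ACGT")


def keep_read(
    seq: str,
    min_length: int = 200,      # assumed threshold
    max_homopolymer: int = 8,   # assumed threshold
    max_ambiguous: int = 0,     # assumed threshold
) -> bool:
    """Apply the three sequence-based checks; True means the read is kept."""
    if len(seq) < min_length:
        return False
    if longest_homopolymer(seq) > max_homopolymer:
        return False
    return count_ambiguous(seq) <= max_ambiguous
```

Unlike quality-score filters, these checks need only the sequence itself, so they can be applied even when no quality file is available.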

written 10.0 years ago by Istvan Albert
10.0 years ago, Casbon wrote:

If you have a reference or high coverage, then using the 454 toolchain for mapping or assembly should handle this. The specific error modality you refer to, homopolymer runs, does not require removing the reads, but rather careful calling of the bases within those runs. Recent versions of the Newbler software output a histogram of signal strengths for the homopolymer runs, which lets you see the distribution of signal at these sites.
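The idea of inspecting signal strengths rather than discarding reads can be sketched in Python as follows. The sketch assumes flow signals are available as plain floats that are roughly proportional to homopolymer length; it does not reproduce Newbler's output format, and the function names and the `tolerance` value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def signal_histogram(signals: Iterable[float]) -> Dict[float, int]:
    """Count signals per bin, with bins centred on multiples of 0.1."""
    return dict(Counter(round(s, 1) for s in signals))


def call_homopolymer(signal: float, tolerance: float = 0.25) -> Tuple[int, bool]:
    """Return (called length, ambiguous flag) for one flow signal.

    The length is the nearest integer; the call is flagged as ambiguous
    when the signal lies further than `tolerance` from that integer.
    """
    length = math.floor(signal + 0.5)
    ambiguous = abs(signal - length) > tolerance
    return length, ambiguous
```

Signals that pile up midway between two integers in the histogram are the ones where the homopolymer length is uncertain, and those positions can be treated with caution downstream while the rest of the read is kept.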

written 10.0 years ago by Casbon