Tool: AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
12
gravatar for chen
2.6 years ago by
chen1.7k
OpenGene
chen1.7k wrote:

Hi, this tool may save you time: it does filtering and QC on fastq data automatically.

The following introduction is out of date and the newer AfterQC is much more powerful; please check the GitHub page for updates.

AfterQC

project on github: https://github.com/OpenGene/AfterQC
sample report: http://opengene.org/AfterQC/report.html

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

AfterQC can simply go through all fastq files in a folder and output three folders, good, bad and QC, which contain the good reads, the bad reads and the QC results of each fastq file/pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, X10, X5, NextSeq 500/550, MiniSeq...

Features:

AfterQC does the following tasks automatically:

  • Filters reads with too low quality, too short length or too many N bases
  • Filters reads with abnormal PolyA/PolyT/PolyC/PolyG sequences
  • Does per-base quality control and plots the figures
  • Trims reads at front and tail, according to the QC results
  • For paired-end sequencing data, automatically corrects low-quality wrong bases in the overlapped area of read1/read2
  • Detects and eliminates bubble artifacts caused by the sequencer due to fluid dynamics issues
  • Single molecule barcode sequencing support: if all reads have a single molecule barcode (see duplex sequencing), AfterQC shifts the barcodes from the reads to the fastq query names
  • Supports both single-end and paired-end sequencing data
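
To make the first two filtering steps above concrete, here is a minimal Python sketch. It is not AfterQC's actual code: the function names and the thresholds (min_len, min_mean_q, max_n, max_poly) are assumptions picked for the example, and Phred+33 quality encoding is assumed.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def is_good_read(seq, qual, min_len=35, min_mean_q=20, max_n=5, max_poly=35):
    """Return True if the read passes all (hypothetical) filters."""
    if len(seq) < min_len:                     # too short
        return False
    if seq.count("N") > max_n:                 # too many N bases
        return False
    if mean_quality(qual) < min_mean_q:        # too low quality
        return False
    if longest_homopolymer(seq) >= max_poly:   # abnormal polyA/T/C/G run
        return False
    return True
```

A read made entirely of G, for example, fails the last check no matter how high its quality scores are, which is the kind of polyX read the feature list describes.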

Dependency:

AfterQC uses the editdistance module. Run the following before using AfterQC:

pip install editdistance

WARNING: If you haven't installed the editdistance module, AfterQC will fall back to a pure-Python implementation of edit distance, which is extremely slow.
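
For readers wondering why the fallback is slow, the sketch below shows the general pattern: try the C-backed module first and fall back to a pure-Python Levenshtein distance if the import fails. This is an illustration only, not AfterQC's own source; the wrapper name edit_distance is an assumption.

```python
try:
    import editdistance  # C-backed module from `pip install editdistance`

    def edit_distance(a, b):
        """Edit distance using the fast compiled module."""
        return editdistance.eval(a, b)

except ImportError:

    def edit_distance(a, b):
        """Pure-Python Levenshtein distance; O(len(a) * len(b)) and slow."""
        prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]
```

Either branch gives the same answers; the pure-Python loop just pays interpreter overhead on every cell of the comparison table.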

Simple usage:

1, Prepare your fastq files in a folder
2, For single-end sequencing, the filenames in the folder should match *R1*
For paired-end sequencing, the filenames in the folder should match *R1* and *R2*

cd /path/to/fastq/folder
python path/to/AfterQC/after.py

Two folders will be generated automatically: 'good', which stores the good reads, and 'bad', which stores the bad reads.
AfterQC will print some statistics when it is done, such as how many reads are good, how many are bad, and how many were corrected.
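
If you want to check in advance which files in a folder follow the R1/R2 naming convention described above, a small Python sketch like this can help. It is only an assumption about how pairing by filename could work (swap the first "R1" for "R2"); the helper name find_fastq_pairs is hypothetical and this is not how AfterQC itself is guaranteed to match files.

```python
import os


def find_fastq_pairs(folder):
    """Return (read1, read2_or_None) tuples for files named with R1/R2."""
    names = sorted(os.listdir(folder))
    pairs = []
    for name in names:
        if "R1" not in name:
            continue  # only read1 files start a pair
        mate = name.replace("R1", "R2", 1)  # expected read2 filename
        pairs.append((name, mate if mate in names else None))
    return pairs
```

A None in the second position means the file would be treated as single-end under this convention.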

Quality Control only

If you only want to get quality control statistics, run:
python after.py --qc_only

Understand the report

  • AfterQC will generate a QC folder, which contains many figures.
  • For paired-end sequencing data, the read1 and read2 figures are placed in the same folder, which is named after read1's filename. R1 means read1, R2 means read2.
  • For single-end sequencing data, the figures are still labeled R1.
  • prefilter means before filtering; postfilter means after filtering.
  • For paired-end sequencing data, AfterQC will do an overlap analysis. read1 and read2 overlap when read1_length + read2_length > DNA_template_length.
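
The overlap condition in the last point can be worked through with a short Python sketch. It assumes read1 starts at the beginning of the template and read2 ends at the end of it, ignores adapters, and uses a hypothetical function name; it only illustrates the arithmetic, not AfterQC's overlap detection.

```python
def overlap_length(read1_len, read2_len, template_len):
    """Number of template bases covered by both reads (0 if none)."""
    start = max(0, template_len - read2_len)  # where read2 begins on the template
    end = min(read1_len, template_len)        # where read1 stops on the template
    return max(0, end - start)
```

With two 150 bp reads and a 200 bp template, 150 + 150 > 200, and the reads share 100 bases. With two 100 bp reads and a 300 bp template there is no overlap at all.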
modified 11 months ago by zhimenggan0 • written 2.6 years ago by chen1.7k

Hello,

I've got a few questions about the calcs in AfterQC. In the AfterQC paper, you note that "AfterQC can detect the mismatches in the overlapping regions. For those reads with very long overlap (i.e. overlap_len>50)".

In the estimated seq error field in the html report, are only overlaps greater than 50bp considered? And are the errors in these overlaps the only component that goes into the seq error rate calculation?

If only overlaps greater than 50bp go into the calculation, could you please let me know where I should change the source to modify that number (my guess is complete_compare_require in util.py)?

Thanks very much for the software!

written 14 months ago by atcg10

Please don't post new questions in the answer section. New Questions need to be asked separately. This post will be moved to a comment.

written 14 months ago by Istvan Albert ♦♦ 78k
biomaster (San Jose) wrote 2.6 years ago:

Hey bro, I know you were advertising your GitHub project, but your code did save my day! Your tool helped me get rid of the damn polyG errors in NextSeq 500 data!

Thanks man, good project!


wow, glad to know that AfterQC helps.

modified 2.6 years ago • written 2.6 years ago by chen1.7k
bioinfo8 wrote 19 months ago:

Hi Chen,

'AfterQC' seems to be a wonderful tool which I would like to use for my data. I was wondering if there is any way to use it inside R?

Thanks!


No R implementation yet.

But using it with Python or PyPy is very simple; you can get started in less than 3 minutes.

written 19 months ago by chen1.7k

Ok, thanks. How much RAM would you recommend for running AfterQC on the paired reads of one sample, where R1 and R2 are each ~6GB?

modified 19 months ago • written 19 months ago by bioinfo8100

Actually there is no specific RAM requirement for AfterQC.

AfterQC uses very little RAM; 4GB of RAM is quite enough.

written 19 months ago by chen1.7k

This is for one sample?

written 19 months ago by bioinfo8100

I meant a 4GB system is enough to run AfterQC.

If you want to run many samples concurrently, a bit more memory may be required.

For example, a 16 GB system should be fine for running 20 samples concurrently.
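
Running several samples concurrently just means starting one AfterQC process per sample folder. A minimal Python sketch of that idea follows; the helper name run_in_folders, the worker count and the example paths are all assumptions, and the only AfterQC detail taken from this thread is that after.py is run from inside the fastq folder.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_folders(command, folders, max_workers=4):
    """Run `command` once per folder (as its working directory), in parallel.

    Returns the exit codes in the same order as `folders`.
    """
    def run_one(folder):
        # Each call starts a separate OS process, so threads are only
        # used here to wait on the child processes.
        return subprocess.call(command, cwd=folder)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, folders))


# Hypothetical usage (paths are placeholders):
# codes = run_in_folders(["python", "/path/to/AfterQC/after.py"],
#                        ["/data/sample1", "/data/sample2"],
#                        max_workers=2)
```

Keeping max_workers in line with the available memory is the point of the 16 GB / 20 samples estimate above.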

written 19 months ago by chen1.7k
chen (OpenGene) wrote 17 months ago:

AfterQC v0.9.4 was just released. By using PyPy, it is now 3X faster than previous versions.

modified 17 months ago • written 17 months ago by chen1.7k
zhimenggan wrote 11 months ago:

For AfterQC, how can I run it in batch mode with multiprocess support?


AfterQC doesn't support multi-threading since it's written in Python. You can use another tool I developed, fastp, which is much faster and more powerful.

modified 11 months ago • written 11 months ago by chen1.7k