AfterQC

Question

Tool: AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

12

9.5 years ago

chen ★ 2.5k

Hi, this tool may save you time: it filters fastq data and runs QC on it automatically.

The following introduction is out of date, and the newer AfterQC is much more powerful. Please check the GitHub page for updates.

AfterQC

project on github: https://github.com/OpenGene/AfterQC
sample report: http://opengene.org/AfterQC/report.html

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

AfterQC goes through all fastq files in a folder and outputs three folders: good, bad and QC, which contain the good reads, the bad reads and the QC results of each fastq file/pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, X10, X5, NextSeq 500/550, MiniSeq...

Features:

AfterQC does the following tasks automatically:

Filters reads with too low quality, too short length or too many N
Filters reads with abnormal PolyA/PolyT/PolyC/PolyG sequences
Does per-base quality control and plots the figures
Trims reads at front and tail, according to QC results
For pair-end sequencing data, AfterQC automatically corrects low-quality wrong bases in the overlapped area of read1/read2
Detects and eliminates bubble artifacts caused by fluid dynamics issues in the sequencer
Single molecule barcode sequencing support: if all reads have a single molecule barcode (see duplex sequencing), AfterQC shifts the barcodes from the reads to the fastq query names
Supports both single-end and pair-end sequencing data
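
To make the first two filtering tasks concrete, here is a minimal sketch of a per-read filter. It is not AfterQC's actual code: the function names and all thresholds are assumptions chosen for illustration, and Phred+33 quality encoding is assumed.

```python
from typing import Optional

# All thresholds below are illustrative assumptions, not AfterQC's defaults.
MIN_LENGTH = 35        # reads shorter than this are rejected
MAX_N = 5              # reads with more N bases than this are rejected
MIN_MEAN_QUALITY = 20  # minimum mean Phred score
MAX_POLY_RUN = 35      # runs of a single base this long (or longer) are rejected


def longest_run(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a non-empty quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def reject_reason(seq: str, quality: str) -> Optional[str]:
    """Return why a read would be filtered out, or None if it passes."""
    if len(seq) < MIN_LENGTH:
        return "too short"
    if seq.count("N") > MAX_N:
        return "too many N"
    if mean_phred(quality) < MIN_MEAN_QUALITY:
        return "low quality"
    if longest_run(seq) >= MAX_POLY_RUN:
        return "polyX"
    return None
```

A read for which `reject_reason` returns None would go to the good folder; any other read would go to the bad folder.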

Dependency:

AfterQC uses the editdistance module. Run the following before using AfterQC:

pip install editdistance

WARNING: If you haven't installed the editdistance module, AfterQC will fall back to a pure Python implementation of edit distance, which is extremely slow.
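
The fallback described in the warning can be sketched roughly as follows. This is an illustration, not AfterQC's actual source: the wrapper name `edit_distance` is hypothetical, while `editdistance.eval` is the call provided by the editdistance package.

```python
try:
    # Fast compiled implementation, installed with `pip install editdistance`.
    import editdistance

    def edit_distance(a: str, b: str) -> int:
        return editdistance.eval(a, b)

except ImportError:

    def edit_distance(a: str, b: str) -> int:
        """Pure Python Levenshtein distance, O(len(a) * len(b)) time."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,               # deletion
                    current[j - 1] + 1,            # insertion
                    previous[j - 1] + (ca != cb),  # substitution or match
                ))
            previous = current
        return previous[-1]
```

The quadratic inner loop runs in the interpreter for every pair of sequences compared, which is why the pure Python path is so much slower.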

Simple usage:

1. Prepare your fastq files in a folder.
2. For single-end sequencing, the filenames in the folder should match *R1*. For pair-end sequencing, the filenames in the folder should match *R1* and *R2*.
3. Run AfterQC from inside that folder:

cd /path/to/fastq/folder
python path/to/AfterQC/after.py

Two folders will be generated automatically: a 'good' folder that stores the good reads and a 'bad' folder that stores the bad reads.
AfterQC will print some statistics when it is done, such as how many reads are good, how many are bad, and how many were corrected.
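
To illustrate the *R1* / *R2* naming convention from step 2, here is a sketch of how files in a folder could be paired up. It shows the idea only and is not necessarily how AfterQC itself matches files; `pair_fastq_files` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def pair_fastq_files(names: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each *R1* file with its *R2* mate, or None for single-end data."""
    available = set(names)
    pairs = []
    for name in sorted(names):
        if "R1" not in name:
            continue  # R2 files and unrelated files are not starting points
        # Assumption: the mate has the same name with the first R1 replaced by R2.
        mate = name.replace("R1", "R2", 1)
        pairs.append((name, mate if mate in available else None))
    return pairs
```

Calling `pair_fastq_files(os.listdir("."))` from inside the fastq folder (after `import os`) would list which files are treated as pairs and which as single-end.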

Quality Control only

If you only want to get quality control statistics, run:

python after.py --qc_only

Understand the report

AfterQC will generate a QC folder, which contains lots of figures.
For pair-end sequencing data, the figures for both read1 and read2 are placed in the same folder, which is named after read1's filename. R1 means read1, R2 means read2.
For single-end sequencing data, the figures are still labeled R1.
prefilter means before filtering, postfilter means after filtering.
For pair-end sequencing data, AfterQC will do an overlap analysis. read1 and read2 overlap when read1_length + read2_length > DNA_template_length.
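
The overlap condition can be worked through in a few lines. This is a sketch under stated assumptions, not AfterQC's implementation: the function names are hypothetical, and the template length is taken as a known input purely for illustration, whereas in practice it has to be inferred from the reads.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(read1_len: int, read2_len: int, template_len: int) -> int:
    """Number of template bases covered by both reads (0 if they do not overlap)."""
    overlap = read1_len + read2_len - template_len
    if overlap <= 0:
        return 0
    # The overlap can never exceed either read or the template itself.
    return min(overlap, read1_len, read2_len, template_len)


def overlap_mismatches(read1: str, read2: str, template_len: int) -> int:
    """Count disagreeing bases in the overlap.

    Assumes template_len >= len(read1) and template_len >= len(read2),
    i.e. neither read runs past the end of the template.
    """
    n = overlap_length(len(read1), len(read2), template_len)
    if n == 0:
        return 0
    tail = read1[-n:]                     # end of read1
    head = reverse_complement(read2)[:n]  # same template bases as seen by read2
    return sum(a != b for a, b in zip(tail, head))
```

For example, two 150 bp reads from a 200 bp template overlap by 150 + 150 - 200 = 100 bp, while the same reads from a 400 bp template do not overlap at all.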

PolyG Quality-Control Filtering Fastq AfterQC • 9.5k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.5 years ago by chen ★ 2.5k

0

Hello,

I've got a few questions about the calcs in AfterQC. In the AfterQC paper, you note that "AfterQC can detect the mismatches in the overlapping regions. For those reads with very long overlap (i.e. overlap_len>50)".

In the estimated seq error field in the html report, are only overlaps greater than 50bp considered? And are the errors in these overlaps the only component that goes into the seq error rate calculation?

If only overlaps greater than 50bp go into the calculation, could you please let me know where I should change the source to modify that number (my guess is complete_compare_require in util.py)?

Thanks very much for the software!

ADD REPLY • link 8.1 years ago by atcg ▴ 10

0

Please don't post new questions in the answer section. New Questions need to be asked separately. This post will be moved to a comment.

ADD REPLY • link 8.1 years ago by Istvan Albert 103k

score 2 · Answer 1 · 2016-05-07

2

9.5 years ago

biomaster ▴ 180

Hey bro, I know you were advertising your GitHub project, but your code did save my day! Your tool helped me get rid of the damn polyG errors in NextSeq 500 data!

Thanks man, good project!

ADD COMMENT • link 9.5 years ago by biomaster ▴ 180

2

Wow, glad to know that AfterQC helps.

ADD REPLY • link 9.5 years ago by chen ★ 2.5k

score 0 · Answer 2 · 2017-05-16

0

8.5 years ago

bioinfo8 ▴ 230

Hi Chen,

'AfterQC' seems to be a wonderful tool which I would like to use for my data. I was wondering if there is any way to use it inside R?

Thanks!

ADD COMMENT • link 8.5 years ago by bioinfo8 ▴ 230

1

No R implementation yet.

But using it with Python or PyPy is very simple; you can get started in less than 3 minutes.

ADD REPLY • link 8.5 years ago by chen ★ 2.5k

0

OK, thanks. How much RAM would you recommend for running AfterQC on one sample of paired reads, with each file ~6 GB (R1 ~6 GB and R2 ~6 GB)?

ADD REPLY • link 8.5 years ago by bioinfo8 ▴ 230

1

Actually, AfterQC has no particular RAM requirement.

AfterQC uses very little RAM; 4 GB is quite enough.

ADD REPLY • link 8.5 years ago by chen ★ 2.5k

0

This is for one sample?

ADD REPLY • link 8.5 years ago by bioinfo8 ▴ 230

1

I meant that a 4 GB system is enough to run AfterQC.

If you want to run many samples concurrently, a bit more memory may be required.

For example, a 16 GB system should be fine for running 20 samples concurrently.
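
Running several samples concurrently can be done at the process level. The following is a minimal sketch, not part of AfterQC: `run_in_folders` is a hypothetical helper that starts one separate process per sample folder and limits how many run at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_in_folders(command: List[str], folders: List[str],
                   max_workers: int = 4) -> Dict[str, int]:
    """Run `command` once per folder, at most `max_workers` at a time.

    Returns a mapping from folder to the exit code of its process.
    """
    def run_one(folder: str) -> int:
        # Each sample runs as its own OS process, started inside its folder.
        return subprocess.run(command, cwd=folder).returncode

    # The threads only wait on child processes, so they add little overhead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run_one, folders))
    return dict(zip(folders, codes))
```

With placeholder paths, a call might look like `run_in_folders(["python", "/path/to/AfterQC/after.py"], ["/data/sample1", "/data/sample2"])`. The value of `max_workers` should be chosen to match the memory available.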

ADD REPLY • link 8.5 years ago by chen ★ 2.5k

0

I am trying to run AfterQC because I saw here that it needs only 4 GB of RAM, but I am getting a memory failure error. (I am using a 250 GB SSD, a 1 TB HDD and 8 GB of RAM.)

ADD REPLY • link 6.8 years ago by MS ▴ 40

score 0 · Answer 3 · 2017-07-18

0

8.3 years ago

chen ★ 2.5k

AfterQC v0.9.4 was just released. When run with PyPy, it is now 3X faster than previous versions.

ADD COMMENT • link 8.3 years ago by chen ★ 2.5k

score 0 · Answer 4 · 2018-01-08

0

7.8 years ago

zhimenggan • 0

How can I run AfterQC in batch mode with multiprocess support?

ADD COMMENT • link 7.8 years ago by zhimenggan • 0

0

AfterQC doesn't support multi-threading since it's written in Python. You can use another tool I developed, which is much faster and more powerful: fastp.

ADD REPLY • link 7.8 years ago by chen ★ 2.5k