PacBio raw data trimming and cleaning
6.8 years ago
misterie

Hi,

I have run a QC analysis of my PacBio data with FastQC. I am wondering whether I should clean the data by a quality criterion or by a minimum read length. The data quality looks very poor. Do you have any recommendations on how to clean these data? For Illumina I used Trimmomatic, Cutadapt and Trim Galore, but I have no idea how to preprocess PacBio data. I think I should remove reads shorter than 50 bp, but if you have any other recommendations, please let me know.

This usually depends on what you want to do after cleaning. Do you plan (structural) variant calling, de novo assembly, ...?

I want to do de novo assembly using several pipelines. But I think I should at least filter my data with a minimum length of 50 bp.

Since you are working with PacBio data, I personally think a lower bound of only 50 nucleotides is a bit silly. Depending on your read length distribution, I would go for at least ten times your 50 nt threshold (i.e. 500 nt or more).

On the other hand, most PacBio processing pipelines already apply an internal minimum-length filter (often around a few kbp).
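To pick a cutoff from the read-length distribution rather than a fixed 50 bp, a small summary like the one below may help. This is a minimal Python sketch, assuming you already have the read lengths as a list of integers; the function name `length_summary` and the 500 bp default (ten times 50 bp) are illustrative choices, not taken from any particular tool.

```python
import statistics
from typing import Dict, List


def length_summary(lengths: List[int], threshold: int = 500) -> Dict[str, float]:
    """Summarise read lengths and what a minimum-length cutoff would remove."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    # N50: the length L such that reads of length >= L hold at least half the bases.
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    below = [x for x in lengths if x < threshold]
    return {
        "reads": len(lengths),
        "bases": total,
        "min": ordered[-1],
        "max": ordered[0],
        "median": statistics.median(lengths),
        "n50": n50,
        "reads_below_threshold": len(below),
        "bases_below_threshold": sum(below),
    }


if __name__ == "__main__":
    # Toy lengths, for illustration only.
    print(length_summary([50, 120, 800, 2500, 9000, 15000]))
```

If `bases_below_threshold` is a tiny fraction of `bases`, raising the cutoff costs almost no sequence while removing many short, mostly useless reads.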

Thank you. What I mainly mean is that some of my samples were demultiplexed with lima, where the default minimum length threshold was 50 bp. I also have samples that did not require demultiplexing, so they contain reads as short as 1 bp. I want to make these samples uniform. If it would be better to change the threshold to 500 bp, please let me know which software would be appropriate.
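If the goal is simply the same minimum length for every sample, whether or not it went through lima, one option is a small streaming filter applied to every FASTQ file. The sketch below is a hypothetical plain-Python example, assuming standard 4-line FASTQ records (plain or gzipped); the function names and the 500 bp default are placeholders, and a dedicated, well-tested tool would normally be preferable for large datasets.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# (header, sequence, '+' line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse plain 4-line FASTQ records from an iterable of text lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between or after records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"truncated or malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, plus, qual


def filter_min_length(records: Iterable[Record], min_len: int = 500) -> Iterator[Record]:
    """Keep only records whose sequence is at least min_len bases long."""
    return (rec for rec in records if len(rec[1]) >= min_len)


def filter_fastq_file(in_path: str, out_path: str, min_len: int = 500) -> int:
    """Write reads >= min_len from in_path (.fastq or .fastq.gz) to out_path; return the count kept."""
    opener = gzip.open if in_path.endswith(".gz") else open
    kept = 0
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        for rec in filter_min_length(read_fastq(fin), min_len):
            fout.write("\n".join(rec) + "\n")
            kept += 1
    return kept
```

For example, `filter_fastq_file("sample1.fastq.gz", "sample1.min500.fastq", 500)` (hypothetical file names) keeps only reads of at least 500 bp. Running the same call with the same threshold on every sample gives demultiplexed and non-demultiplexed samples the same length floor.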

It all depends on which analyses you want to do with the data.

E.g. assembly: most assemblers either do the 'cleaning' themselves or do no quality cleaning at all.

I'm facing the same problem as you! Did you eventually solve it? I'm a complete newbie in this field and am trying to do de novo assembly of plant genomes. I only have PacBio data and I don't know where to find the pipelines or which tools to use...

Do you have PacBio CLR data or CCS (HiFi) data?

For quality trimming you can try fastplong (fastp for long reads).

OK, what have you tried so far?

A quick Google search (or similar) should already point you to some tools or procedures.

On read QC: for long reads, especially when the goal is assembly, it's often mainly length filtering (plus some quality filtering, but that is not crucial, or at least much less so than for Illumina; the same applies to adapter trimming). In any case, you should first get a QC overview to make the right decision (something like FastQC or NanoPlot/chopper/...).
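If you do add a quality filter on top of length filtering, a common convention for long reads is to average quality in error-probability space rather than averaging the Phred scores directly, because a few very low-quality bases dominate the real error rate. A minimal sketch, assuming Phred+33 encoded quality strings; the function names and the 500 bp / Q7 defaults are arbitrary placeholders, and a quality filter only makes sense if the quality values in your data type (CLR vs HiFi) are informative.

```python
import math


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean quality of one read, averaged as error probabilities (Phred+33 by default)."""
    if not qual:
        return 0.0
    # Convert each Phred score Q to its error probability 10^(-Q/10) and average those.
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filters(seq: str, qual: str, min_len: int = 500, min_qual: float = 7.0) -> bool:
    """Length filter first (cheap), then the mean-quality filter."""
    return len(seq) >= min_len and mean_read_quality(qual) >= min_qual


if __name__ == "__main__":
    # '+' is Q10 and 'I' is Q40 in Phred+33: the naive Phred mean would be 25.
    print(round(mean_read_quality("+I"), 1))  # prints 13.0
```

The two-base example shows the difference: averaging the scores would report Q25, while the actual expected error rate of that read corresponds to roughly Q13.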

PacBio has several tools available for assembly of their data: https://www.pacb.com/products-and-services/analytical-software/whole-genome-sequencing/
