Question

Sequel II large data set Falcon parameters query


4.8 years ago

rob234king ▴ 610

I have a genome expected to be ~400-800 Mbp (I think 800 Mbp is with both haplotypes represented), and I have approximately 350x PacBio coverage from the new Sequel II system. I have used Flye and have a reasonable assembly of 4000 contigs, with a largest scaffold of 22 Mbp containing one gap. I expect to do better with Falcon, at least going by past experience, but I have only ever had 40x coverage before, so this is a new situation for me. If I throw away everything below 20,000 bp I still have 89x coverage at an 800 Mbp genome size! Jobs take 4 days to run and use a lot of resources, so I don't want to spend weeks testing parameters.

If I use Falcon, what settings are best for this? Below are my settings. I'm not sure about max coverage and max difference (--max-cov and --max-diff in overlap_filtering_setting), as I am likely using 130-260x coverage: my FASTA excluding reads below 7000 bp is a 114 Gbp file.
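As a rough sanity check on the coverage range, here is a minimal Python sketch of the arithmetic. The function name `coverage` is just for illustration, and treating the 114 Gbp file size as the total number of bases is an assumption: FASTA headers and newlines mean the true base count is somewhat lower, which would pull the numbers down toward the 130-260x range quoted above.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Assumption: the 114 Gbp file is all sequence (no header/newline overhead).
TOTAL_BASES = 114_000_000_000

# The two genome-size estimates from the question (400 Mbp and 800 Mbp).
for genome_size in (400_000_000, 800_000_000):
    fold = coverage(TOTAL_BASES, genome_size)
    print(f"{genome_size / 1e6:.0f} Mbp genome: {fold:.1f}x")
```

Under that assumption this gives an upper bound of roughly 285x for a 400 Mbp genome and 142.5x for 800 Mbp.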

#### Data Partitioning
pa_DBsplit_option=-x500 -s400
ovlp_DBsplit_option=-s400

#### Repeat Masking
pa_HPCTANmask_option=
#no-op repmask param set
pa_REPmask_code=0,300;0,300;0,300

#### Pre-assembly
length_cutoff=7000
pa_HPCdaligner_option=-v -B128 -M24
pa_daligner_option= -k18 -e0.75 -l1200 -h256 -w8 -s100
falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 20 --max-n-read 350
falcon_sense_greedy=False

#### Pread overlapping
ovlp_HPCdaligner_option=-v -B128 -M24
ovlp_daligner_option=-k24 -e.92 -l1800 -h600 -s100

#### Final Assembly
length_cutoff_pr=7000
overlap_filtering_setting=--max-diff 100 --max-cov 350 --min-cov 20
fc_ovlp_to_graph_option=

Any suggestions on parameters? Also, if I exclude reads below a certain length threshold, is there some kind of duplicates issue, like in Illumina, or some other PacBio data issue that I need to consider in order to remove unwanted data? I could just sub-sample, but it's nice to have good coverage.
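To make the length-threshold idea concrete, here is a minimal Python sketch of one way to pick a cutoff: keep the longest reads until a target coverage is reached, and use the length of the last read kept as the threshold. The function name `length_cutoff_for_coverage` and the toy numbers are hypothetical, and this is only one possible sub-sampling strategy (random sub-sampling is another), not necessarily what Falcon does internally.

```python
def length_cutoff_for_coverage(read_lengths, genome_size, target_coverage):
    """Return the shortest read length to keep so that the longest reads
    together reach target_coverage, or 0 if all reads are needed."""
    needed_bases = genome_size * target_coverage
    total = 0
    # Walk reads from longest to shortest, accumulating bases.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= needed_bases:
            return length
    return 0  # not enough data to reach the target: keep every read


# Toy example: a 10 bp "genome" and a 2x target needs 20 bases,
# which is reached once the reads of length 10, 8 and 6 are included.
print(length_cutoff_for_coverage([10, 8, 6, 4, 2], genome_size=10, target_coverage=2))
```

In practice the read lengths would come from the real FASTA, and the resulting value could be compared against the 7000 and 20,000 bp thresholds mentioned above.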

pacbio Falcon • 1.2k views
