Sequel II large data set Falcon parameters query
4.8 years ago
rob234king ▴ 610

I have a genome expected to be ~400-800 Mbp (I think 800 Mbp is with both haplotypes represented), and I have approximately 350x PacBio coverage from the new Sequel II system. I have used Flye and have a reasonable assembly of 4000 contigs, with a largest scaffold of 22 Mbp containing one gap. I should be able to do better with Falcon, at least going by past experience, but I have only ever had 40x coverage before, so this is a new situation for me. Even if I throw away everything below 20,000 bp, I still have 89x coverage at an 800 Mbp genome size! Jobs are taking 4 days to run and using a lot of resources, so I don't want to spend weeks testing parameters.

If I use Falcon, what settings are best for this? Below are my settings. I'm not sure about max coverage and max difference (--max-cov and --max-diff in overlap_filtering_setting), as I am likely using 130-260x coverage: my FASTA excluding reads below 7000 bp is a 114 Gbp file.
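As a quick sanity check on the coverage arithmetic, here is a minimal Python sketch. It assumes the 114 Gbp figure is a count of bases (if it is really the FASTA file size, headers and newlines would inflate it) and uses the 400 Mbp and 800 Mbp genome size estimates from above. The function name `coverage` is just for illustration.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Assumption: 114 Gbp is the number of bases in reads >= 7000 bp.
total_bases = 114e9

# The two genome size estimates from the post (haploid vs both haplotypes).
for genome_size in (400e6, 800e6):
    fold = coverage(total_bases, genome_size)
    print(f"genome {genome_size / 1e6:.0f} Mbp -> {fold:.1f}x coverage")
```

Taken at face value, 114 Gbp works out to roughly 142x at 800 Mbp and 285x at 400 Mbp, a little above the 130-260x estimate, so the true base count may be somewhat lower than the file size suggests.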

#### Data Partitioning
pa_DBsplit_option=-x500 -s400
ovlp_DBsplit_option=-s400

#### Repeat Masking
pa_HPCTANmask_option=
#no-op repmask param set
pa_REPmask_code=0,300;0,300;0,300

#### Pre-assembly
length_cutoff=7000
pa_HPCdaligner_option=-v -B128 -M24
pa_daligner_option= -k18 -e0.75 -l1200 -h256 -w8 -s100
falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 20 --max-n-read 350
falcon_sense_greedy=False

#### Pread overlapping
ovlp_HPCdaligner_option=-v -B128 -M24
ovlp_daligner_option=-k24 -e.92 -l1800 -h600 -s100

#### Final Assembly
length_cutoff_pr=7000
overlap_filtering_setting=--max-diff 100 --max-cov 350 --min-cov 20
fc_ovlp_to_graph_option=

Any suggestions on parameters? Also, if I exclude reads below a certain length threshold, is there some kind of duplicates issue, as with Illumina, or some other PacBio data issue that I need to consider in order to remove unwanted data? I could just sub-sample, but it's nice to have good coverage.
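To make the length-threshold idea concrete, here is a minimal Python sketch of one way to pick a cutoff: keep the longest reads first until a target coverage is reached, and report the length of the shortest read kept. This is only an illustration of the arithmetic, not something Falcon does for me here; the function name `cutoff_for_coverage` and the toy read lengths are hypothetical.

```python
def cutoff_for_coverage(read_lengths, genome_size, target_coverage):
    """Return the largest length cutoff that still retains target_coverage.

    Reads are taken longest first; the cutoff is the length of the read
    that brings the running total up to genome_size * target_coverage.
    Returns None if all reads together fall short of the target.
    """
    needed = genome_size * target_coverage
    kept = 0
    for length in sorted(read_lengths, reverse=True):
        kept += length
        if kept >= needed:
            return length
    return None


# Toy example (hypothetical numbers, not my real data):
# a 1000 bp "genome" and a 50x target need 50,000 bases.
lengths = [5000, 12000, 25000, 30000, 8000]
print(cutoff_for_coverage(lengths, genome_size=1000, target_coverage=50))
```

Applying the returned value as a "keep reads at or above this length" filter can retain slightly more than the target when several reads share the cutoff length. Random sub-sampling would be the alternative if biasing towards the longest reads is not wanted.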

pacbio Falcon • 1.2k views