Hi
I have both Nanopore long reads and Illumina short reads, and the commands I used for hybrid assembly are listed below. I tried three filtering combinations: (1) filtlong --min_length 1000 with Trimmomatic trimming at a quality threshold of 20; (2) filtlong --min_length 10000 with a quality threshold of 20; (3) filtlong --min_length 10000 with a quality threshold of 30.
For all three combinations I ran CheckM on the hybrid assembly and got exactly the same result, 100% completeness. However, the number of contigs differs slightly between the assemblies, and when I BLASTed the assemblies against one another they were still slightly different.
Given this, I am confused: how should I decide which settings to use when they all give the same CheckM result? Does anyone have experience with this for bacterial genome assembly?
# remove PhiX sequences from the MiSeq data
bowtie2 -x ../Kin002/PhiX/PhiX_bowtie_db -q -1 Tul002_S22_L001_R1_001.fastq.gz -2 Tul002_S22_L001_R2_001.fastq.gz -S Tul002.sam --un-conc-gz Tul002.screened.fastq.gz --local -p 10 &&\
# trim low quality reads
trimmomatic PE -phred33 -threads 10 Tul002.screened.fastq.1.gz Tul002.screened.fastq.2.gz Tul002.R1.trimmed.fastq.gz Tul002.R1.unpaired.fastq.gz Tul002.R2.trimmed.fastq.gz Tul002.R2.unpaired.fastq.gz LEADING:3 SLIDINGWINDOW:4:20 MINLEN:50 &&\
# filter Nanopore reads with Filtlong
filtlong --min_length 1000 --keep_percent 90 Tul002.fastq | gzip > Tul002_filtered.fastq.gz &&\
gzip -d -c Tul002_filtered.fastq.gz > Tul002_filtered.fastq &&\
# hybrid assembly
unicycler -1 Tul002.R1.trimmed.fastq.gz -2 Tul002.R2.trimmed.fastq.gz -l Tul002_filtered.fastq -o Tul002_hybrid -t 20 &&\
# QC using CheckM
mkdir bin &&\
cp Tul002_hybrid/assembly.fasta ./bin &&\
checkm lineage_wf -f Tul002_CheckM.txt -t 4 -x fasta ./bin ./bin/SCG2/
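
To compare the contig numbers between the runs, something like the following could be used. This is only a minimal sketch: it assumes each combination was assembled into its own Unicycler output directory, the directory names are hypothetical, and `fasta_summary` is an illustrative helper, not part of any of the tools above.

```shell
# Print contig count, total length, and longest contig for one FASTA file.
# fasta_summary is a hypothetical helper name used only for this sketch.
fasta_summary() {
    awk '
        /^>/ {
            # A new header closes the previous contig, if there was one.
            if (n) {
                total += len
                if (len > longest) longest = len
            }
            n++
            len = 0
            next
        }
        { len += length($0) }
        END {
            # Close the final contig.
            if (n) {
                total += len
                if (len > longest) longest = len
            }
            printf "contigs=%d total_bp=%d longest_bp=%d\n", n, total, longest
        }
    ' "$1"
}

# Hypothetical output directories, one per filtering combination.
for dir in Tul002_hybrid_len1000_q20 Tul002_hybrid_len10000_q20 Tul002_hybrid_len10000_q30; do
    if [ -f "$dir/assembly.fasta" ]; then
        printf '%s\t' "$dir"
        fasta_summary "$dir/assembly.fasta"
    fi
done
```

Directories that do not exist are skipped, so the loop can be run before all three assemblies have finished.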