Help filling in the configuration file for the MaSuRCA assembler
2.5 years ago
nanoide

Hi, I would like to try the MaSuRCA assembler using Illumina short paired-end reads and PacBio long reads.

1.- I saw in the configuration file that one has to specify the mean and standard deviation of the fragment length for the Illumina reads. I ran the code I found here on the fastq files with the forward and reverse reads and got the following results:

For _1.fastq reads: total 861059 avg=226.837865 stddev=228.847929

For _2.fastq reads: total 883019 avg=229.049172 stddev=231.013502

So, where the MaSuRCA configuration file asks for <fragment mean> and <fragment stdev>, I was going to put 227 and 343. Would that be correct? Are these parameters critical for the final output? I am confused because when I run FastQC on the same fastq files, it reports a sequence length of 100. How is that possible?
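To make the comparison with FastQC concrete, here is a minimal sketch of how read-length statistics can be computed directly from a FASTQ file. It assumes plain, uncompressed FASTQ with exactly four lines per record, and the function name `read_length_stats` is hypothetical. Note that it measures read length, which is what FastQC reports; the fragment (insert) length that the configuration file asks for is a property of the library and cannot be obtained from read lengths alone.

```python
import io
import math


def read_length_stats(handle):
    """Return (count, mean, population stddev) of read lengths in a FASTQ stream.

    Assumes four lines per record; the sequence is the second line of each record.
    """
    n = 0
    total = 0
    total_sq = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # sequence line of the current record
            length = len(line.strip())
            n += 1
            total += length
            total_sq += length * length
    if n == 0:
        return 0, 0.0, 0.0
    mean = total / n
    # Population variance; clamp tiny negative values caused by rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return n, mean, math.sqrt(variance)


# Toy input with two reads of length 10 and 6 (not real data).
example = io.StringIO("@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n")
count, mean, stddev = read_length_stats(example)
print(count, mean, stddev)
```

For a real file, the same function could be called as `read_length_stats(open("reads_1.fastq"))`, where the file name is a placeholder. If every read is 100 bases long, the mean would be 100 and the standard deviation 0.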

2.- Another line of the configuration file reads "Use at most this much coverage by the longest Pacbio or Nanopore reads, discard the rest of the reads", followed by the setting LHE_COVERAGE=25.

I was told that this parameter used to be 30 in previous versions, so I was thinking of changing it to 30. But what does it actually mean? Does it depend on the coverage or sequencing depth of the library in the first place?
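My reading of that comment can be sketched as follows: sort the long reads by length and keep the longest ones until their total bases reach the coverage cap times the genome size. This is only an illustration of the wording, not MaSuRCA's actual implementation; the function name `select_longest_reads`, the toy read lengths, and the genome size are all assumptions.

```python
def select_longest_reads(read_lengths, genome_size, max_coverage):
    """Keep the longest reads until their total bases reach max_coverage * genome_size.

    Returns the kept read lengths and the coverage they provide.
    """
    budget = max_coverage * genome_size  # total bases allowed by the cap
    kept = []
    used = 0
    for length in sorted(read_lengths, reverse=True):
        if used >= budget:
            break  # cap reached; the remaining (shorter) reads are discarded
        kept.append(length)
        used += length
    return kept, used / genome_size


# Toy numbers: a 1,000 bp "genome" and six long reads (38x in total).
lengths = [12000, 9000, 8000, 5000, 3000, 1000]
kept, coverage = select_longest_reads(lengths, genome_size=1000, max_coverage=25)
print(kept, coverage)

# A library shallower than the cap is kept whole.
shallow_kept, shallow_cov = select_longest_reads([5000, 3000], genome_size=1000, max_coverage=25)
print(shallow_kept, shallow_cov)
```

Under this reading, the cap only has an effect when the long-read library provides more coverage than the cap; a shallower library would be used in full.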

I hope someone can clarify how to fill in this information for the MaSuRCA assembly. Thanks!

illumina Assembly masurca fastqc pacbio • 1.1k views

