Question: How can I get the max overlap parameter / Is it possible to get the fragment size from a forward / reverse pair's FASTQ files?
qhsh9713 wrote:

Hi, I have a question about a FLASH parameter.

FLASH has a maximum overlap parameter, -M.

The help text recommends calculating this value from the read length, fragment size, and fragment length standard deviation:

   -r, --read-len=LEN
   -f, --fragment-len=LEN
   -s, --fragment-len-stddev=LEN
                      Average read length, fragment length, and fragment
                      standard deviation.  These are convenience parameters
                      only, as they are only used for calculating the
                      maximum overlap (--max-overlap) parameter.

                      ***The maximum overlap is calculated as the overlap of
                      average-length reads from an average-size fragment
                      plus 2.5 times the fragment length standard
                      deviation.*** 

                      The default values are -r 100, -f 180,
                      and -s 18, so this works out to a maximum overlap of
                      65 bp.  If --max-overlap is specified, then the
                      specified value overrides the calculated value.
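
The arithmetic in that help text can be written out as a short sketch. The function name and argument names below are made up for illustration, and how FLASH rounds the result internally is not stated in the help text, so treat this as a reading of the description rather than FLASH's actual code.

```python
def flash_max_overlap(read_len=100, fragment_len=180, fragment_len_stddev=18):
    """Estimate --max-overlap from the three convenience parameters (-r, -f, -s)."""
    # Two average-length reads cover 2 * read_len bases; whatever exceeds the
    # fragment length is the region where the two reads overlap.
    mean_overlap = 2 * read_len - fragment_len
    # Shorter-than-average fragments overlap more, hence the 2.5 * stddev margin.
    return mean_overlap + 2.5 * fragment_len_stddev


print(flash_max_overlap())  # help-text defaults: 20 + 45 = 65.0
```

With the defaults this reproduces the 65 bp quoted above, which suggests the reading is right.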

So I wrote a Python script to calculate the read lengths, their average, and their standard deviation.
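
A minimal sketch of what such a script might look like, assuming plain (uncompressed) FASTQ with exactly four lines per record. The function name is hypothetical and the two records are made-up toy data. Note that this only yields read-length statistics (the -r value); it says nothing about the fragment length.

```python
import statistics


def read_length_stats(fastq_lines):
    """Return (mean, population stddev) of read lengths from FASTQ lines."""
    # In a four-line FASTQ record the sequence is the second line,
    # i.e. index 1, 5, 9, ... of the line stream.
    lengths = [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]
    return statistics.mean(lengths), statistics.pstdev(lengths)


# Tiny in-memory example; with a real file, pass an open file handle instead.
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTAC", "+", "IIIIII",
]
mean_len, sd_len = read_length_stats(example)
print(mean_len, sd_len)
```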

Then I realized this is all wrong, because I can't get the fragment size: I only have the forward and reverse FASTQ files.

          -------------------------------------  <fragment>

          ----------------------->                 <forward read>
                          over_lap
                         <-----------------------  <reverse read>

I am completely confused about this.

I set the -M parameter (max overlap) to Avg(forward read length) + 2.5 * StdDev(forward read length).

But I think that is wrong, because FLASH recommends calculating the value like this:

                     The maximum overlap is calculated as the overlap of
                      average-length reads from an average-size fragment
                      plus 2.5 times the fragment length standard
                      deviation.

I think I need to know the fragment size, but I only have the forward / reverse FASTQ files.

What do you think I should do? I really need your advice.

modified 3 months ago by h.mon • written 3 months ago by qhsh9713

If you have a reference genome, you can align the reads to it to get a BAM file, then run CollectInsertSizeMetrics from Picard.

written 3 months ago by st.ph.n
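
To illustrate the alignment route in the comment above: once the pairs are aligned, the SAM TLEN column (column 9) already holds the observed template length for each pair. The sketch below is not Picard's algorithm (Picard does additional filtering); it is a rough standard-library approximation over SAM text lines, with a hypothetical function name and made-up toy records.

```python
import statistics


def insert_size_stats(sam_lines):
    """Return (mean, population stddev) of insert sizes from SAM records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        properly_paired = bool(flag & 0x2)
        primary = not (flag & 0x900)  # drop secondary/supplementary records
        # Keeping only positive TLEN counts each pair once; the mate carries
        # the same value with a negative sign.
        if properly_paired and primary and tlen > 0:
            sizes.append(tlen)
    return statistics.mean(sizes), statistics.pstdev(sizes)


# Made-up toy records (two pairs), not real data.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t230\t180\t*\t*",
    "r1\t147\tchr1\t230\t60\t50M\t=\t100\t-180\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t650\t200\t*\t*",
    "r2\t147\tchr1\t650\t60\t50M\t=\t500\t-200\t*\t*",
]
mean_insert, sd_insert = insert_size_stats(toy_sam)
print(mean_insert, sd_insert)
```

The resulting mean and standard deviation are the values FLASH asks for as -f and -s. For a BAM file, the text records would first have to be produced with something like samtools view.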
h.mon wrote:

If your samples were run through a Bioanalyzer prior to library preparation (it could be some other magical machine, I am terrible at wet-lab stuff), it would have produced a fragment length distribution.

Bioinformatically, you could assemble the reads and map them against the assembly to estimate the insert (fragment) size; see the tadpole and bbmap documentation, both are fast programs for these tasks. Hint: use the ihist parameter with bbmap to get the insert size distribution.

Finally, I would use bbmerge instead of flash to merge the pairs. It also outputs an estimate of the fragment size and its standard deviation (again, use the ihist parameter), and it does not need this information a priori.

modified 3 months ago • written 3 months ago by h.mon
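
If you do stay with FLASH, an insert size histogram can be turned into the -f and -s values, and from there into a max overlap. The sketch below assumes the histogram is a two-column, whitespace-separated "size count" table with '#'-prefixed header lines; that layout is an assumption, so check it against your actual ihist file. The function name, the toy histogram, and the read length of 100 are all made up for illustration.

```python
import math


def histogram_stats(hist_lines):
    """Return (mean, stddev) from 'size count' lines; '#' lines are skipped."""
    pairs = []
    for line in hist_lines:
        if not line.strip() or line.startswith("#"):
            continue
        size, count = line.split()[:2]
        pairs.append((int(size), int(count)))
    total = sum(count for _, count in pairs)
    # Count-weighted mean and population variance of the insert sizes.
    mean = sum(size * count for size, count in pairs) / total
    variance = sum(count * (size - mean) ** 2 for size, count in pairs) / total
    return mean, math.sqrt(variance)


# Made-up toy histogram, not real data.
toy = ["#InsertSize\tCount", "170\t1", "180\t2", "190\t1"]
mean, sd = histogram_stats(toy)
# Overlap of two average reads on an average fragment, plus 2.5 * stddev.
max_overlap = 2 * 100 - mean + 2.5 * sd  # 100 = assumed average read length
print(round(mean, 1), round(sd, 2), round(max_overlap, 1))
```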