Question: Get Average Insert Size Of Fastq?
dan79 (90) wrote, 5.3 years ago (1 vote):

Is there a way to get the average insert size from FASTQ files? Sorry for the originally uninformative question. I downloaded an SRA file from NCBI and used the included SRA Toolkit to split it into two FASTQ files. I am trying to do a de novo assembly using these paired-end, strand-specific reads, but a required parameter is the average insert size. Does anyone know how to obtain this from an SRA file or from the FASTQ files?

fastq • 5.5k views
modified 5.3 years ago by matted (6.5k) • written 5.3 years ago by dan79 (90)

Please describe your question so people can help you. I think I understand what you're asking, but without more information it is difficult to answer.

written 5.3 years ago by Zev.Kronenberg (11k)

Edited, thanks.

written 5.3 years ago by dan79 (90)

You will need to align the reads (both mates of each pair). Then you can find the insert lengths by parsing the SAM/BAM file.

written 5.3 years ago by Zev.Kronenberg (11k)

Align the reads to a reference genome? This seems counterintuitive, considering the whole point of a de novo assembly is to not need a reference.

written 5.3 years ago by dan79 (90)

Good point. Sorry. I need to read more carefully. I don't know the answer. I look forward to seeing the best solution.

modified 5.3 years ago • written 5.3 years ago by Zev.Kronenberg (11k)

matted (6.5k), Boston, United States, wrote 5.3 years ago (4 votes):

Guessing an insert size, assembling, mapping the reads back to the assembly, and then iterating with the improved insert size (estimated from the mappings) is a reasonable choice, and probably about the best you can do. You should hopefully have a rough idea from the library preparation method (the size selection criteria, or whether it's a jumping library).

In fact, Velvet does this automatically (from the 1.1 manual): "If the insert length of a library is unspecified, Velvet will attempt to measure it for you, based on the read-pairs which happen to map onto a common node." As they point out, it's critical to check the reported estimate to make sure it's sane.
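For assemblers that do not estimate this themselves, the "estimate from the mappings" step can be sketched roughly as follows. This is only a minimal illustration, not any tool's actual method: it assumes the reads were mapped back to a draft assembly with some paired-end aligner and that the alignments are available as SAM text. The function names `insert_sizes` and `summarize` and the `max_size` cutoff are made up for this example. It reads the observed template length (TLEN, SAM column 9) of properly paired reads and counts each pair once by keeping only the mate with a positive TLEN.

```python
import statistics
import sys


def insert_sizes(sam_lines, max_size=10000):
    """Yield one template length (SAM column 9, TLEN) per properly paired fragment.

    max_size is an arbitrary cutoff (an assumption of this sketch) to drop
    implausibly long templates, e.g. from misassembled contigs.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & 0x2:                # keep only properly paired reads
            continue
        if flag & 0x900:                  # skip secondary/supplementary alignments
            continue
        if 0 < tlen <= max_size:          # positive TLEN: count each pair once
            yield tlen


def summarize(sizes):
    """Return count, mean and median of the sizes, or None if there are none."""
    sizes = list(sizes)
    if not sizes:
        return None
    return {
        "n": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }


if __name__ == "__main__":
    print(summarize(insert_sizes(sys.stdin)))
```

If the script were saved as `insert_size.py` (a hypothetical name), it could be fed from a BAM file with `samtools view mapped.bam | python insert_size.py`. Note that TLEN spans the whole template including both reads, so check whether your assembler expects that definition of "insert size". The median is usually the safer number to report, since a few chimeric pairs can skew the mean.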

written 5.3 years ago by matted (6.5k)

SOAPdenovo gives you an initial insert size estimate as well.

written 5.3 years ago by Vivek (1.9k)

dfornika (950), Vancouver, British Columbia, Canada, wrote 5.3 years ago (2 votes):

I'm going to suggest a lazy, imperfect solution. If this is Illumina data (Genome Analyzer, HiSeq, etc.), then the insert size is normally about 300 bp. If your assembler isn't too sensitive to that parameter, try 300 bp as a reasonable guess.

written 5.3 years ago by dfornika (950)

Haha, well, it's better than nothing. I read that somewhere too, and yes, the sequencer was an Illumina. I already started the job with an insert size of 300. +1

modified 5.3 years ago • written 5.3 years ago by dan79 (90)