Question: Is this mate pair insert size distribution too broad?
3.9 years ago by
mschmid90
Switzerland
mschmid90 wrote:

I have Illumina mate pair data. The insert size distribution is as follows:

http://imgur.com/bhKrEpo

1. What do you think about this distribution in general?

2. This library was prepared without targeting a specific insert size. How much does the insert size vary if you enrich for a certain size? Do you have a source or an example?

3. I would like to use this data together with Illumina PE data, for example with SPAdes. We want to assemble plasmids from 90 kb to 150 kb. Do you think this library is suitable? Would you target a specific size? What techniques do you use?

 

mate pair spades illumina • 2.0k views
modified 3.9 years ago • written 3.9 years ago by mschmid90

Perhaps the large inserts are actually not that large, but only appear so because molecules with circular topology are represented linearly? For example, Read 1 can map close to the 5'-end of a molecule (in the fasta file) and Read 2 close to the 3'-end. The insert then appears to span the whole molecule, when in reality the reads are close to each other once the molecule is presented in circular form. You could test this easily by extracting the large-insert mates and mapping them to a fasta file in which you have moved a few tens of kbp from the 5'-end of the sequence to the 3'-end.
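The rotation test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference is a single-record FASTA, and the helper names (`rotate_circular`, `circular_insert_size`, `rotate_fasta_record`) are made up for this example and do not come from any tool mentioned in the thread.

```python
def rotate_circular(seq: str, shift: int) -> str:
    """Move the first `shift` bases from the 5'-end to the 3'-end."""
    if not seq:
        return seq
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


def circular_insert_size(linear_size: int, molecule_length: int) -> int:
    """Return the shorter of the two paths between mates on a circle.

    `linear_size` is the apparent insert size on the linear reference.
    """
    linear_size = abs(linear_size)
    return min(linear_size, molecule_length - linear_size)


def rotate_fasta_record(fasta_text: str, shift: int, width: int = 60) -> str:
    """Rotate a single-record FASTA string and rewrap the sequence lines."""
    lines = fasta_text.strip().splitlines()
    header = lines[0]
    seq = "".join(line.strip() for line in lines[1:])
    rotated = rotate_circular(seq, shift)
    wrapped = [rotated[i:i + width] for i in range(0, len(rotated), width)]
    return "\n".join([header] + wrapped) + "\n"
```

If the large inserts are an artifact of the linear representation, mates that looked tens of kbp apart on the original reference should map close together on the rotated one; `circular_insert_size` gives the same answer without remapping, provided the molecule length is known.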

modified 3.9 years ago • written 3.9 years ago by 5heikki8.4k
3.9 years ago by
Carlo Yague4.6k
Belgium
Carlo Yague4.6k wrote:

EDIT: I wrote this answer for a paired-end library, but the OP's question concerns mate-pair. My bad.

1- The inserts are MUCH too long. Are you sure the mates are paired correctly? I had a similar distribution once, but it was because I was pairing my reads incorrectly.

2- I see this kind of variation with Illumina paired-end RNA-seq:
[image: Bioanalyzer profile]

It's best if you can compare experimental results (such as this bioanalyzer profile) with the computation of insert size from your reads.
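As a rough sketch of the computational side of that comparison, insert sizes can be read from the TLEN column (column 9) of a SAM file. The function names below are hypothetical, and the filtering choices (both mates mapped to the same reference, primary alignments only, each pair counted once via the positive TLEN) are assumptions that may need adjusting for a mate pair library.

```python
import statistics


def insert_sizes_from_sam(sam_lines):
    """Collect one template length per pair from SAM text lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        paired = flag & 0x1
        unmapped = flag & 0x4 or flag & 0x8  # read or mate unmapped
        not_primary = flag & 0x100 or flag & 0x800
        same_reference = fields[6] == "="
        # tlen > 0 keeps only the leftmost mate, so each pair counts once.
        if paired and not unmapped and not not_primary and same_reference and tlen > 0:
            sizes.append(tlen)
    return sizes


def summarize(sizes):
    """Return simple summary statistics for a non-empty list of sizes."""
    return {
        "n": len(sizes),
        "min": min(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }
```

A typical use would be `summarize(insert_sizes_from_sam(open("aln.sam")))`, where `aln.sam` is a placeholder file name; the resulting numbers can then be set against the fragment sizes seen on the Bioanalyzer trace after subtracting the adaptor length.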

3- I don't know, I'll let others answer this one :)

modified 3.9 years ago • written 3.9 years ago by Carlo Yague4.6k

What do you mean by much too long? You also have a peak at 10 kbp. Your peak is just narrower (were you targeting this length?). And the broader peak is the PE fraction of your mate pair library, I guess?

written 3.9 years ago by mschmid90

The peaks at 35 and 10380 bp are the marker peaks, which are irrelevant here. The broader peak represents the size distribution of my cDNA library prior to sequencing (adaptors + insert). Since the adaptors are ~120 bp, my inserts are mostly between 80 and 900 bp, which is reasonable in my case (paired-end RNA-seq). But perhaps you have a completely different kind of library.

written 3.9 years ago by Carlo Yague4.6k

Mate pair libraries are meant to have much longer inserts than PE libraries.

written 3.9 years ago by 5heikki8.4k

5heikki, thanks, I understand that. My question was more whether the distribution is what you would expect, and whether you would try to narrow it for de novo assembly.

written 3.9 years ago by mschmid90

Oh, sorry, I misinterpreted. But you seem to have paired-end data. I have Illumina mate pair data (http://www.illumina.com/documents/products/technotes/technote_nextera_matepair_data_processing.pdf). Right?

written 3.9 years ago by mschmid90

Whoops, my bad!

written 3.9 years ago by Carlo Yague4.6k

While I agree with what you say, Carlo, in my experience Bioanalyzer plots of the library often look different from the insert size distribution of the sequenced reads, for all sorts of reasons.

written 3.9 years ago by John12k