I'm a bit confused after running an aligner (BWA, the Burrows-Wheeler Aligner) on my RNA-seq data to get an estimate of the mate inner distance for use in downstream analysis. The total RNA libraries were prepared with universal Illumina adapters and sequenced as 150 bp paired-end (PE) reads. BWA gave me an average insert size of ~250 +/- 60 bp, and the sequencing company gave me a target fragment size of 394 bp. The adapters are 34 bp long.
Here is how I understand things so far; please correct me if I'm wrong.
1) Does the target fragment size of 394 bp include the 3' and 5' adapters? I would think so.
2) Does the read length include the adapters? I would say no, because quality checks on the clean data (i.e. with adapters trimmed) give a read length of exactly 150 bp. If the adapters are 34 bp long and were included in the read length, shouldn't it be 150 - 2x34 = 82 bp after they are removed?
3) Given the two previous points, that would give an insert size of 394 - 2x34 = 326 bp, which is different from the BWA estimate. Is it usual to have a large difference between the target fragment size and the actual size of the fragments that are sequenced? And if I have an insert size of ~250 bp and PE reads of 150 bp, does that mean that the left and right reads overlap in the middle?
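To make the arithmetic in point 3 concrete, here is a small Python sketch of how I am computing things. It rests on assumptions I am not sure about: that the 394 bp target includes both 34 bp adapters, and that "mate inner distance" means insert size minus twice the read length (so a negative value would mean the mates overlap). The function names are just made up for illustration.

```python
def insert_size(fragment_with_adapters, adapter_len):
    """Insert size, ASSUMING the fragment size includes one adapter on each end."""
    return fragment_with_adapters - 2 * adapter_len


def mate_inner_distance(insert, read_len):
    """Gap between the two mates, ASSUMING inner distance = insert - 2 * read length.
    A negative result means the two reads overlap by that many bases."""
    return insert - 2 * read_len


ADAPTER_LEN = 34    # bp, adapter length from my library prep
READ_LEN = 150      # bp, PE read length
TARGET_FRAGMENT = 394   # bp, value given by the sequencing company
BWA_INSERT = 250        # bp, approximate mean insert size reported by BWA

target_insert = insert_size(TARGET_FRAGMENT, ADAPTER_LEN)
inner_from_target = mate_inner_distance(target_insert, READ_LEN)
inner_from_bwa = mate_inner_distance(BWA_INSERT, READ_LEN)

print("insert size from target:", target_insert)         # 326
print("inner distance from target:", inner_from_target)  # 26
print("inner distance from BWA:", inner_from_bwa)        # -50
```

If these assumptions hold, the company's target would leave a gap of about 26 bp between the mates, whereas the BWA estimate would mean the reads overlap by about 50 bp on average.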
Thanks for your help! Antoine