Hi, we have RNA-Seq data from an Illumina HiSeq 2000. According to the Illumina RNA-seq library protocol that we follow, the insert length ranges from 120 to 210 bp, with a median of 155 bp.
After aligning with TopHat, I collected insert size statistics from the resulting BAM file with Picard. The median insert size appears to be larger than 155 bp: 188 bp for the tumor sample and 280 bp for the control sample. The detailed results are as follows:
For the tumor sample:
MEDIAN_INSERT_SIZE    MIN_INSERT_SIZE    MAX_INSERT_SIZE    MEAN_INSERT_SIZE    STANDARD_DEVIATION    READ_PAIRS    PAIR_ORIENTATION    WIDTH_OF_10_PERCENT    WIDTH_OF_20_PERCENT    WIDTH_OF_30_PERCENT    WIDTH_OF_40_PERCENT    WIDTH_OF_50_PERCENT    WIDTH_OF_60_PERCENT    WIDTH_OF_70_PERCENT    WIDTH_OF_80_PERCENT    WIDTH_OF_90_PERCENT    WIDTH_OF_99_PERCENT
188    75    227724412    721.727402    1451.554057    21582855    FR    25    49    71    91    113    143    253    1269    4501    41471
For the control sample:
MEDIAN_INSERT_SIZE    MIN_INSERT_SIZE    MAX_INSERT_SIZE    MEAN_INSERT_SIZE    STANDARD_DEVIATION    READ_PAIRS    PAIR_ORIENTATION    WIDTH_OF_10_PERCENT    WIDTH_OF_20_PERCENT    WIDTH_OF_30_PERCENT    WIDTH_OF_40_PERCENT    WIDTH_OF_50_PERCENT    WIDTH_OF_60_PERCENT    WIDTH_OF_70_PERCENT    WIDTH_OF_80_PERCENT    WIDTH_OF_90_PERCENT    WIDTH_OF_99_PERCENT
280    74    242555584    394.905075    468.383792    50410660    FR    21    45    79    131    193    245    287    339    1491    14367
3762    74    242542706    3676.159079    492.235063    1560925    RF    11    19    27    41    65    79    101    6921    7299    20051069
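For context, here is a rough Python sketch of how these numbers might be turned into a TopHat setting. TopHat's `-r/--mate-inner-dist` option expects the inner distance between the mates (fragment size minus both read lengths), not the full insert size that Picard reports. The 75 bp read length below is an assumption (it is not stated in this post), and the helper names `mate_inner_distance` and `summarize` are hypothetical. The sketch starts from the median rather than the mean because the mean is pulled up by a long tail of very large values (see MAX_INSERT_SIZE), which for spliced alignments likely reflects pairs that span introns.

```python
# Values copied from the FR rows of the Picard output above.
PICARD_FR = {
    "tumor":   {"median": 188, "mean": 721.727402, "sd": 1451.554057},
    "control": {"median": 280, "mean": 394.905075, "sd": 468.383792},
}


def mate_inner_distance(fragment_size, read_length):
    """Inner distance = outer fragment size minus the two read lengths."""
    return int(round(fragment_size - 2 * read_length))


def summarize(metrics, read_length):
    """Per sample: how skewed the mean is, and the inner distance from the median."""
    summary = {}
    for sample, m in metrics.items():
        summary[sample] = {
            # A ratio well above 1 suggests the mean is dominated by outliers.
            "mean_over_median": m["mean"] / m["median"],
            "inner_from_median": mate_inner_distance(m["median"], read_length),
        }
    return summary


if __name__ == "__main__":
    READ_LENGTH = 75  # ASSUMPTION: read length is not given in the post
    for sample, s in summarize(PICARD_FR, READ_LENGTH).items():
        print(
            f"{sample}: mean/median = {s['mean_over_median']:.2f}, "
            f"candidate -r value = {s['inner_from_median']}"
        )
```

Under the assumed read length, the result would be passed to TopHat as `-r <value>`. The Picard standard deviations here are inflated by the same tail, so they are probably not a good choice for `--mate-std-dev` without trimming the outliers first.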
So which insert size value should I use? And do I need to run TopHat again with the insert size value from Picard? Thank you in advance.
Hi Arun, do you have a reference or paper for the above method? Thanks.
Sorry, I just noticed the comment. Yes, it's used in BWA.