Question

Estimated insert size less than read pair sizes (180 bp for 100x100 PE), but SeqPrep merged 0 reads?

12.0 years ago

Ian Fiddes ▴ 70

I have some RNA-seq data that was generated by a one-stop-shop type company, i.e. RNA was submitted, and analyzed data as well as FASTQ files were returned. I am trying to run a TopHat/Cufflinks analysis myself to compare. I used Bowtie and Picard tools to empirically determine the insert size (following http://vinaykmittal.blogspot.com/2012/02/how-to-estimate-insert-size-for-paired.html) and the result was a 180 bp mean for both samples. I used a basic mrna.fa file from UCSC as my reference for Bowtie.

However, the sequencing is 100x100 paired end, so I figured for some strange reason they did overlapping reads. So I ran the program SeqPrep to try and merge the reads, and it came up with exactly zero mergeable pairs. Does anyone know why this would be? The histogram generated by CollectInsertSizeMetrics.jar can be seen here: http://i.imgur.com/UxjoE.png
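As a quick check on the arithmetic behind that expectation, here is a minimal sketch. It assumes the insert size reported by Picard is the full fragment length (outer end to outer end of the pair), and the function name `expected_overlap` is just made up for illustration:

```python
def expected_overlap(read1_len, read2_len, fragment_len):
    """Number of fragment bases covered by both mates (0 if there is a gap).

    Assumes fragment_len is the outer distance of the pair, i.e. it
    includes both reads. Reads longer than the fragment are capped at
    the fragment length (the rest would be adapter read-through).
    """
    covered_by_r1 = min(read1_len, fragment_len)
    covered_by_r2 = min(read2_len, fragment_len)
    return max(0, covered_by_r1 + covered_by_r2 - fragment_len)


# 100x100 PE with a 180 bp mean fragment
print(expected_overlap(100, 100, 180))  # 20
```

So at the mean fragment size the mates would share only about 20 bp, and any fragment of 200 bp or longer would have no overlap at all. Whether an overlap that short is enough to merge depends on the minimum overlap setting of the merging tool.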

rna-seq • 3.6k views

updated 11.7 years ago by Rm 8.3k • written 12.0 years ago by Ian Fiddes ▴ 70

score 0 · Answer 1 · 2012-08-09

Overlapping RNA-seq reads are pretty common. How does the QC look for these reads? If the quality is very bad, you would not expect many overlaps, but I am still surprised to see zero merged reads.

One test I suggest: take 100,000 read pairs, convert them to FASTA, and then use megablast to search the read 1 sequences against a database built from the read 2 sequences. Then look at the distribution of the overlap percentages.
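If setting up megablast is inconvenient, a rough alternative is to check the overlap directly in a few lines of Python. This is only a sketch of the same idea, not the megablast approach itself: it compares the end of read 1 with the start of the reverse complement of read 2, without gaps. The function names, the `min_overlap` and `max_mismatch_frac` parameters, and their default values are all assumptions chosen for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_overlap(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Longest ungapped overlap between the 3' end of read1 and the
    start of the reverse complement of read2; 0 if none is found.

    Hypothetical thresholds: at least min_overlap bases, with at most
    max_mismatch_frac of them mismatching.
    """
    r2_rc = reverse_complement(read2)
    longest = min(len(read1), len(r2_rc))
    # Try the longest candidate overlap first and shrink from there.
    for olen in range(longest, min_overlap - 1, -1):
        tail = read1[-olen:]
        head = r2_rc[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            return olen
    return 0


def overlap_histogram(pairs, **kwargs):
    """Count how many (read1, read2) pairs fall at each overlap length."""
    return Counter(best_overlap(r1, r2, **kwargs) for r1, r2 in pairs)
```

Running `overlap_histogram` on a sample of read pairs gives the distribution of overlap lengths. If nearly every pair lands at 0, the fragments are probably longer than the mapping-based estimate suggests; if many pairs show an overlap of around 20 bp, the merging tool's minimum overlap or mismatch settings are the more likely culprit.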