Hi,
I have a weird situation. I am trying to scaffold a genome assembly using a few Illumina mate-pair libraries. The original assembly has a scaffold N50 of about 303 kb, but after scaffolding with SSPACE the N50 drops to about 160 kb. I don't understand why this is happening. What confuses me is that for every mate-pair library the scaffolder reports large numbers under "Satisfied in distance/logic within a given contig pair (pre-scaffold)". Here is the whole scaffolding run log:
READING READS LIB20870:
------------------------------------------------------------
Total inserted pairs = 10654771
------------------------------------------------------------
READING READS LIB20871:
------------------------------------------------------------
 Total inserted pairs = 13697194
------------------------------------------------------------
READING READS LIB20872:
------------------------------------------------------------
Total inserted pairs = 12879817
 ------------------------------------------------------------
READING READS LIB20873:
------------------------------------------------------------
Total inserted pairs = 15300189
------------------------------------------------------------
READING READS LIB20874:
------------------------------------------------------------
Total inserted pairs = 14841054
------------------------------------------------------------
 LIBRARY LIB20870 STATS:
 ################################################################################
 MAPPING READS TO CONTIGS:
 ------------------------------------------------------------
    Number of single reads found on contigs = 9753025
    Number of read-pairs used for pairing contigs / total pairs = 3549531 / 3566951
 ------------------------------------------------------------
  READ PAIRS STATS:
    Assembled pairs: 3549531 (7099062 sequences)
            Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 4480 +/-896): 10
            Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 10729
            Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 64997
            ---
            Satisfied in distance/logic within a given contig pair (pre-scaffold): 3400632
            Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 73163
  Total satisfied: 3400642        unsatisfied: 148889
    Estimated insert size statistics (based on 10 pairs):
            Mean insert size = 4542
            Median insert size = 4495
    REPEATS:
    Number of repeated edges = 290
    ------------------------------------------------------------
   ################################################################################
   LIBRARY LIB20871 STATS:
  ################################################################################
  MAPPING READS TO CONTIGS:
  ------------------------------------------------------------
    Number of single reads found on contigs = 12290589
    Number of read-pairs used for pairing contigs / total pairs = 4586970 / 4601094
  ------------------------------------------------------------
  READ PAIRS STATS:
    Assembled pairs: 4586970 (9173940 sequences)
            Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 11311 +/-2262.2): 1542
            Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 27522
            Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 283593
            ---
            Satisfied in distance/logic within a given contig pair (pre-scaffold): 3007045
            Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 1267268
            ---
    Total satisfied: 3008587        unsatisfied: 1578383
    Estimated insert size statistics (based on 1542 pairs):
            Mean insert size = 11334
            Median insert size = 12330
    REPEATS:
    Number of repeated edges = 1014
    ------------------------------------------------------------
    ################################################################################
    LIBRARY LIB20872 STATS:
    ################################################################################
   MAPPING READS TO CONTIGS:
   ------------------------------------------------------------
    Number of single reads found on contigs = 10180550
    Number of read-pairs used for pairing contigs / total pairs = 3716099 / 3727919
    ------------------------------------------------------------
   READ PAIRS STATS:
    Assembled pairs: 3716099 (7432198 sequences)
            Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 10278 +/-2055.6): 7460
            Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 33861
            Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 361430
            ---
            Satisfied in distance/logic within a given contig pair (pre-scaffold): 2322109
            Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 991239
            ---
    Total satisfied: 2329569        unsatisfied: 1386530
    Estimated insert size statistics (based on 7460 pairs):
            Mean insert size = 10576
            Median insert size = 10798
   REPEATS:
    Number of repeated edges = 1051
    ------------------------------------------------------------
    ################################################################################
    LIBRARY LIB20873 STATS:
    ################################################################################
    MAPPING READS TO CONTIGS:
    ------------------------------------------------------------
    Number of single reads found on contigs = 10697666
    Number of read-pairs used for pairing contigs / total pairs = 3877539 / 3888155
    ------------------------------------------------------------
    READ PAIRS STATS:
    Assembled pairs: 3877539 (7755078 sequences)
            Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 9012 +/-1802.4): 9902
            Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 37008
            Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 498990
            ---
            Satisfied in distance/logic within a given contig pair (pre-scaffold): 2340096
            Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 991543
            ---
    Total satisfied: 2349998        unsatisfied: 1527541
    Estimated insert size statistics (based on 9902 pairs):
            Mean insert size = 9690
            Median insert size = 9848
    REPEATS:
    Number of repeated edges = 1331
    ------------------------------------------------------------
   ################################################################################
   LIBRARY LIB20874 STATS:
   ################################################################################
   MAPPING READS TO CONTIGS:
   ------------------------------------------------------------
    Number of single reads found on contigs = 9151596
    Number of read-pairs used for pairing contigs / total pairs = 3228267 / 3237339
   ------------------------------------------------------------
   READ PAIRS STATS:
    Assembled pairs: 3228267 (6456534 sequences)
            Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 7179 +/-1435.8): 2610
            Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 37925
            Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 470137
            ---
            Satisfied in distance/logic within a given contig pair (pre-scaffold): 1645785
            Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 1071810
            ---
    Total satisfied: 1648395        unsatisfied: 1579872
    Estimated insert size statistics (based on 2610 pairs):
            Mean insert size = 7843
            Median insert size = 8037
    REPEATS:
    Number of repeated edges = 1382
   ------------------------------------------------------------
   ################################################################################
  SUMMARY:
  ------------------------------------------------------------
    Inserted contig file;
            Total number of contigs = 29194
            Sum (bp) = 235939786
                    Total number of N's = 96400
                    Sum (bp) no N's = 235843386
            GC Content = 39.77%
            Max contig size = 4482245
            Min contig size = 1000
            Average contig size = 8081
            N25 = 783140
            N50 = 303978
            N75 = 11149
    After scaffolding LIB20870:
            Total number of scaffolds = 28752
            Sum (bp) = 237056737
                    Total number of N's = 1213351
                    Sum (bp) no N's = 235843386
            GC Content = 39.77%
            Max scaffold size = 4482245
            Min scaffold size = 1000
            Average scaffold size = 8244
            N25 = 782168
            N50 = 301219
            N75 = 11151
    After scaffolding LIB20871:
            Total number of scaffolds = 26913
            Sum (bp) = 249614734
                    Total number of N's = 13771348
                    Sum (bp) no N's = 235843386
            GC Content = 39.77%
            Max scaffold size = 4482245
            Min scaffold size = 1000
            Average scaffold size = 9274
            N25 = 747146
            N50 = 263694
            N75 = 14044
    After scaffolding LIB20872:
            Total number of scaffolds = 25035
            Sum (bp) = 260210927
                    Total number of N's = 24367541
                    Sum (bp) no N's = 235843386
            GC Content = 39.77%
            Max scaffold size = 4482245
            Min scaffold size = 1000
            Average scaffold size = 10393
            N25 = 727794
            N50 = 218291
            N75 = 15269
    After scaffolding LIB20873:
            Total number of scaffolds = 22574
            Sum (bp) = 273433297
                    Total number of N's = 37590309
                    Sum (bp) no N's = 235842988
            GC Content = 39.77%
            Max scaffold size = 4482245
            Min scaffold size = 1000
            Average scaffold size = 12112
            N25 = 668175
            N50 = 181977
            N75 = 20104
    After scaffolding LIB20874:
            Total number of scaffolds = 20622
            Sum (bp) = 281665096
                    Total number of N's = 45822307
                    Sum (bp) no N's = 235842789
            GC Content = 39.77%
            Max scaffold size = 4482245
            Min scaffold size = 1000
            Average scaffold size = 13658
            N25 = 650995
            N50 = 160160
            N75 = 22056
  ------------------------------------------------------------
Does anyone have a better understanding of what is going on? Many thanks.
It seems to have increased your genome size by about 46 Mb (from roughly 236 Mb to 282 Mb), essentially all of which is runs of N in the gaps. Because the total assembly size is now larger, the N50 changes. Are you sure you estimated the library insert sizes correctly? Try redundans; it is a pipeline that uses SSPACE and configures it automatically for you.
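To make the N50 point concrete, here is a minimal sketch in Python using toy lengths (made-up numbers, not the data from the log). The function `nx` and the lists `before` and `after` are names chosen for this illustration only. It assumes the usual definition of N50: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. In the toy case, ten small contigs are joined in pairs with a 200 bp gap of Ns each, so the small scaffolds get longer, but the total grows too and the two large sequences no longer cover half of it.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 when fraction=0.5) for a list of sequence lengths."""
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down until the running total reaches the threshold.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("no sequence lengths given")


# Toy assembly before scaffolding: two large contigs and ten small ones (total 2500).
before = [1000, 500] + [100] * 10

# Toy assembly after scaffolding: the small contigs are joined in pairs,
# each pair separated by a 200 bp gap of Ns (100 + 200 + 100 = 400; total 3500).
after = [1000, 500] + [400] * 5

print("N50 before:", nx(before))        # 500
print("N50 after: ", nx(after))         # 400
print("N75 before:", nx(before, 0.75))  # 100
print("N75 after: ", nx(after, 0.75))   # 400
```

The toy result shows N50 going down while N75 goes up, which is the same pattern as in the log above: N50 falls from 303978 to 160160 while N75 rises from 11149 to 22056, and the N count grows from 96400 to 45822307 (about 16% of the final 281665096 bp). That is consistent with large gaps being inserted between small contigs, although it does not by itself show whether the gap sizes are right.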
Thanks apelin20. I will try what you suggested.