Question: Help Interpreting Abyss-pe Character Occurrence Table
3.2 years ago, richardzhang97 wrote:

While running abyss-pe, I would often get this message printed to stdout.

Building the suffix array... 
Building the Burrows-Wheeler transform... 
Building the character occurrence table... 
Mateless          0 
Unaligned   9840128  12.3%
Singleton  20579451  25.7% 
FR         28709736  35.9% 
RF            20287  0.0253% 
FF           130290  0.163% 
Different  20752726  25.9% 
Total      80032618 
Ambiguous paths: 86 
Merged:          48 
No paths:        0 
Too many paths:  4
Too complex:     1 
Dissimilar:      33

I thought these numbers represented the percentage of the original reads that mapped to the scaffolds produced by ABySS, but the total is greater than the number of original reads. I would also appreciate any information or literature on the meaning of FR, RF, FF and Singleton.

Thanks a lot!
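
As a quick sanity check on the table above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the counts are copied by hand from the log, and the names `counts`, `reported_total` and `percentages` are made up for illustration. It makes no claim about what ABySS counts as one unit in the Total line.

```python
# Counts copied by hand from the log output in the question.
counts = {
    "Mateless": 0,
    "Unaligned": 9840128,
    "Singleton": 20579451,
    "FR": 28709736,
    "RF": 20287,
    "FF": 130290,
    "Different": 20752726,
}
reported_total = 80032618  # the "Total" line of the log


def percentages(counts, total):
    """Return each category's share of `total`, in percent."""
    return {name: 100.0 * n / total for name, n in counts.items()}


computed_total = sum(counts.values())
shares = percentages(counts, reported_total)

if __name__ == "__main__":
    print("sum of categories:", computed_total)
    print("reported total:   ", reported_total)
    for name, pct in shares.items():
        # Three significant figures, the same precision the log uses.
        print(f"{name:<10} {counts[name]:>9} {pct:.3g}%")
```

The seven category counts add up exactly to the reported Total, so each percentage in the log is that category's share of Total.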

abyss assembly • 812 views
3.2 years ago, mastal511 wrote:

Singleton means that only one read of the pair aligned, or that the two reads of the pair didn't align as a pair, that is, within the expected distance of each other. F means forward and R means reverse; RF, FR and FF refer to the relative orientation of the two reads of a pair in the alignment.


I agree with most of this, but I'm not sure about your explanation for 'singleton'. ABySS uses the aligned pairs to build a distribution of insert sizes, so to me it would not make much sense to limit the mapping to an expected insert size at this point. I therefore think that singleton refers only to the read pairs where just one read actually aligns to a sequence.

written 3.2 years ago by lieven.sterck

Yes, you are right, Lieven. At this stage in the pipeline, ABySS is looking at read pairs that align to the same contig in order to build an estimate of the fragment size distribution. So:

  • Unaligned = both reads in pair unmapped
  • Singleton = only one read in pair mapped to the assembly
  • FR = pairs that mapped to the same contig in the Forward-Reverse orientation
  • RF = pairs that mapped to the same contig in the Reverse-Forward orientation
  • FF = pairs that mapped to the same contig in the Forward-Forward orientation
  • Different = pairs where each read mapped to a different contig (orientation unknown)
modified 3.2 years ago • written 3.2 years ago by benv
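
The category definitions in the list above can be sketched as a small classifier. This is an illustration under stated assumptions, not ABySS's actual code: the `Alignment` record and `classify_pair` function are hypothetical; deciding FR versus RF by comparing leftmost coordinates (forward read first means FR) is the conventional inward/outward reading and is assumed here; pairs where both reads are on the reverse strand are lumped into FF, also as an assumption. "Mateless" is left out because the thread does not define it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Hypothetical record for one aligned read."""
    contig: str    # name of the contig the read aligned to
    pos: int       # leftmost coordinate of the alignment on that contig
    reverse: bool  # True if the read aligned to the reverse strand


def classify_pair(a: Optional[Alignment], b: Optional[Alignment]) -> str:
    """Assign a read pair to one of the categories from the log.

    `None` stands for a read that did not align at all.
    """
    if a is None and b is None:
        return "Unaligned"   # both reads in the pair unmapped
    if a is None or b is None:
        return "Singleton"   # only one read in the pair mapped
    if a.contig != b.contig:
        return "Different"   # each read on a different contig
    if a.reverse == b.reverse:
        return "FF"          # same strand (assumption: covers reverse/reverse too)
    # Opposite strands: pick out the forward-strand and reverse-strand read.
    fwd, rev = (b, a) if a.reverse else (a, b)
    # Assumption: forward read at or left of the reverse read means FR, else RF.
    return "FR" if fwd.pos <= rev.pos else "RF"


if __name__ == "__main__":
    left_fwd = Alignment("contig1", 100, False)
    right_rev = Alignment("contig1", 400, True)
    print(classify_pair(left_fwd, right_rev))
    print(classify_pair(left_fwd, None))
    print(classify_pair(left_fwd, Alignment("contig2", 50, True)))
```

Under these assumptions only the FR, RF and FF pairs have both reads on one contig, which is why those are the pairs usable for a fragment size estimate.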

Thanks for resolving this, Ben. One related question: is it correct to assume, then, that it is mainly the reads from the 'Different' category that contribute to the contig and scaffold building stage?

written 3.2 years ago by lieven.sterck

Yes, exactly :-) The pairs that map to different unitigs/contigs are the ones that provide the linking information.

written 3.2 years ago by benv