Interpreting ABySS output
2.3 years ago

I just completed my first genome assembly using ABySS (built from one set of paired-end files, one single-end file, and seven mate-pair sets). ABySS creates a large number of output files, and what I'd like to do now is BLAST my assembly to remove contigs that may be present because of contamination. However, I'm confused about which .fa file represents my final assembly, because there are 13 different .fa files (listed below). What do the numbered files (bowfin-1.fa, bowfin-2.fa, etc.) represent? Will I need to combine them into a final assembly, given that I eventually want to map reads from different individuals to the assembly and produce a .vcf file for population-genetic analysis? I'm just a little confused as to why there are so many output files.

bowfin-1.fa  bowfin-1.path
bowfin-2.dot1  bowfin-2.fa  bowfin-2.path
bowfin-3.dist  bowfin-3.fa  bowfin-3.fa.fai
bowfin-4.fa  bowfin-4.path1  bowfin-4.path2  bowfin-4.path3
bowfin-5.fa  bowfin-5.path
bowfin-6.fa  bowfin-6.path
bowfin-7.fa  bowfin-7.path
bowfin-8.fa
bowfin-bubbles.fa  bowfin-contigs.fa  bowfin-indel.fa  bowfin-scaffolds.fa  bowfin-unitigs.fa
bowfin-stats  bowfin-stats.csv
coverage.hist
mpc-6.hist  mpd-6.hist  mpe-6.hist  mpf-6.hist  mpg-6.hist  mph-6.hist  mpi-6.hist
pea-3.dist  pea-3.hist  peb-3.dist  peb-3.hist
slurm.rhea-07.751256.out
2.3 years ago

The file with the highest number appended to your run name is the final output: bowfin-8.fa in your case, from what I can see. At every step, ABySS takes the previous step's result and increments the number by one, so the highest-numbered file is the most advanced one.
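To make the "highest number wins" rule concrete, here is a minimal Python sketch that picks the highest-numbered `<run name>-N.fa` file from a directory listing. The function name `final_numbered_fasta` is hypothetical (it is not part of ABySS), and the sketch assumes only the `<run name>-N.fa` naming pattern visible in the listing in the question.

```python
import os
import re
from typing import Iterable, Optional


def final_numbered_fasta(filenames: Iterable[str], run_name: str) -> Optional[str]:
    """Return the <run_name>-N.fa file with the largest N, or None if none exist.

    Hypothetical helper: it relies only on the numbered naming pattern,
    not on any ABySS API.
    """
    # Only names like "bowfin-8.fa" match; "bowfin-contigs.fa" and
    # "bowfin-3.fa.fai" are rejected because fullmatch() needs the whole name.
    pattern = re.compile(rf"{re.escape(run_name)}-(\d+)\.fa")
    best_name, best_n = None, -1
    for name in filenames:
        m = pattern.fullmatch(name)
        # Compare the step number as an integer, so "-10" beats "-9".
        if m and int(m.group(1)) > best_n:
            best_name, best_n = name, int(m.group(1))
    return best_name


if __name__ == "__main__":
    # Run from the assembly directory; "bowfin" is the run name from the post.
    print(final_numbered_fasta(os.listdir("."), "bowfin"))
```

The step numbers are compared as integers rather than as strings, so a hypothetical run that went past nine steps would still pick `-10.fa` over `-9.fa`.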

Unless you need them for a specific purpose, you can ignore all the non-.fa files.

There are so many output files because they represent distinct analysis steps.

Thanks, a fantastic answer to my question!


