abyss config/scaffold fasta file headers
3.1 years ago
jm440 ▴ 10

Hi all,

Does anyone know what the numbers in the headers of the ABySS output files novo-contigs.fa and novo-scaffolds.fa mean?

They look like this: >1900 550 15779

I think the middle value (550) is the length of that sequence, but I'm not sure about the 1st or 3rd. My guess is that one of them is an ID that's used to link contigs with scaffolds.

Thank you!

abyss • 1.0k views

Just a guess, but the first number is probably a unique contig/scaffold identifier. These identifiers are used to cross-reference the contigs/scaffolds in other output files from ABySS. In the case of novo-scaffolds.fa, the contig identifiers may appear after the three numbers; see the related thread "trying to understand abyss headers".

I suspect the third number is the k-mer coverage of the contig/scaffold.

These are only guesses, but the description of the adjacency (adj) format supports them:

The first field (e.g. "28 51 3854") provides information about the subject sequence and consists of 3 parts: <seq_id> <seq_len> <kmers>, where SEQ_ID is a unique identifier for the sequence assigned by ABySS, SEQ_LEN is the length of the sequence in bases, and KMERS is the number of k-mers that mapped to the sequence during assembly (i.e. the sum of k-mer multiplicities for each k-mer in the sequence).
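
If that reading is right, the header from the question can be split into its three fields with a few lines of Python. This is only a sketch under that assumption: the names `AbyssHeader`, `parse_abyss_header` and `mean_kmer_coverage` are made up for illustration and are not part of ABySS, and the k value passed in (51 below) is a placeholder for whatever k the assembly was run with. Anything after the third number, as may occur in novo-scaffolds.fa, is kept as an uninterpreted string.

```python
from typing import NamedTuple


class AbyssHeader(NamedTuple):
    """Hypothetical container for the three leading header fields."""
    seq_id: str   # identifier assigned by ABySS
    seq_len: int  # sequence length in bases
    kmers: int    # sum of k-mer multiplicities over the sequence
    extra: str    # anything after the third field, left uninterpreted


def parse_abyss_header(line: str) -> AbyssHeader:
    """Split a header such as '>1900 550 15779' into its fields."""
    fields = line.strip().lstrip(">").split(maxsplit=3)
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    extra = fields[3] if len(fields) > 3 else ""
    return AbyssHeader(fields[0], int(fields[1]), int(fields[2]), extra)


def mean_kmer_coverage(header: AbyssHeader, k: int) -> float:
    """Average multiplicity per k-mer position.

    A sequence of length L contains L - k + 1 k-mers, so dividing the
    summed multiplicities by that count gives a mean k-mer coverage.
    The value of k is an assumption supplied by the caller.
    """
    n_kmers = header.seq_len - k + 1
    if n_kmers <= 0:
        raise ValueError("sequence is shorter than k")
    return header.kmers / n_kmers


if __name__ == "__main__":
    h = parse_abyss_header(">1900 550 15779")
    print(h.seq_id, h.seq_len, h.kmers)
    print(mean_kmer_coverage(h, k=51))  # k=51 is a placeholder value
```

Dividing by the number of k-mers rather than by the length in bases matters for short contigs, where the two differ noticeably.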


Thank you for this helpful information!

