ABySS path subprograms
7.0 years ago

How is the k-mer length used after the initial unitig step in ABySS? Isn't the k-mer length only really relevant in the de Bruijn unitig assembly step? I want to input unitigs that I generated with SGA and use them for the downstream ABySS steps (AdjList, filtergraph, MergeContigs, etc.).

Specifically, I want to use the path manipulation subprograms included with ABySS (PathConsensus, PathOverlap, MergePath, etc.). These subprograms all seem to require a k-mer length as an input parameter. Is this just a common interface for all subprograms, or are the k-mer length values actually being used?

**edit

Can I just input an SGA .asqg file into the downstream programs? Will they recognize the format automatically from the file extension?

7.0 years ago
benv ▴ 730

That is a good question, Damian. (I have sometimes wondered about that myself.)

One thing I can tell you is that k is indeed used by the downstream ABySS programs. For example, AdjList finds end-to-end sequence overlaps of k-1 bases or fewer to build the compacted de Bruijn graph. In general, k gives the ABySS algorithms two useful pieces of information: (i) the shortest sequence length in the assembly (k) and (ii) the maximum overlap between sequences (k-1).
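
To make the "overlaps of k-1 or fewer" idea concrete, here is a minimal Python sketch of an end-to-end overlap search capped at k-1 bases. It is only an illustration, not ABySS's actual implementation: the function name `longest_overlap` and its `min_len` argument are hypothetical, and it ignores reverse-complement overlaps, which a real overlap graph builder has to handle.

```python
def longest_overlap(a: str, b: str, k: int, min_len: int = 1) -> int:
    """Length of the longest exact overlap between the end of `a` and the
    start of `b`, considering only overlaps of at most k-1 bases and at
    least `min_len` bases. Returns 0 if no such overlap exists.

    Hypothetical helper for illustration; forward strand only.
    """
    # The overlap can never exceed k-1 or the length of either sequence.
    max_len = min(k - 1, len(a), len(b))
    # Try the longest candidate first so the first hit is the answer.
    for n in range(max_len, max(min_len, 1) - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


# With k=4 the 3-base overlap "TAC" is allowed; with k=3 it is not.
print(longest_overlap("ACGTAC", "TACGGA", k=4))
print(longest_overlap("ACGTAC", "TACGGA", k=3))
```

Lowering k in this sketch hides overlaps that are still present in the sequences, which is the sense in which k acts as an upper bound on the overlaps considered.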

In your case, I think you can set k to the longest perfect overlap that you want ABySS to find between the ends of the sequences. (And keep in mind that k must be less than or equal to the shortest sequence length in the assembly.)
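
One way to check that constraint before picking k is to measure the shortest sequence in the unitig FASTA. The sketch below assumes an ordinary (possibly multi-line) FASTA; the function name `shortest_sequence_length` is made up for this example.

```python
def shortest_sequence_length(fasta_lines):
    """Return the length of the shortest record in a FASTA given as an
    iterable of lines. Hypothetical helper for illustration."""
    lengths = []
    current = None  # running length of the record being read
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    if not lengths:
        raise ValueError("no FASTA records found")
    return min(lengths)


records = [">a", "ACGT", "AC", ">b", "ACG"]
print(shortest_sequence_length(records))  # any k chosen should not exceed this
```

For a file on disk, the same function can be called on an open file handle, e.g. `shortest_sequence_length(open("unitigs.fa"))`, where `unitigs.fa` is a placeholder name.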

ABySS won't be able to read .asqg, but you may be able to either (i) convert the .asqg to a pair of FASTA and Graphviz files, or (ii) convert the .asqg to FASTA and build the Graphviz file from that FASTA using AdjList.
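
For option (ii), the FASTA conversion could look roughly like the Python sketch below. It assumes the usual SGA ASQG layout, in which vertex records are tab-separated lines of the form `VT<TAB>id<TAB>sequence[<TAB>tags]`, and it skips header (`HT`) and edge (`ED`) records. The function name `asqg_to_fasta` is hypothetical, and whether the downstream tools accept the resulting sequence ids unchanged is something to verify on your own data.

```python
def asqg_to_fasta(asqg_lines):
    """Yield FASTA records (as strings) for the vertex (VT) records of an
    ASQG file given as an iterable of lines.

    Assumes tab-separated records: VT <id> <sequence> [optional tags].
    Header (HT) and edge (ED) records are ignored, so the overlap
    information in the ASQG is dropped.
    """
    for line in asqg_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "VT" and len(fields) >= 3:
            seq_id, sequence = fields[1], fields[2]
            yield f">{seq_id}\n{sequence}\n"


example = [
    "HT\tVN:i:1\n",
    "VT\tcontig-0\tACGT\tSS:i:0\n",
    "ED\tcontig-0 contig-1 0 2 4 1 3 4 0 0\n",
]
print("".join(asqg_to_fasta(example)), end="")
```

Because the edges are discarded, the graph would then have to be rebuilt from the FASTA, which is the role AdjList plays in option (ii).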

Thanks for the reply. So the k parameter essentially just lets you specify how long an overlap you want to consider when building the graph. In the case of AdjList, I guess I'll just set it to the same value as the -m parameter.

So far, all the downstream programs I've tried work with the SGA unitig FASTA file. However, the coverage filtering parameters are obviously not going to work, since it looks like ABySS unitigs have coverage information in the headers (sequence id, length, coverage).
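
A quick way to see which headers carry that information is to parse them, as in this rough Python sketch. It assumes the three whitespace-separated fields described above (id, length, coverage) with integer length and coverage; `parse_header` is a hypothetical helper, not part of ABySS.

```python
def parse_header(header: str):
    """Split a FASTA header into (id, length, coverage).

    Assumes whitespace-separated fields in the order id, length, coverage.
    Missing or non-integer length/coverage fields are returned as None.
    """
    fields = header.lstrip(">").split()
    if not fields:
        raise ValueError("empty FASTA header")
    seq_id = fields[0]
    length = int(fields[1]) if len(fields) > 1 and fields[1].isdigit() else None
    coverage = int(fields[2]) if len(fields) > 2 and fields[2].isdigit() else None
    return seq_id, length, coverage


print(parse_header(">12 100 3450"))  # header with all three fields
print(parse_header(">contig-5"))     # header with only an id
```

Records where the coverage comes back as `None` are the ones for which coverage-based filtering has nothing to work with.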

Yep, as far as I understand, it is basically an overlap parameter. And setting -k equal to -m would probably work.

Good point about the coverage info. As far as I know, the ABySS tools don't strictly require the coverage values to be present in the FASTA headers, so you should still be able to get the tools to work without them. For example, in the new Bloom filter assembly mode ("ABySS 2.0"), the pre-unitig assembler (abyss-bloom-dbg) doesn't output any coverage values in the FASTA headers, but the rest of the abyss-pe pipeline still works fine.

But yeah, the coverage-related options probably won't work.
