Question

Does Abyss work on single-end Illumina reads only?

0

4.5 years ago

DNAngel ▴ 250

Hello all,

I cannot find the ABySS wiki anymore (it is gone), and the GitHub repository says that ABySS has the "se" option for single-end reads. However, the only example I can find that uses "se" does so in conjunction with multiple libraries (https://github.com/bcgsc/abyss).

I want to know whether I can use ABySS to assemble single-end reads (100 bp) alone. If I have cleaned and trimmed FASTQ files for different species, can I just use:

abyss-pe k=50 se="d_r1.fastq"

Would I then run this command for each of my species, one at a time?

I honestly cannot find a clear example of abyss commands for one set of fastq files so I am asking the community!

Abyss • 1.5k views

ADD COMMENT • link updated 4.5 years ago by Charles Warden 8.2k • written 4.5 years ago by DNAngel ▴ 250

1

The ABySS manual can be found at https://github.com/bcgsc/abyss#readme

If you're trying to assemble a microbial genome, then also try SPAdes, as it can combine assemblies at multiple k-mer sizes in a single run and may provide better assemblies than ABySS.

ADD REPLY • link 4.5 years ago by Sej Modha 5.3k

0

I am not working on microbial genomes; what I have is vertebrate exome data. I've asked a lot of questions, and it seems that assembling exome data is difficult because the intergenic regions are missing. Unfortunately, a lot of the papers I read in my field used WGS, but the data I HAVE to work with was produced by another lab that did WES. I can work with it easily to obtain reads for single gene sequences, but I really wanted to figure out a way to assemble everything so I can extract all coding regions. I did not expect it to be so tricky. I have tried de novo assembly and have only obtained N50 values of about 300 (the maximum is about 4000, which I guess makes sense for exon lengths); however, I am not getting nearly enough contigs. When I BLAST my contigs, I get only 300 hits, which is ridiculously low. Unless I ran BLAST wrong somehow, but I don't think so, because I do see individual genes/exons come up. But only 300? That's a joke!
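For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following is a minimal sketch of computing it from a FASTA file with standard tools; the function name `n50` and the file name `contigs.fa` are placeholders chosen for illustration. ABySS also ships `abyss-fac`, which reports similar statistics.

```shell
# n50: print the N50 of the sequences in a FASTA file.
# Sequence lengths are sorted from longest to shortest, then summed until
# the running total reaches half of all bases.
n50() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run * 2 >= total) { print lens[i]; exit }
             }
         }'
}

# Example (contigs.fa is a placeholder file name):
# n50 contigs.fa
```

Comparing this value across several values of k is one way to judge whether a different k-mer size gives a more contiguous assembly.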

ADD REPLY • link 4.5 years ago by DNAngel ▴ 250

1

DNAngel: While the manual refers to using orphaned mates as single-end input with the se option, I don't think ABySS is intended to be an assembler for single-end data alone. You could try running it as you note above, but the results may not be optimal even if the program runs.
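If you do want to try it, the per-species runs could be scripted along these lines. This is only a sketch: the function name `assemble_all`, the `*.fastq` naming, the per-species output directories, and the `RUN` dry-run switch are assumptions for illustration, and `k=50` and `se=` are simply carried over from the command in the question. `name=` sets the prefix of the output files, and because abyss-pe is a Makefile-based driver, the make option `-C` is used to run inside a directory.

```shell
# Dry run by default: print each command instead of executing it.
# Set RUN=1 in the environment to actually launch the assemblies.
assemble_all() {
    for fq in "$@"; do
        species=$(basename "$fq" .fastq)      # d_r1.fastq -> d_r1
        # Absolute path, because -C changes the working directory.
        abs=$(cd "$(dirname "$fq")" && pwd)/$(basename "$fq")
        if [ "${RUN:-0}" = 1 ]; then
            mkdir -p "$species"
            abyss-pe -C "$species" name="$species" k=50 se="$abs"
        else
            echo "abyss-pe -C $species name=$species k=50 se=$abs"
        fi
    done
}

# Example: assemble_all *.fastq
```

Keeping each species in its own directory avoids output files from one run overwriting those of another.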

ADD REPLY • link 4.5 years ago by GenoMax 141k

0

Thank you for this response - I apologize that I misunderstood the question.

ADD REPLY • link 4.5 years ago by Charles Warden 8.2k

score 0 · Answer 1 · 2019-10-08

0

4.5 years ago

Charles Warden 8.2k

Your command is for abyss-pe, which is specifically intended for paired-end samples.

Your command for running a paired-end assembly should look something like this:

abyss-pe name="$NAME" k=96 in="$R1 $R2"

As Sej points out, you can see a similar example in the manual:

https://github.com/bcgsc/abyss#assembling-a-paired-end-library
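A note on quoting when the read files are held in shell variables: the shell expands variables inside double quotes but not inside single quotes, so the quotes around `in=` decide what abyss-pe actually receives. A small demonstration, using hypothetical file names and a helper function `show_args` that exists only for this example:

```shell
R1=sample_1.fastq
R2=sample_2.fastq

# Print each argument on its own line, exactly as a program would receive it.
show_args() { printf '%s\n' "$@"; }

# Single quotes: passed through literally, the variables are not expanded.
show_args in='$R1 $R2'    # prints: in=$R1 $R2

# Double quotes: the shell substitutes the file names.
show_args in="$R1 $R2"    # prints: in=sample_1.fastq sample_2.fastq
```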

ADD COMMENT • link 4.5 years ago by Charles Warden 8.2k

1

Ah okay, I thought se would be for single-end data as well, but it does say it requires a short and a long read assembly, which I do not have. That's unfortunate!

ADD REPLY • link 4.5 years ago by DNAngel ▴ 250

0

Thank you for the follow-up.

I guess I usually think of polishing algorithms (do the long-read assembly first, and then bring in the short reads). There are hybrid assemblies, but I thought it was usually still better to combine longer and shorter PacBio fragments (so PacBio CCS is a better complement than Illumina short reads).

However, it sounds like you have Illumina single-end data.

In that case, I think you can still use SSAKE:

http://www.bcgsc.ca/platform/bioinfo/software/ssake

In my limited testing, the SSAKE paired-end assembly was a little better than the SSAKE single-end assembly, and the combination of SSAKE+Staden was comparable to ABySS. However, it requires a lot more hands-on analysis and code modification:

http://genomics-pubs.princeton.edu/prv/scripts.shtml

ADD REPLY • link 4.5 years ago by Charles Warden 8.2k