Question: Does Abyss work on single-end Illumina reads only?
5 weeks ago, DNAngel wrote:

Hello all,

I cannot find the ABySS wiki anymore (it is gone), and the GitHub repository says that ABySS does have the "se" option for single-end reads. However, the only example I can find that uses "se" does so in conjunction with multiple libraries (https://github.com/bcgsc/abyss).

I want to know if I can use ABySS to assemble my single-end reads (100 bp) only. If I have cleaned/trimmed fastq files for different species, can I just use:

abyss-pe k=50 se="d_r1.fastq"

Would I then run this command for each of my species, one by one?

I honestly cannot find a clear example of abyss commands for one set of fastq files so I am asking the community!

modified 5 weeks ago by Charles Warden • written 5 weeks ago by DNAngel

The ABySS manual can be found at https://github.com/bcgsc/abyss#readme

If you're trying to assemble a microbial genome, then also try SPAdes, as it can run assemblies with multiple k-mer sizes in a single run and may provide better assemblies than ABySS.

written 5 weeks ago by Sej Modha

I am not working on microbial genomes; what I have is vertebrate exome data. I've asked a lot of questions, and it seems that assembling exome data is difficult because the intergenic regions are missing. Unfortunately, a lot of the papers I read in my field used WGS, but the data I HAVE to work with was produced by another lab that did WES. I can easily work with it to obtain reads for single gene sequences, but I really wanted to figure out a way to assemble everything so I can extract all coding regions. I did not expect it to be so tricky.

I have tried de novo assembly and have only obtained N50 values of about 300 (the maximum is about 4000, which I guess makes sense for exon lengths). However, I am not getting nearly enough contigs. When I BLAST my contigs, I get only 300 hits, which is ridiculously low. Unless I ran BLAST wrong somehow, but I don't think so, because I see individual genes/exons come up. But only 300? That's a joke!

written 4 weeks ago by DNAngel

DNAngel: While the manual refers to using orphaned mates as single-end input with the se option, I don't think ABySS is intended to be an assembler for single-end data alone. You could try running it as you note above, but the results may not be optimal even if the program runs.
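
If you do try the single-end run, the per-species loop could be sketched as below. This is only a dry run that prints one command per file. It assumes (hypothetically) that each species has a file named like <species>_r1.fastq, and it adds name=<species>, borrowed from the paired-end example in the manual, so that the outputs for different species do not share a prefix. The k=50 and se= settings are copied from the command in the question, and the sketch says nothing about whether a single-end-only assembly will give useful contigs.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one abyss-pe command per FASTQ file.
# Assumption: inputs are named <species>_r1.fastq (hypothetical naming scheme).

abyss_cmd() {
    fq=$1
    # Strip the directory and the _r1.fastq suffix to get an assembly name.
    species=$(basename "$fq" _r1.fastq)
    # k=50 and se= come from the question; name= keeps outputs per species apart.
    printf 'abyss-pe name=%s k=50 se="%s"\n' "$species" "$fq"
}

# Print one command for every file given on the command line.
for fq in "$@"; do
    abyss_cmd "$fq"
done
```

Saved as a hypothetical per_species.sh, `sh per_species.sh *_r1.fastq` prints the commands for review, and piping that output to `sh` would run them one after another.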

modified 5 weeks ago • written 5 weeks ago by genomax

Thank you for this response - I apologize that I misunderstood the question.

written 4 weeks ago by Charles Warden
5 weeks ago, Charles Warden (Duarte, CA) wrote:

Your command is for abyss-pe, which is specifically intended for paired-end samples.

Your command for running a paired-end assembly should look something like this:

abyss-pe name=$NAME k=96 in="$R1 $R2"

As Sej points out, you can see a similar example in the manual:

https://github.com/bcgsc/abyss#assembling-a-paired-end-library

modified 4 weeks ago • written 5 weeks ago by Charles Warden

Ah okay, I thought se would be for single-end reads as well, but it does say it requires a short- and a long-read assembly, which I do not have. That's unfortunate!

written 4 weeks ago by DNAngel

Thank you for the follow-up.

I guess I usually think of polishing algorithms (do the long-read assembly first, and then use the short reads). There are hybrid assemblies, but I thought it was usually still better to have longer and shorter PacBio fragments (so PacBio CCS is a better complement than Illumina short reads).

However, it sounds like you have Illumina single-end data.

In that case, I think you can still use SSAKE:

http://www.bcgsc.ca/platform/bioinfo/software/ssake

In my limited testing, the SSAKE paired-end assembly was a little better than the SSAKE single-end assembly, and the combination of SSAKE+Staden was comparable to ABySS, but it requires a lot more hands-on analysis and code modification:

http://genomics-pubs.princeton.edu/prv/scripts.shtml

written 4 weeks ago by Charles Warden