Question: Recommendations For Heterozygous Genome Assembly Software
7
gravatar for Leszek
5.8 years ago by
Leszek3.9k
IIMCB, Poland
Leszek3.9k wrote:

I'm trying to assemble small (20Mb), diploid fungal genome from MiSeq reads (~400bp after merging, 100x coverage).
The tricky thing: it's heterozygous. The divergence is ~4%, but there are hundreds (thousands?) loss of heterozygosity (LOH) regions, accounting in total to almost half of the genome...
Do you know of any assembler (methodology) capable of handling with such data?

I have tried many assemblers/tricks:

  • de Bruijn graph: Velvet, ABySS, SOAPdenovo
  • overlap-based: MIRA, Newbler
  • clustering reads and then assembling with CAP3

but with rather bad effects. Every time, the assembly was very fragmented (N50 ~10kb), homozygous regions (LOH) were collapsed with ~200x, while heterozygous regions were separated with 100x.

I will be happy to hear about any ideas:)

EDIT

As no satisfactory solution exists, I'm developing Redundans (check below for more info). 

illumina miseq assembly genome • 12k views
ADD COMMENTlink modified 10 months ago by dmathog30 • written 5.8 years ago by Leszek3.9k
5
gravatar for Leszek
3.6 years ago by
Leszek3.9k
IIMCB, Poland
Leszek3.9k wrote:

As there were no good solutions, I've developed a tool that assists assembly of heterozygous/hybrid genomes (Redundans). 

The program takes as input assembled contigspaired-end and/or mate pairs sequencing libraries and returns scaffolded homozygous genome assembly. Resulting assembly should be less fragmented and with total size smaller than the input contigs. In addition, Redundans will automatically close the gaps resulting from genome assembly or scaffolding (more details). 

Hopefully, someone else will benefit from that! I'll be really grateful for your feedback. 

BTW: sorry for being necromancer;) 

ADD COMMENTlink written 3.6 years ago by Leszek3.9k
1

Sounds interesting. Two quick questions:

  1. What is the practical upper limit for genome size to be used with your pipeline
  2. Since your manual already mentions SPAdes, have you looked at how your tool performs in comparison to dipSPAdes?
ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by thackl2.5k
1

1. To be honest, I don't know and I would love to check that. What is your genome of interest? I've been playing mostly with fungal genomes (10-50Mb) and recently with Arabidopsis (119Mb). I would like to try it on larger genomes, but don't have the right data. 

2. Redundans works better than dipSPAdes especially if:

  • you have some recombinations (loss of heterozygosity) in your genome
  • tiny (1-3%) or large (>10%) divergence between heterozygous copies
  • your genomic libraries are with different read lenght (then all de novo assemblers will have problems).

For more details, have a look in the manuscript

ADD REPLYlink written 3.6 years ago by Leszek3.9k

Thanks a lot for the manuscript, it was quite informative. I have two quite different sets: a couple of 30-50 Mb diploid protists with asexual reproduction - quite heterozygous but  w/o recombination, and a ~3Gb heterozygous und quite repetitive plant genome with a very fragmented draft assembly...

For the protists I use overlapping MiSeq 300 bp libraries and (corrected) PacBio reads. I'm pretty happy with the dipSPAdes hybrid assemblies, but I'd like to see what Redundans can do. Although, given my data I'll probably only be able to run reduction and than use sspace-longread/pbJelly instead the sspace/gapcloser combi for scaffolding.


 

ADD REPLYlink written 3.6 years ago by thackl2.5k

Try running full Redundans pipeline for protists genomes - sometimes I got improvement with just one paired-end library. But of course the best would be to have some mate-pairs. If you feel like, you can split PacBio reads and create artificial mate-pair reads. Anyway, sspace-longread should work good. 

What kind of libraries do you have for your plant genome? 

Let me know if you have some problems, I'll be happy to help! 

ADD REPLYlink written 3.6 years ago by Leszek3.9k
1

Yeah, I was thinking about using artificial MP libs from PacBio reads as well. I let you know, how it works.

For the plant genome I have 100bp overlapping HiSeq reads plus 500 and 800 bp insert size PEs, and also MP libs from 2kbp to 20kbp, however only with 49bp read length, which makes them hard to map uniquely. And there are a lot of repeats, which no doubt will interfere with redundancy reduction. I'll give it a try as well after it worked for the small genomes.

ADD REPLYlink written 3.6 years ago by thackl2.5k

I think your plant dataset (3xPE + 2xMP) is perfect to try Redundans. I've been playing with 100bp PE (assembly+scaffolding) and 50bp MP (scaffolding) libraries with good results. You are right, many reads will fail to align uniquely, but those that will should improve assembly considerably! 

Since you have overlapping PE, you may also try ALLPATH-LG in haploidify mode. But you will need some powerful machine to launch it. 

ADD REPLYlink written 3.6 years ago by Leszek3.9k
1

I used ALLPATHS-LG with haploidify for the draft assembly - a 2TB RAM machine was necessary to run it without digital normalization :)

ADD REPLYlink written 3.6 years ago by thackl2.5k

I know, it's crazy. I needed 128GB RAM for 20Mb genome! How it performed? In my case, so so, especially taking into account the time and resources needed for the run... 

ADD REPLYlink written 3.6 years ago by Leszek3.9k

Well, it handled the data set (which not every assembler managed to do in the first place) and it produced the "best" assembly of all the things I tried (including SOAP-denovo, ABySS, Masurca, Platanus ...). But we are talking N50 < 20kb for scaffolds and given the computational requirements - it's not like I can just rerun it a few times to optimize configuration...

ADD REPLYlink written 3.6 years ago by thackl2.5k

Hello thackl. I was interested about your experience with Platanus software, and I am starting to use Platanus and it gives me this error: K = 31, saving kmers from reads... Error(5): Kmer distribution exception!! Kmer distribution cannot be caluculated proper. Do you have any good suggestions for this problem? Thanks!

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by hope0

Hello thackl. I was wondering about your experience with dipSPAdes, did you try to estimate unique gene content prior and after using dipSPAdes? I found the algorithm to reduce the assembly size considerably and even found cases where tRNA synthatases were missing from the dipSPAdes assembly but not input.

ADD REPLYlink written 3.5 years ago by Adrian Pelin2.2k
1

Hello Leszek, Thanks for mentioning Redundans, I have tested it on a few recent assemblies and I am quite pleased. I have a question however, your default parameters are a minimum identity of 51% and minimum overlap of 66%, do you not find those criteria a little bit too low (i.e. that you will remove contigs that should not be removed because they contain unique information)? Typically assemblers remove bubbles ate 80-90% identity, no?

 

ADD REPLYlink written 3.5 years ago by Adrian Pelin2.2k
1

Good point, Adrian, the identity/overlap cut-off are rather relaxed because usually you don't know what is the level of divergence between your heterozygous regions up front. Users are free to use more restrictive thresholds (--identity/--overlap parameters).

What I've noticed in simulations is that using relaxed identity/overlap cut-off works fine, unless you expect some segmental duplications that may be removed by such relaxed cut-off. These will appear most likely as gaps in your assembly, anyway. 

ADD REPLYlink written 3.5 years ago by Leszek3.9k
1

hi, as you said, there is no good solution for heterozyogous genomes. my question is : there are lots of repeat regions in heterozygous genomes which are included in redundancy. if you remove them, that means you will lose lots of genetic information, doesn't that?

ADD REPLYlink written 2.6 years ago by elva2005120210

Yes, reduction step will remove contigs representing highly similar repeats. but if you have mate pair libraries you will recover these missing regions during scaffolding step. From simulations with A. thaliana genome (~25% of which is repeats) I've observed reduction of the assembly to ~80-90% of expected size, but after scaffolding with mate pairs the assembly was completely fine (100% of expected size). The manuscript is in revision, so stay tuned. btw: post to Redundans mailing list for more details.

ADD REPLYlink written 2.6 years ago by Leszek3.9k
1

These are interesting results but A. thaliana is highly selfing and has a tiny genome. I'm not sure how/if these findings can be directly related to the question about repetitive DNA and heterozygosity issues, or more specifically, to typical plant genomes. For a genome like A. thaliana where you can assemble 90% of the expected size, I would say there is little to be gained by any k-mer reduction/normalization or further scaffolding. I know it is a demonstration of the method and there is the practical consideration that you can't assemble large genomes easily by any method. Though, it would be nice to see some kind of broader application of these methods for comparison.

ADD REPLYlink written 2.6 years ago by SES8.1k

good point. we are involved in very challenging genomes (2-3Gb plants), but unfortunately for these challenging genomes there is no good reference to compare with...

ADD REPLYlink written 2.6 years ago by Leszek3.9k

You are right of course, the current lack of genomes in that size range makes it very difficult to compare methods. Though, I think this will change very soon due to PacBio and a number of new hybrid sequencing and optical mapping approaches.

ADD REPLYlink written 2.6 years ago by SES8.1k

yes, finally, you might get expected size through scaffolding with mate pairs, but i suppose we would get more N's filling these repeats, isn't it?

best, Elva

ADD REPLYlink written 2.6 years ago by elva2005120210

the gaps can be filled ie using GapCloser, removing most of Ns from the assembly

ADD REPLYlink written 2.6 years ago by Leszek3.9k

I am trying to assemble an invertebrate genome and it is known to be highly heterozygous. From what I understand, assembly of a highly heterozygous genome should result in a larger than estimated genome size (whereby Redundans can work on reducing the redundancy). However, in my case, my resulting assembly is a lot smaller than the expected size with only 40% of total reads mapping back to the assembly.

From my total Illumina reads, I expect to have 50x coverage of the genome - and this coverage is further lowered according to homozygous (1C peak ~13x) and heterozygous (2C peak ~28x) regions. Based on your experience with polymorphic genomes, do you think lack of coverage is the reason for the incomplete smaller assembly size?

ADD REPLYlink written 19 months ago by mht0

you can post your questions regarding redundans / heterozygous genome assembly in https://groups.google.com/forum/#!forum/redundans

give more details, what is the species, expected genome size, what kind of libraries / assembler you are usings etc

ADD REPLYlink written 19 months ago by Leszek3.9k
5
gravatar for the.good.doctor.is.in
5.8 years ago by
the.good.doctor.is.in60 wrote:

Erich Schwarz and some other nematode genomicists have had good luck applying digital normalization to the reads first; see the khmer handbook here: http://ivory.idyll.org/blog/an-assembly-handbook-for-khmer.html. You might try that. I'd be happy to correspond with you if you have any trouble. --titus

ADD COMMENTlink modified 5.8 years ago by Istvan Albert ♦♦ 78k • written 5.8 years ago by the.good.doctor.is.in60

This approach looks very interesting. I would like to see how this and Hapsembler, or other approaches, compare.

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by SES8.1k

I will definitely give it a try:) Thanks, will reply shortly.

ADD REPLYlink written 5.8 years ago by Leszek3.9k
3
gravatar for Casey Bergman
5.8 years ago by
Casey Bergman17k
Athens, GA, USA
Casey Bergman17k wrote:

Vinson et al. adapted ARACHNE to assemble the polymorphic Ciona savignyii genome a few years back, and their recipe is detailed in this paper: http://www.ncbi.nlm.nih.gov/pubmed/16077012

It looks like BROAD provide documentation for how to run ARACHNE on polymorphic genomes (and specifically mention fungal genomes) here: http://www.broadinstitute.org/crd/wiki/index.php/Polymorphism

ADD COMMENTlink written 5.8 years ago by Casey Bergman17k

the wiki is useful. but I was unable to run ARACHNE. is it possible to run it outside BROAD?

ADD REPLYlink written 5.7 years ago by Leszek3.9k
2
gravatar for amvarani
3.5 years ago by
amvarani20
Brazil
amvarani20 wrote:

I have been using a combination of platanus and redundans with notable results.

My genome is auto or allo tetraploid, heterozygous (~3%) and has 320-330Mb (160Mb 1C). 

I have made 180M of reads 2x 100bp PE (350pb insert size) and 120M 2x100 Mate pair (3kb insert size).  I have used platanus in a first stage (assemble, scaffold and gap_close) with the -u parameter set to "1.0", and starting the k with "21" . (I have tested all the assemble combinations, and that ones gave to me the best results, considering largest scaffold, N50 and number of contigs).  

My result in this first stage is a high fragmented assembly, with more than 1M contigs, with N50 less than 1kb.

However, 160Mb of the assembly is contained in contigs larger than 500p (largest scaffold with 950kb). If these contigs with less than 500bp are excluded, my N50 increase to 120kb. This means that platanus did a very good job merging the haplotypes (certainlly  due the "-u 1.0" parameter).

In the second step, I have used redundans with my fragmented assembly (all contigs) and the results are very interesting! My final assembly size is 150Mb with N50 of 380kb and largest scaffold of 3,4M !! This using default redundans parameters. Now I'm going to play with --identity , --overlap, --minLength, --limit and --iters parameters.

I'm really pleased with that!

 

 

 

  

ADD COMMENTlink written 3.5 years ago by amvarani20

Great to hear that! Thanks for testing & sharing. Let me know if you experience any difficulties. 

ADD REPLYlink written 3.5 years ago by Leszek3.9k

I recommend looking at the stats (largest contig/n50) after redundans gets rid of contigs that are redundant, just to have an idea what percent was extra. Then the rest of the job is done with SSPACE and GapCloser.

ADD REPLYlink written 3.5 years ago by Adrian Pelin2.2k

That's right, Adrian, but you may also see improvement in Redundans from using BWA MEM instead of bowtie for scaffolding and of course multiple iterations of scaffolding/gap closing. 

ADD REPLYlink written 3.5 years ago by Leszek3.9k

Agreed for sure!

ADD REPLYlink written 3.5 years ago by Adrian Pelin2.2k

Hi amvarani,

I am using platanus to assemble two pair end libraries and one mate pair library for scaffolding. Do I need to reverse complement the mate pair library before assemble. Thank you!

Hui

ADD REPLYlink written 3.4 years ago by tigerhui5870

 

Hi Hui,

 

In general the mate-pair libraries made with illumina nextera are in outie orientations (i.e  <------     ------>) .  If you have that kind of libraries you don't need to do anything.

If you have a combined fastq (interleaved) file you must use the "-op" , '-ip" option (lowecase), if you have both paired files you should use "-OP"  "-IP" (uppercase) .  I have noted that depending of file name platanus has a strange bug which results in a wrong recognition of the files.

For instance, your both fastq (for -OP option mate-pairs) files must have exactly the same name AND containing letters and numbers (except to distinguish the pairs). I would suggest something like "Library003_MP1.fastq" and "Library003_MP2.fastq".   

 

 

 

ADD REPLYlink written 3.4 years ago by amvarani20

Hi amvarani,

Thank you so much for helping! I have two pair end libraries and one mate pair library.

I used my normalized fastq files to assemble, but it alway said 'Read file exception!!'

Below is my script and the error. How to fix it? Thanks a lot!

/diag/home/hzz0036/software/platanus assemble -k 21 -u 1.0 -f /diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq /diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq /diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq /diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq /diag/home/hzz0036/goosegrass_genome/PBU_7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq /diag/home/hzz0036/goosegrass_genome/PBU_7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq -o /diag/home/hzz0036/goosegrass_genome/platanus

 

Error(2): Read file exception!!

Read file exception!!

Read file is not FASTA/FASTQ format.

Best regards,

Hui

ADD REPLYlink written 3.4 years ago by tigerhui5870

Hi Hui,

Try a shorter name for your files, like as  K25_C30_pctSD200.fastq 

ADD REPLYlink written 3.4 years ago by amvarani20

Hi amvarani, I was interested about your experience with Platanus software, and I am starting to use Platanus and it gives me this error: platanus assemble -f AS_R1.clean.cor.fastq AS_R2.clean.cor.fastq AS285_S40_L007_R1.clean.cor.fastq AS285_S40_L007_R2.clean.cor.fastq AS485_R1.clean.cor.fastq AS485_R2.clean.cor.fastq AS675_R1.clean.cor.fastq AS675_R2.clean.cor.fastq -t 20 -o /public/home/zpxu/AS/results/platanus_quake/AS -k 31 -m 1128 -u 0.3 -d 0.3 -a 5

K = 31, saving kmers from reads... Error(5): Kmer distribution exception!! Kmer distribution cannot be caluculated proper.

I have tried many times for different k and more big m value, but it also gives me the same error. So, I really don't know why and how to solve this problem. Could you help me ? Thanks! Best regards, Zhongping Xu

ADD REPLYlink written 2.5 years ago by hope0

With Platanus, did you also set these options for scaffold and gapclose ?

Platanus scaffold -s 21 -v 21 -u 1

Platanus gap_close -s 21 -k 21 -vo 21 -vd 21

ADD REPLYlink written 2.4 years ago by Picasa390
1
gravatar for Rayan Chikhi
5.8 years ago by
Rayan Chikhi1.4k
France, Lille, CNRS
Rayan Chikhi1.4k wrote:

You might want to take a look at Hapsembler (From the web page: Hapsembler is a haplotype-specific genome assembly toolkit that is designed for genomes that are rich in SNPs and other types of polymorphism.).

ADD COMMENTlink written 5.8 years ago by Rayan Chikhi1.4k
1

tried it, played with parameters, corrected reads, but with no success... the assembly is quite similar to Newbler one

ADD REPLYlink written 5.8 years ago by Leszek3.9k
1
gravatar for Rayan Chikhi
5.8 years ago by
Rayan Chikhi1.4k
France, Lille, CNRS
Rayan Chikhi1.4k wrote:

BGI reported that the scaffolding module of SOAPdenovo2 has been improved to deal with heterozygosity. Did you try to run the full Soapdenovo2 pipeline?

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Rayan Chikhi1.4k
1

SOAPdenovo2 seem to work really good! It uses multiple k in one run and is very fast. It return separate contig for each haplotype. I have to investigate the quality further.

ADD REPLYlink written 5.8 years ago by Leszek3.9k
1

the assembly is still quite fragmented. but got some ideas to improve it further :)

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by Leszek3.9k
1

1) Have you tried Planatus ( http://platanus.bio.titech.ac.jp/platanus-assembler/platanus-1-2-1/ )? I used this software for my 6% heterozygous genome and it improve scaffold continuity against SOAPdenovo2

2) Have you published your manuscript yet? If you did, will you put a link here? I think people will be interested in playing with the data.

ADD REPLYlink modified 3.9 years ago by SES8.1k • written 4.1 years ago by kevinchjp10

I did use the platanus assemble,which help us a lot for the assemble. But could you publish the manuscript? cannot wait!

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by xinwangcau0

@kevinchjp 

1) yes, platanus works much better with my data. although I still see room for improvement;) 

2) the manuscript is in the production... hopefully early next year. I'll keep you posted! 

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by Leszek3.9k
1
gravatar for rtliu
3.6 years ago by
rtliu2.0k
New Zealand
rtliu2.0k wrote:

Platanus Genome Assembler was designed for genomes of higher heterozygosity.

The paper was published on Genome Biology in 2014.

ADD COMMENTlink written 3.6 years ago by rtliu2.0k

In fact I've tried Platanus. It works well, especially for divergence between heterozygous regions <10%. But I found Redundans returning less fragmented assemblies and it recognise heterozygous regions with divergence up to 45%. 

ADD REPLYlink written 3.6 years ago by Leszek3.9k
1
gravatar for mikael.dahl
3.4 years ago by
mikael.dahl20
European Union
mikael.dahl20 wrote:

Hi, I  do also struggle with a heterozygous genome. I found your post and your recommendation to use the -u 1.0 flag in platanus. Did you look into how it affects your assembly? 

In the readme file for platanus is says: 

"-u FLOAT: Maximum difference for bubble crush (identity, default 0.1) Larger FLOAT increases the number of bubbles merged. If heterozygosity of the sample is high, large FLOAT may be suitable (Ex. -u 0.2)."

So I guess that using 1.0 means that all bubbles in the graph will be collapsed, but how is the choise between the two haplotypes made? quality on reads? Since the genome is heterozygous in those positions both paths of the bubble is true and if we randomly choose to keep only one there will be tons of reads (if we do an RNA seq experiment and use the genome as a reference) that not will match because we collapsed the allele being expressed. 

 

I mean if the authors of an assembler that is specialized on heterozygous genomes recommend to increase the -u value to 0.2 in case of high heterozygosity it feel kind of extreme to use 1.0. 

 

 

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by mikael.dahl20

This should be added as a comment to the answer it relates to. You raise good points for discussion but it is confusing if out of place.

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by SES8.1k

Good point Mikael, I agree with you, however I have tested -u from 0.1 to 2.0. For each assembly,  were evaluated: Assembly size, scaffold N50, Contig N50, largest Contig and number of contigs. I have found better results using the "-u 1.0". 

For instance using "-u 0.2", also give good results, but with "-u 1.0" retrieved the largest scaffolds and also N50. I also have RNAseq (single end 200bp) data and about of 94% of all RNAseq reads map in this assembly, which is pretty high rate! 

Regarding the bubbles, Platanus generate a bubble fasta file.  Looking this file (BLASting, in fact) I have noticed that most of the bubbles are mobile genetic elements, like as LTR retrotranspons and also microsatelaties or telomer and centromer derived . Then I decided to collapse LTR retros all and try to link scaffolds with several platanus scaffold iterations (using the same "-u 1.0" parameter and also "-s 16 and -v 16"), and then redundans. After redundans step, I did again several scaffolds iterations with platanus. I have also disabled the GapCloser and SSPACE step on redundans (In fact I have used only the reductions step)

After these steps, I have mapped all my reads back to the assembly (about of 90% mapped). I have now assembled the rest of the reads (10%) without coverage cufoff disabled with SPAdes 3.5.0, which gave to me some contigs/scaffolds which I have added to the main assembly, and then some more scaffold iterations with platanus, and then another redundads step (only reduction). 

In my final file I have used GapFiller with 10 iterations.

Doing that, at the moment I got a N50 of 3,5Mb and Contig N50 of about 1Mb.

This receipt worked for my genome, maybe be useful for the colleagues here.

 

Some features which can be implemented in redundans:

(1) For the output a fasta file with the redundans contigs/scaffold (very important!!! with this file we can figure out the repeats type found in your genome);

(2) Option to choose SSPACE or Platanus Scaffold;

(3) Option to choose GapCloser or GapFiller (or maybe both!). I would also recommend the platanus gap_closer as a option;  

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by amvarani20
0
gravatar for amvarani
3.5 years ago by
amvarani20
Brazil
amvarani20 wrote:

Hi there

Other good scaffolding option is the scafmatch software (http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch).

ADD COMMENTlink written 3.5 years ago by amvarani20

ScaffMatch requires SAM-formatted input independently from F and R reads, which is a bit problematic, but I'll try it. Thanks for hint! 

ADD REPLYlink written 3.5 years ago by Leszek3.9k

hi,ScaffMatch support sam and fastq。

ADD REPLYlink written 3.4 years ago by orange30
0
gravatar for kingcohn
2.1 years ago by
kingcohn20
kingcohn20 wrote:

Hello Leszek, I am attempting to run your program but am having issues. I first installed using the pre-compiled binaries via your link and then tried to manually install all of the programs. After confirming all the dependencies have been acquired I am still getting some errors in the redudans.py script. Any suggestions are greatly appreciated! I am running ubuntu 16.04 on an Intel Core i7.

~/src/redundans$ ./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1 Options: Namespace(fasta='test/contigs.fa', fastq=['test/5000_1.fq.gz', 'test/5000_2.fq.gz', 'test/600_1.fq.gz', 'test/600_2.fq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file="" '<stderr="">', mode 'w' at 0x7f003a9bc1e0>, mapq=10, minLength=200, nocleaning=True, nogapclosing=True, noreduction=True, noscaffolding=True, outdir='test/run1', overlap=0.66, resume=False, sspacebin='~/src/SSPACE/SSPACE_Standard_v3.0.pl', threads=4, verbose=True)

Traceback (most recent call last): File "./redundans.py", line 450, in <module> main()

File "./redundans.py", line 445, in main o.verbose, o.log)

File "./redundans.py", line 271, in redundans os.makedirs(outdir)

File "/usr/lib/python2.7/os.py", line 157, in makedirs mkdir(name, mode)

OSError: [Errno 13] Permission denied: 'test/run1'

ADD COMMENTlink written 2.1 years ago by kingcohn20

This is not an answer but rather another question. It would be better to open a new thread for your question if you need people to respond quickly.

Also, the error seems to be with test/run1, do you have a sub-directory named test in your directory already?

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Rohit1.3k

thanks, I'll ask a new question. But the directory does not. Using my own data set, the error is now failure to recognize GapCloser

spades/CPBscaffolds.fasta -o redundansCPB [ERROR] GapCloser: not found

despite GapCloser in directory:

src/redundans

CHANGELOG.md fastq2insert_size.py GapFiller_v1-10_linux-x86_64 docs fastq2insert_size.pyc INSTALL.sh fasta2homozygous.py fastq2mates.py LICENSE fasta2homozygous.pyc fastq2sspace.py pyScaf.py FastaIndex.py fastq2sspace.pyc README.md FastaIndex.pyc filterReads.py redundans.py fasta_stats.py filterReads.pyc SSPACE-STANDARD-3.0_linux-x86_64 fasta_stats.pyc GapCloser test

ADD REPLYlink written 2.1 years ago by kingcohn20

you need to add the path to GapCloser to your system PATH. Have a look how to do it: https://github.com/lpryszcz/redundans/issues/23

ADD REPLYlink written 2.0 years ago by Leszek3.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1828 users visited in the last hour