Question: How to use the ALLPATHS-LG assembler
Toto26 wrote (11 days ago):

Hi all, I need your help because this is the first time I have had to assemble a genome.

I have two FASTQ files, reads1.fq and reads2.fq. They come from an Illumina HiSeq 3000 run with 150 bp reads, and the genome size of my species is around 1.5 Gb.
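
Before choosing an assembler, it helps to know the sequencing depth, because several settings depend on it (for example the JF_SIZE rule in the MaSuRCA config posted further down this thread). Coverage is simply total sequenced bases divided by genome size. A minimal Python sketch, assuming plain or gzip-compressed FASTQ with exactly four lines per record; the helper names are hypothetical and not part of any assembler:

```python
import gzip


def fastq_base_count(lines):
    """Return (reads, bases) for an iterable of FASTQ lines, 4 lines per record."""
    reads = bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of each record holds the sequence
            reads += 1
            bases += len(line.rstrip("\r\n"))
    return reads, bases


def fastq_file_base_count(path):
    """Open a plain or .gz FASTQ file and count its reads and bases."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fastq_base_count(handle)


def estimated_coverage(total_bases, genome_size):
    """Fold coverage = total sequenced bases / genome size."""
    return total_bases / genome_size
```

For this data set, summing `fastq_file_base_count("reads1.fq")[1]` and `fastq_file_base_count("reads2.fq")[1]` and passing the total to `estimated_coverage(total, 1_500_000_000)` gives the approximate fold coverage of the ~1.5 Gb genome.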

I would like to use ALLPATHS-LG for this. I read the manual before posting here, but I still do not know what to do. Should I prepare my data first, or is there a single command to run on my two FASTQ files that does everything?
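
One preparation step that matters for any paired-end assembler is confirming that the two files are still in sync: record n in reads1.fq should be the mate of record n in reads2.fq. A Python sketch, assuming 4-line FASTQ records and Illumina-style names in which mates share the first whitespace-delimited token, optionally ending in /1 and /2; `first_pair_mismatch` is a hypothetical helper, not an assembler feature:

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the name of each 4-line FASTQ record, without a /1 or /2 suffix."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected a header starting with '@'")
            name = line[1:].split()[0]  # drop the comment, e.g. "1:N:0:ACGT"
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pair_mismatch(lines1, lines2):
    """Return (record_index, name1, name2) for the first unpaired record, or None.

    A missing record in the shorter file shows up as None.
    """
    for idx, (n1, n2) in enumerate(zip_longest(read_names(lines1), read_names(lines2))):
        if n1 != n2:
            return idx, n1, n2
    return None
```

With `open("reads1.fq") as f1, open("reads2.fq") as f2`, a result of `None` from `first_pair_mismatch(f1, f2)` means every record has its mate at the same position.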

If someone could explain the process in more detail, or if someone who has already used this program could walk me through the steps, that would be very kind.

Tags: assembly, genome
(Question written 11 days ago by Toto26; modified 10 days ago by Beuss90)

Take a look at DISCOVAR de novo and MaSuRCA:

https://software.broadinstitute.org/software/discovar/blog/

http://masurca.blogspot.com/

(Reply written 10 days ago by Beuss90)

Hi, thanks for your help. I'm trying to use MaSuRCA; can you tell me whether my config file is written correctly?

#PBS -S /bin/bash
#PBS -l nodes=1:ppn=8:bigmem,mem=100gb
#PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.error
#PBS -o /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.out
#PBS -N ACG-006
#PBS -q q1week


#DATA
PE= pe 150 22  /pandata/ACG-0006_0027/reads1.fq  /pandata/ACG-0006_0027/reads2.fq

#END

#PARAMETERS
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
#otherwise keep at 0
USE_LINKING_MATES = 0
#specifies whether to run mega-reads correction on the grid
USE_GRID=0
#specifies queue to use when running on the grid MANDATORY
GRID_QUEUE=all.q
#batch size in the amount of long read sequence for each batch on the grid
GRID_BATCH_SIZE=300000000
#coverage by the longest Long reads to use
LHE_COVERAGE=30
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms 
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler. Do not worry about performance, number of processors or batch sizes -- these are computed automatically.
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS =  cgwErrorRate=0.15
#minimum count k-mers used in error correction 1 means all k-mers are used.  one can increase to 2 if Illumina coverage >100
KMER_COUNT_THRESHOLD = 1
#whether to attempt to close gaps in scaffolds with Illumina data
CLOSE_GAPS=1
#auto-detected number of cpus to use
NUM_THREADS = 16
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 200000000
#set this to 1 to use SOAPdenovo contigging/scaffolding module.  Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
#END
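
Several rules written in this config's own comments can be checked mechanically: USE_LINKING_MATES should be 1 for Illumina-only assemblies, JF_SIZE should be at least estimated_genome_size*estimated_coverage, and NUM_THREADS should not exceed the cores requested in the `#PBS -l` line (ppn=8 here, versus NUM_THREADS = 16). In MaSuRCA's `PE=` line, the two numbers after the two-character prefix are the mean and standard deviation of the fragment (insert) size, not the read length, so a mean of 150 for 150 bp reads deserves a second look. The Python sketch below is a hypothetical helper (`parse_masurca_config` and `check_config` are not part of MaSuRCA); the read length, genome size and coverage must be supplied by the user, and the coverage is only an estimate:

```python
import re


def parse_masurca_config(text):
    """Collect PE library lines, KEY = VALUE parameters and the PBS ppn value."""
    pe_libs, params, ppn = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#PBS"):
            match = re.search(r"ppn=(\d+)", line)
            if match:
                ppn = int(match.group(1))
            continue
        if not line or line.startswith("#") or "=" not in line:
            continue  # comments, section markers (#DATA, #END) and blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "PE":
            # PE= <prefix> <fragment mean> <fragment stdev> <forward reads> <reverse reads>
            prefix, mean, stdev, *files = value.split()
            pe_libs.append({"prefix": prefix, "mean": float(mean),
                            "stdev": float(stdev), "files": files})
        else:
            params[key] = value
    return pe_libs, params, ppn


def check_config(text, read_length, genome_size, coverage, illumina_only=True):
    """Return warnings derived from the rules stated in the config comments."""
    pe_libs, params, ppn = parse_masurca_config(text)
    warnings = []
    for lib in pe_libs:
        if lib["mean"] <= read_length:
            warnings.append(
                f"PE {lib['prefix']}: mean {lib['mean']:g} <= read length {read_length}; "
                "this field should be the fragment (insert) size")
        if len(lib["files"]) != 2:
            warnings.append(f"PE {lib['prefix']}: expected 2 read files, got {len(lib['files'])}")
    if illumina_only and params.get("USE_LINKING_MATES") != "1":
        warnings.append("USE_LINKING_MATES: comment says set to 1 for Illumina-only assemblies")
    threads = int(params.get("NUM_THREADS", "0"))
    if ppn is not None and threads > ppn:
        warnings.append(f"NUM_THREADS={threads} exceeds ppn={ppn} requested from PBS")
    jf_size = int(params.get("JF_SIZE", "0"))
    if jf_size < genome_size * coverage:
        warnings.append(f"JF_SIZE={jf_size} < genome_size*coverage = {genome_size * coverage:.0f}")
    return warnings
```

For example, `check_config(open("masurca_config.txt").read(), 150, 1_500_000_000, 30)` would list the settings to revisit; the file name and the 30x coverage are placeholders, and the coverage should come from the actual read counts.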

Thanks again for your help

(Reply written 8 days ago by Toto26)
h.mon (Brazil) wrote (11 days ago):

If this is all the data you have, you can't use ALLPATHS-LG, because it requires at least two different libraries (generally one paired-end library and a second, mate-pair library):

http://software.broadinstitute.org/allpaths-lg/blog/?page_id=336

B1. Can I assemble data from ONE library using ALLPATHS-LG?

No, but we understand the need for programs that can do this, and there are some, including Velvet and ABySS. Multiple libraries enable higher assembly quality but entail more labwork.

I remember someone (maybe Brian Bushnell) posting here on Biostars about creating a "fake" mate-pair or long-read library so that ALLPATHS-LG can be used with just one library.

Edit: here is the post I was thinking of:

Assembler for large genome de novo assembly with Illumina paired end reads of 150 pb
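
The "fake" library idea can be sketched as follows, under assumptions that are not taken from the linked post: a draft assembly is first built from the paired-end reads with another assembler, and synthetic read pairs with a long, fixed separation are then cut from its contigs to stand in for a mate-pair library. The function names, the fixed insert size and the FR (inward-facing) orientation are illustrative choices; real Illumina mate-pair libraries are usually outward-facing (RF), so the orientation would have to match whatever the target assembler expects.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fake_mate_pairs(contig, insert_size, read_length, step):
    """Yield (read1, read2) pairs whose outer ends are insert_size bases apart.

    A fragment is taken every `step` bases along the contig; read1 is its
    left end on the forward strand and read2 is the reverse complement of
    its right end (FR orientation).
    """
    if insert_size < read_length:
        raise ValueError("insert_size must be >= read_length")
    for start in range(0, len(contig) - insert_size + 1, step):
        fragment = contig[start:start + insert_size]
        yield fragment[:read_length], revcomp(fragment[-read_length:])
```

Each pair would then be written out as two FASTQ records with placeholder qualities; the insert size, read length and step (for example 5000, 100 and 500) are hypothetical values to be chosen for the assembler at hand.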

(Answer written 11 days ago by h.mon; modified 11 days ago)

Thanks for your answer. Can you recommend another program? I already tried IDBA-UD, but it does not work with this data and I do not know why.

(Reply written 11 days ago by Toto26)