Question: ABySS genome assembly on several nodes
Asked 7 months ago by Igor Lalin (Canada/London/London Research and Development Centre):

Hi, I am running the following script:

    #!/bin/bash
    # Abyss assembly pipeline

    cores=40
    species='Favanaceum'
    Qcut=30

    # merge non-overlapping pairs with konnector and assemble at various k
    for k in `seq 26 10 126`;
    do
        konnector -j $cores -k $k -o kon$k out_reads_1.fastq out_reads_2.fastq
        mkdir ${species}-k$k
        abyss-pe -C ${species}-k$k name=$species-$k k=$k np=$cores q=$Qcut \
            lib='pe1 pe2' long='longa' \
            pe1='../kon${k}_reads_1.fq' pe2='../kon${k}_reads_2.fq' \
            se='../out_merged.fastq ../kon${k}_merged.fa' \
            longa='../05001-genome.fa'
    done

As you can see, it's relatively straightforward: I submit it with qsub -pe smp 40, and it uses 40 slots on one node. Would it be possible to run parallel jobs on different nodes?

That way, assemblies for several different k values could run at the same time, which would reduce the total run time.

How would you change my shell script to do this?

Thank you so much


You should check with your HPC folks on how to submit a job that needs 40+ cores. They'd be able to help you better.

EDIT: Removed comments that recommended better formatting.

Comment written 7 months ago by RamRS

Thank you, RamRS. I appreciate it, will do!

Reply written 7 months ago by Igor Lalin

Answered 7 months ago by lieven.sterck (VIB, Ghent, Belgium):

Make a shell script that holds this part:

    #!/bin/bash
    # settings carried over from the original script
    cores=40
    species='Favanaceum'
    Qcut=30

    # the k value is passed as the first argument
    k=$1
    konnector -j $cores -k $k -o kon$k out_reads_1.fastq out_reads_2.fastq
    mkdir -p ${species}-k$k
    abyss-pe -C ${species}-k$k name=$species-$k k=$k np=$cores q=$Qcut \
        lib="pe1 pe2" long="longa" \
        pe1="../kon${k}_reads_1.fq" pe2="../kon${k}_reads_2.fq" \
        se="../out_merged.fastq ../kon${k}_merged.fa" \
        longa="../05001-genome.fa" \
        unitigs

Then simply submit the jobs using your loop:

    for k in `seq 26 10 126`;
    do
        qsub <abyssScript> $k
    done
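
To make the submit loop a bit more concrete, here is a minimal sketch. It assumes the per-k script is saved as abyss_k.sh (a hypothetical name standing in for <abyssScript>) and that your cluster accepts the same `-pe smp 40` request mentioned in the question. The `DRY_RUN`, `SLOTS` and `JOB_SCRIPT` variables and the `submit_k` helper are made up for illustration. By default it only prints the qsub commands so you can check them before submitting anything.

```shell
#!/bin/bash
# Sketch: submit one ABySS job per k value, each requesting its own 40 slots.
# Assumptions (not from the thread): the per-k script is called abyss_k.sh,
# and DRY_RUN/SLOTS/JOB_SCRIPT are illustrative variable names.

job_script=${JOB_SCRIPT:-abyss_k.sh}   # hypothetical file name
slots=${SLOTS:-40}                     # slots per job, as in "qsub -pe smp 40"
dry_run=${DRY_RUN:-1}                  # 1 = print the commands, 0 = really submit

submit_k() {
    local k=$1
    # -cwd runs the job in the submission directory, -N sets the job name
    local cmd=(qsub -cwd -pe smp "$slots" -N "abyss-k$k" "$job_script" "$k")
    if [ "$dry_run" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# same k range as the original loop: 26, 36, ..., 126
for k in $(seq 26 10 126); do
    submit_k "$k"
done
```

Run it once as is to inspect the commands, then again with `DRY_RUN=0` to submit. Whether the scheduler actually spreads the jobs over different nodes depends on how your cluster is configured, so it is still worth checking with your HPC admins.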

If your genome is not that big, and since you do not have many input files, ABySS should run quickly enough on a single (multi-core) node.

Extra tip: you can add the target 'unitigs' to your command line (I added it in the example above), which stops the ABySS pipeline after generating the unitigs. That is already a good point at which to choose your 'best' k-mer size.
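
One possible way to compare the unitig runs is to look at the N50 per k. This is only a sketch: N50 is just one common criterion and not necessarily the one you should use, the `n50` helper is made up for illustration, and the file pattern assumes abyss-pe writes its unitigs to `<name>-unitigs.fa` inside each `${species}-k$k` directory.

```shell
#!/bin/bash
# Sketch: print the unitig N50 for every k that has finished.
# Assumption: unitigs are in <species>-k<k>/<species>-<k>-unitigs.fa

species=${species:-Favanaceum}

# n50 FILE: print the N50 of the sequences in a FASTA file
n50() {
    # first awk: one sequence length per line (handles multi-line records)
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -rn |
    # second awk: walk the lengths from longest to shortest until half
    # of the total assembly length is covered
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run * 2 >= total) { print lens[i]; exit }
             }
         }'
}

for f in "$species"-k*/"$species"-*-unitigs.fa; do
    [ -e "$f" ] || continue          # skip if nothing matches yet
    printf '%s\t%s\n' "$f" "$(n50 "$f")"
done
```

ABySS also ships the abyss-fac tool, which reports N50 and related statistics for FASTA files, so you may prefer that over a hand-rolled function.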


Great, thank you very much!

Reply written 7 months ago by Igor Lalin