ABySS genome assembly on several nodes
5.3 years ago
Igor Lalin • 0

Hi, I am running the following script:

    #!/bin/bash
    # Abyss assembly pipeline

    cores=40
    species='Favanaceum'
    Qcut=30

    # merge non-overlapping pairs with konnector and assembly at various k
    for k in `seq 26 10 126`;
    do
        konnector -j $cores -k $k -o kon$k out_reads_1.fastq out_reads_2.fastq
        mkdir ${species}-k$k
        abyss-pe -C ${species}-k$k name=$species-$k k=$k np=$cores q=$Qcut \
            lib='pe1 pe2' long='longa' \
            pe1='../kon${k}_reads_1.fq' pe2='../kon${k}_reads_2.fq' \
            se='../out_merged.fastq ../kon${k}_merged.fa' \
            longa='../05001-genome.fa'
    done

As you can see, it is fairly straightforward: I submit it with qsub -pe smp 40, and it uses 40 slots on one node. Would it be possible to run parallel jobs on different nodes?

That way, assemblies for several different values of k could run at the same time, which would reduce the total run time.

How would you change my shell script to do this?

Thank you so much

abyss node • 1.2k views

You should check with your HPC folks on how to submit a job that needs 40+ cores. They would be able to help you better.

EDIT: Removed comments that recommended better formatting.

Thank you, RamRs. I appreciate it... will do!

5.3 years ago

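Make a shell script that holds this part. k is passed in as the first argument; cores, species and Qcut have to be defined in this script as well, because they are no longer inherited from the loop script:

#!/bin/bash
# same settings as in the original script
cores=40
species='Favanaceum'
Qcut=30

k=$1
konnector -j $cores -k $k -o kon$k out_reads_1.fastq out_reads_2.fastq
mkdir -p ${species}-k$k
abyss-pe -C ${species}-k$k name=$species-$k k=$k np=$cores q=$Qcut \
lib="pe1 pe2" long="longa" \
pe1="../kon${k}_reads_1.fq" pe2="../kon${k}_reads_2.fq" \
se="../out_merged.fastq ../kon${k}_merged.fa" \
longa="../05001-genome.fa" \
unitigs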

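Then simply submit the jobs using your loop, keeping the -pe smp 40 request from your original submission so that each job gets its 40 slots:

for k in $(seq 26 10 126);
do
qsub -pe smp 40 <abyssScript> $k
done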
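
A possible alternative to the loop, assuming the scheduler is Grid Engine (which the -pe smp option in the question suggests): a single array job, where the scheduler hands each task its k through the SGE_TASK_ID environment variable. This is only a sketch; the script name abyss_k.sh and the helper pick_k are hypothetical names, not something from the thread.

```shell
#!/bin/bash
# Sketch of a helper for the per-k job script (hypothetical name: abyss_k.sh).
# It takes k from the first argument if one is given, otherwise from the
# array-task index that Grid Engine exports as SGE_TASK_ID.
pick_k() {
    local k=${1:-${SGE_TASK_ID:-}}
    case $k in
        # reject empty or non-numeric values; Grid Engine sets
        # SGE_TASK_ID to "undefined" for non-array jobs
        ''|*[!0-9]*)
            echo "usage: $0 <k>  (or submit as an array job with qsub -t)" >&2
            return 1
            ;;
    esac
    echo "$k"
}

# In the job script, "k=\$1" would then become:
#   k=$(pick_k "$@") || exit 1
#
# Array submission (not run here): task ids 26, 36, ..., 126, one per k.
# -cwd makes each task start in the submission directory.
#   qsub -cwd -pe smp 40 -t 26-126:10 abyss_k.sh
```

With this in place the same script still works with the loop above (k as an argument), and the array form gives one job id to monitor or cancel instead of eleven.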

If your genome is not that big, and since you do not have many input files, ABySS should run fairly quickly on a single (multi-core) node.

Extra tip: add the target 'unitigs' to your command line (I added it in the example above). This stops the ABySS pipeline after it has generated the unitigs, which is already a good point at which to choose your 'best' k-mer.
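
Once the unitig runs have finished, the k values can be compared side by side. The sketch below assumes the directory and name pattern used in the script above (so each run leaves a file such as Favanaceum-k26/Favanaceum-26-unitigs.fa) and uses abyss-fac, the contiguity statistics tool that ships with ABySS. The function names unitig_paths and compare_unitigs are made up for this example.

```shell
#!/bin/bash
# Sketch: collect the unitig files of all k runs and print their
# contiguity statistics (N50 etc.) in one abyss-fac table.
species=${species:-Favanaceum}

# Print the expected unitig file for every k that was submitted.
unitig_paths() {
    local k
    for k in $(seq 26 10 126); do
        echo "${species}-k${k}/${species}-${k}-unitigs.fa"
    done
}

# Run abyss-fac on those unitig files that exist and are non-empty.
compare_unitigs() {
    local f
    set --
    for f in $(unitig_paths); do
        if [ -s "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no unitig files found yet" >&2
        return 1
    fi
    abyss-fac "$@"
}
```

Calling compare_unitigs from the directory that holds the per-k folders prints one row per k; the run with the best contiguity is then a reasonable candidate to rerun without the unitigs target.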


Great...thank you very much!
