Question: abyss-pe makes unitigs.fa file but not contigs.fa file
8 months ago by
queentakesjackfox0 wrote:

I ran the following code:

abyss-pe name=Name k=96 in='read1.fastq read2.fastq'

And I received the following error:

abyss-fixmate: error: The mate pairs of this library are oriented forward-forward (FF), which is not supported by ABySS

The reads I inputted were not mate pairs, and I am not sure if that error message is using legacy terminology from an older version. Regardless, based on this error, I used the following recommendations:

https://github.com/bcgsc/abyss/issues/146

Hi, Michael. ABySS stopping in this situation is intentional. If 2.6 times the number of reads are oriented FF than FR, then the library prep has issues that ABySS will not be able to handle. Better to stop and tell the user. You could omit the paired-end information by using se=reads.fq.gz rather than in=reads.fq.gz.

This, however, did not resolve the missing "contigs.fa" file issue.

I am re-running abyss-pe once again using the following recommendations for how to include merged read files:

https://groups.google.com/forum/#!topic/abyss-users/RFuhdv8r3dc

abyss-pe name="Name_Merged" k=96 se="merged.fastq read1.fastq read2.fastq"

So, you can feed these 3 files to ABySS and assemble only up to the unitigs stage. Then, continue the assembly into the contigs or scaffold stage using the original read files.

Still not sure why these files aren't automatically going into the contigs stage.

Based on the flow chart found here, it appears that the program stops at the "align reads to contigs" phase, ending with Name-3.fa and the unitigs.fa file.

abyss assembly • 333 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by queentakesjackfox0

What exactly is your issue here? (or is it resolved already?)

In any case, could you post the runtime log output you get from ABySS?

ADD REPLYlink written 8 months ago by lieven.sterck8.7k

Issue is not yet resolved! I have run the abyss program three separate times using three separate recommendations:

abyss-pe name=Name k=96 in='read1.fastq read2.fastq'

This did not output any contig files, only unitig files, and no stats files either. The main error given was:

"abyss-fixmate: error: The mate pairs of this library are oriented forward-forward (FF), which is not supported by ABySS"

I ran the code a second time based on recommendations to use se=reads.fq.gz rather than in=reads.fq.gz

abyss-pe name=Name_2 k=96 se='read1.fastq read2.fastq'

This again did not give out any contig files, only unitig. It did, however, give out stats files.

    abyss-pe name=Name k=96 se='read1.fastq read2.fastq' 
ABYSS -k96 -q3    --coverage-hist=coverage.hist -s Name-bubbles.fa  -o Name-1.fa  read1.fastq read2.fastq
ABySS 2.2.3
ABYSS -k96 -q3 --coverage-hist=coverage.hist -s Name-bubbles.fa -o Name-1.fa read1.fastq read2.fastq
Reading `read1.fastq'...
`read1.fastq': discarded 11627 reads containing non-ACGT characters
Reading `read2.fastq'...
`read2.fastq': discarded 3281 reads containing non-ACGT characters
Loaded 1065502186 k-mer
Minimum k-mer coverage is 179
Using a coverage threshold of 2...
The median k-mer coverage is 3
The reconstruction is 504680840
...
MergeContigs   -k96 -o Name-3.fa Name-2.fa Name-2.dot Name-2.path
The minimum coverage of single-end contigs is 2.
The minimum coverage of merged contigs is 2.
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
        Name-2.path Name-1.fa >Name-indel.fa
ln -sf Name-3.fa Name-unitigs.fa
abyss-fac   Name-unitigs.fa |tee Name-stats.tab
n       n:500   L50     min     N75     N50     N25     E-size  max     sum     name
2025039 186724  65439   500     601     756     1058    975     8716    145.9e6 Name-unitigs.fa
time user=3.07s system=0.34s elapsed=3.03s cpu=112% memory=4 job=abyss-fac Name-unitigs.fa
time user=0.00s system=0.00s elapsed=3.03s cpu=0% memory=1 job=
ln -sf Name-stats.tab Name-stats
tr '\t' , <Name-stats.tab >Name-stats.csv
abyss-tabtomd Name-stats.tab >Name-stats.md
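
In case it helps to compare runs, the N50 (or any other column) can be pulled out of these stats files with a little awk. This is only a sketch: stat_column is a made-up helper name, and it assumes the table is tab-separated with the header on the first line, as in the Name-stats.tab written by abyss-fac above.

```shell
# stat_column COLUMN FILE: print the values of one named column from an
# abyss-fac stats table. Assumes tab-separated fields and a header on line 1.
# (stat_column is a hypothetical helper, not part of ABySS.)
stat_column() {
    column=$1
    file=$2
    awk -F '\t' -v want="$column" '
        # Header line: remember the position of the requested column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
        # Data lines: print that field, but only if the column was found.
        col     { print $col }
    ' "$file"
}

# Usage:
#   stat_column N50 Name-stats.tab
```

If the column name does not exist in the header, the function prints nothing.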

I tried again with the merged reads plus the paired-end reads. Again, no contigs files, although stats files were produced:

abyss-pe name="Name_withMergedFiles" k=96 se="read_Merged.fastq read1.fastq read2.fastq"
ABYSS -k96 -q3    --coverage-hist=coverage.hist -s Name_withMergedFiles-bubbles.fa  -o Name_withMergedFiles-1.fa  read_Merged.fastq read1.fastq read2.fastq
ABySS 2.2.3
ABYSS -k96 -q3 --coverage-hist=coverage.hist -s Name_withMergedFiles-bubbles.fa -o Name_withMergedFiles-1.fa read_Merged.fastq read1.fastq read2.fastq
Reading `read_Merged.fastq'
...
MergeContigs   -k96 -o Name_withMergedFiles-3.fa Name_withMergedFiles-2.fa Name_withMergedFiles-2.dot Name_withMergedFiles-2.path
The minimum coverage of single-end contigs is 1.05155.
The minimum coverage of merged contigs is 2.
Consider increasing the coverage threshold parameter, c, to 2.
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
        Name_withMergedFiles-2.path Name_withMergedFiles-1.fa >Name_withMergedFiles-indel.fa
ln -sf Name_withMergedFiles-3.fa Name_withMergedFiles-unitigs.fa
abyss-fac   Name_withMergedFiles-unitigs.fa |tee Name_withMergedFiles-stats.tab
n       n:500   L50     min     N75     N50     N25     E-size  max     sum     name
2750465 415463  134556  500     651     879     1309    1143    14517   363.1e6 Name_withMergedFiles-unitigs.fa
time user=4.16s system=0.58s elapsed=4.13s cpu=114% memory=4 job=abyss-fac Name_withMergedFiles-unitigs.fa
time user=0.00s system=0.00s elapsed=4.13s cpu=0% memory=1 job=
ln -sf Name_withMergedFiles-stats.tab Name_withMergedFiles-stats
tr '\t' , <Name_withMergedFiles-stats.tab >Name_withMergedFiles-stats.csv
abyss-tabtomd Name_withMergedFiles-stats.tab >Name_withMergedFiles-stats.md
ADD REPLYlink written 8 months ago by queentakesjackfox0

OK,

Did you get any feedback just before the abyss-fixmate error? I'm looking for something similar to the first post in the GitHub issue you refer to (https://github.com/bcgsc/abyss/issues/146): the output of the abyss-map step.

It could be you need to run abyss-pe with verbose logging enabled (v=-v or v=-vv) to get that log info.

ADD REPLYlink modified 8 months ago • written 8 months ago by lieven.sterck8.7k

Running abyss-pe with the -vv option, here is what I see:

Bubbles: 90976 Popped: 77701 Scaffolds: 0 Complex: 5198 Too long: 0 Too many: 250 Dissimilar: 7827
V=5500930 E=7525294 E/V=1.37
Degree: ▆█▆▁_
        01234
0: 27% 1: 34% 2-4: 36% 5+: 2.3% max: 193
MergeContigs -vv  -k96 -o Name-3.fa Name-2.fa Name-2.dot Name-2.path
Reading `Name-2.dot'...
Read 5969952 vertices. Using 737 MB of memory.
Reading `Name-2.fa'...
Read 1000000 sequences. Using 1.12 GB of memory.
Read 2000000 sequences. Using 1.51 GB of memory.
Read 2984976 sequences. Using 1.96 GB of memory.
Reading `Name-2.path'...
Read 138906 paths. Using 1.99 GB of memory.
The minimum coverage of single-end contigs is 1.05155.
The minimum coverage of merged contigs is 2.
Consider increasing the coverage threshold parameter, c, to 2.
n       n:200   L50     min     N75     N50     N25     E-size  max     sum     name
2750465 1548753 398603  200     318     512     894     745     14517   708.1e6 Name-3.fa
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
        Name-2.path Name-1.fa >Name-indel.fa
ln -sf Name-3.fa Name-unitigs.fa
abyss-fac   Name-unitigs.fa |tee Name-stats.tab
n       n:500   L50     min     N75     N50     N25     E-size  max     sum     name
2750465 415463  134556  500     651     879     1309    1143    14517   363.1e6 Name-unitigs.fa
time user=3.16s system=0.58s elapsed=3.33s cpu=112% memory=4 job=abyss-fac Name-unitigs.fa
time user=0.00s system=0.00s elapsed=3.33s cpu=0% memory=1 job=
ln -sf Name-stats.tab Name-stats
tr '\t' , <Name-stats.tab >Name-stats.csv
abyss-tabtomd Name-stats.tab >Name-stats.md

But I still see no error message explaining why it stops at "unitigs.fa".

ADD REPLYlink written 8 months ago by queentakesjackfox0

Indeed, but I also don't see any output from the abyss-map step.

Can you run the exact same command with the -n option (dry run: it just prints the commands without executing them) and post the output?

ADD REPLYlink modified 8 months ago • written 8 months ago by lieven.sterck8.7k

Running the following:

abyss-pe -n v=-vv name="Name" k=96 se="Merge.fastq read1.fastq read2.fastq"
make: Nothing to be done for 'default'.
ADD REPLYlink written 8 months ago by queentakesjackfox0

Ah, yes, that can happen.

You need to run it with a different name (otherwise it checks the existing data and sees that everything is already done).
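
The reason is that abyss-pe is driven by make, so outputs that already exist are not rebuilt. A minimal sketch of picking an unused name automatically, assuming all outputs of a run start with "<name>-" in the current directory (fresh_name and has_outputs are hypothetical helper names, not part of ABySS):

```shell
# has_outputs NAME: succeed if any file called NAME-* exists here.
has_outputs() {
    for f in "$1"-*; do
        # An unmatched glob stays literal, so test that the file really exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# fresh_name BASE: print BASE if it has no outputs yet,
# otherwise the first free name among BASE_2, BASE_3, ...
fresh_name() {
    base=$1
    candidate=$base
    n=1
    while has_outputs "$candidate"; do
        n=$((n + 1))
        candidate="${base}_${n}"
    done
    printf '%s\n' "$candidate"
}

# Usage (the read file name is a placeholder):
#   abyss-pe -n v=-vv name="$(fresh_name Name)" k=96 se="reads.fastq"
```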

ADD REPLYlink written 8 months ago by lieven.sterck8.7k

Ah, I see. Thank you!

Running with a new name:

abyss-pe -n v=-vv name="NewName" k=96 se="I29687_M.fastq I29687_U1.fastq I29687_U2.fastq"

Here is the output:

ABYSS -k96 -q3 -vv   --coverage-hist=coverage.hist -s NewName-bubbles.fa  -o NewName-1.fa  I29687_M.fastq I29687_U1.fastq I29687_U2.fastq
    AdjList -vv   -k96 -m0 --dot NewName-1.fa >NewName-1.dot
    abyss-filtergraph -vv --dot   -k96 -g NewName-2.dot1 NewName-1.dot NewName-1.fa >NewName-1.path
    MergeContigs --dot -vv  -k96 -g NewName-2.dot -o NewName-2.fa NewName-1.fa NewName-2.dot1 NewName-1.path
    PopBubbles -vv --dot -j2 -k96  -p0.9  -g NewName-3.dot NewName-2.fa NewName-2.dot >NewName-2.path
    MergeContigs -vv  -k96 -o NewName-3.fa NewName-2.fa NewName-2.dot NewName-2.path
    awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
            NewName-2.path NewName-1.fa >NewName-indel.fa
    ln -sf NewName-3.fa NewName-unitigs.fa
    abyss-fac   NewName-unitigs.fa |tee NewName-stats.tab
    ln -sf NewName-stats.tab NewName-stats
tr '\t' , <NewName-stats.tab >NewName-stats.csv
abyss-tabtomd NewName-stats.tab >NewName-stats.md
[1]+  Killed                  ./velvetg Results_Paired/  (wd: /pool/GenomeAssembly/velvet)
(wd now: /pool/GenomeAssembly/abyss)
ADD REPLYlink written 8 months ago by queentakesjackfox0

That's the one indeed.

A few weird things though:

  • What is the "Killed" statement doing there? Is it part of the abyss run, or of something else you did (copied along by accident)?
  • It does not say anything about going from unitigs to contigs; something like abyss-map should be mentioned in this output.
ADD REPLYlink written 8 months ago by lieven.sterck8.7k
8 months ago by
lieven.sterck8.7k
VIB, Ghent, Belgium
lieven.sterck8.7k wrote:

OK, come to think of it, you're running it with se= as input. I think in this case the unitigs are the final product: the way to go from unitigs to contigs is to map the PE reads to join them, but since you do not provide those (there is no in= on your command line), it could be that abyss stops at the unitigs stage.

You could try the following: take the command line you executed above and add " contigs" at the end (this tells abyss to go up to the contigs stage and stop there). If it then tells you "nothing to do" or similar, it means it considers the run done.
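
To see how far an existing run actually got, one could check which of the stage files are present. A rough sketch, assuming the usual <name>-unitigs.fa, <name>-contigs.fa and <name>-scaffolds.fa outputs in the current directory; last_stage is a hypothetical helper, not an ABySS tool:

```shell
# last_stage NAME: print the furthest assembly stage whose output file
# exists for run NAME (scaffolds, contigs or unitigs), or "none".
last_stage() {
    name=$1
    # Check from the latest stage backwards and stop at the first hit.
    for stage in scaffolds contigs unitigs; do
        # -e follows symlinks, so a link such as Name-unitigs.fa -> Name-3.fa
        # only counts if its target exists too.
        if [ -e "${name}-${stage}.fa" ]; then
            printf '%s\n' "$stage"
            return 0
        fi
    done
    printf 'none\n'
    return 1
}

# Usage:
#   last_stage Name
```

For the runs in this thread it would report "unitigs", since only Name-unitigs.fa was created.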

ADD COMMENTlink modified 8 months ago • written 8 months ago by lieven.sterck8.7k

New output with "contigs" appended:

abyss-pe -n v=-vv name="Name" k=96 se="I29687_M.fastq I29687_U1.fastq I29687_U2.fastq" "contigs"
ABYSS -k96 -q3 -vv   --coverage-hist=coverage.hist -s Name-bubbles.fa  -o Name-1.fa  I29687_M.fastq I29687_U1.fastq I29687_U2.fastq
AdjList -vv   -k96 -m0 --dot Name-1.fa >Name-1.dot
abyss-filtergraph -vv --dot   -k96 -g Name-2.dot1 Name-1.dot Name-1.fa >Name-1.path
MergeContigs --dot -vv  -k96 -g Name-2.dot -o Name-2.fa Name-1.fa Name-2.dot1 Name-1.path
PopBubbles -vv --dot -j2 -k96  -p0.9  -g Name-3.dot Name-2.fa Name-2.dot >Name-2.path
MergeContigs -vv  -k96 -o Name-3.fa Name-2.fa Name-2.dot Name-2.path
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
        Name-2.path Name-1.fa >Name-indel.fa
abyss-todot -vv --dist -e Name-3.fa >Name-3.dist
Overlap -vv --dot   -k96 -g Name-4.dot -o Name-4.fa Name-3.fa Name-3.dot Name-3.dist
abyss-stack-size 65536 SimpleGraph -vv  -s1000 -n10  -j2 -k96 -o Name-4.path1 Name-4.dot Name-3.dist
abyss-index -vv --fai Name-3.fa
abyss-index -vv --fai Name-4.fa
cat Name-3.fa.fai Name-4.fa.fai \
        | MergePaths -vv  -j2 -k96 -s1000  -o Name-4.path2 - Name-4.path1
PathOverlap --assemble -vv  -k96  Name-4.dot Name-4.path2 >Name-4.path3
cat Name-3.fa Name-4.fa \
        | abyss-stack-size 65536 PathConsensus -vv --dot -k96  -p0.9  -o Name-5.path -s Name-5.fa -g Name-5.dot - Name-4.dot Name-4.path3
cat Name-3.fa Name-4.fa Name-5.fa | MergeContigs -vv  -k96 -o Name-6.fa - Name-5.dot Name-5.path
ln -sf Name-6.fa Name-contigs.fa

Based on the fact that the dry run finally shows a Name-contigs.fa, I've decided to run the following command:

abyss-pe v=-vv name="Dytsicidae_Merged_2020_02_10" k=96 se="I29687_M.fastq I29687_U1.fastq I29687_U2.fastq" "contigs"
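
For next time, the check I did by eye on the dry run could be scripted: pipe the dry-run output into a small filter that looks for the contigs file. A sketch only; plans_contigs is a made-up helper name, and it simply searches the printed commands for <name>-contigs.fa:

```shell
# plans_contigs NAME: read dry-run output on stdin and succeed only if
# some planned command mentions NAME-contigs.fa.
# (Hypothetical helper, not part of ABySS.)
plans_contigs() {
    # -F: match the file name literally; -q: no output, exit status only.
    grep -qF -- "$1-contigs.fa"
}

# Usage (needs ABySS installed; the read file name is a placeholder):
#   abyss-pe -n name=NewName k=96 se="reads.fastq" contigs \
#       | plans_contigs NewName && echo "dry run reaches the contigs stage"
```

If the outputs for that name already exist, the dry run only prints make's "Nothing to be done" message and the check fails, so it should be used with a fresh name.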
ADD REPLYlink written 8 months ago by queentakesjackfox0

indeed, give that a try

(you actually don't have to quote the contigs part, it serves as a make target)

ADD REPLYlink written 8 months ago by lieven.sterck8.7k

Ah, yes, thank you! And it actually worked perfectly, thank you very much!

ADD REPLYlink written 8 months ago by queentakesjackfox0

Good to hear !

A small educational note: if an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. (and you can accept multiple answers if need-be)


ADD REPLYlink written 8 months ago by lieven.sterck8.7k