Question: Abyss-fixmate: error: all reads are mateless
jozs2019 wrote (2.5 years ago):

Hello,

I'm trying to assemble contigs from a set of paired-end reads, stored in SRR960028_1.fastq and SRR960028_2.fastq. I'm running ABySS on my university's HPC facility. When I run ABySS it terminates early. The command itself (within the PBS file) is:

abyss-pe name=abyss_test1 k=63 in='SRR960028_1.fastq SRR960028_2.fastq' v=-v

The tail of the error file looks like this:

Mapped 272979576 of 273907308 reads (99.7%)
Mapped 247491841 of 273907308 reads uniquely (90.4%)
Read 273907308 alignments

Mateless   273907308  100%
Unaligned          0
Singleton          0
FR                 0
RF                 0
FF                 0
Different          0
Total      273907308

abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
error: 'abyss_test1-3.hist': No such file or directory
make: *** [abyss_test1-3.dist] Error 1
make: *** Deleting file `abyss_test1-3.dist'

I've seen the abyss-fixmate error come up in threads here before, but in most of those threads the "Mateless" or "Total" count is 0, whereas mine are nonzero. I've also opened the FASTQ files and they definitely contain reads. I've seen a few threads that recommend ending the read IDs within the files with /1 or /2, but I was under the impression that naming the files themselves reads1.fa and reads2.fa would suffice for ABySS (at least according to the ABySS manual, unless I'm mistaken).

The only thing in the output file is this:

abyss-map -v  -j40 -l40    SRR960028_1.fastq SRR960028_2.fastq abyss_test1-3.fa \
                |abyss-fixmate -v  -l40  -h abyss_test1-3.hist \
                |sort -snk3 -k4 \
                |DistanceEst -v  -j40 -k63 -l40 -s1000 -n10   -o abyss_test1-3.dist abyss_test1-3.hist

The output files generated by ABySS from the run included abyss_test1-1.fa, abyss_test1-2.fa, abyss_test1-3.fa and an abyss_test1-unitigs.fa file. I've checked the head and tail of the files, and they appear to contain contigs.

I'm reluctant to use these files for any analysis because I'm not sure how ABySS assembled them - does anybody know how ABySS assembled them?

Does anybody have any clue as to why ABySS is terminating early, and how I can fix it?

Thanks in advance!

benv (Canada) wrote 2.5 years ago:

Hi @jozs2019,

It is most likely a read naming issue that is preventing the reads from being properly paired by ABySS. (I can confirm this if you post the first 10 lines of your read 1 and read 2 files. You can use a gist if you like.)

ABySS requires that the FASTQ IDs (i.e. the first whitespace-separated word of each line beginning with @) for read 1 and read 2 are either identical or have an identical prefix followed by /1 and /2. (See https://github.com/bcgsc/abyss/wiki/ABySS-Users-FAQ#4-my-abyss-assembly-fails-and-i-get-an-error-that-says-abyss-fixmate-error-all-reads-are-mateless-this-can-happen-when-first-and-second-read-ids-do-not-match).
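
The rule above can be checked on a pair of files before starting a long run. The following is only a rough sketch, not part of ABySS: the function name count_id_mismatches is made up for illustration, and it assumes strict 4-line FASTQ records with both files listing the reads in the same order.

```shell
# Hypothetical helper (not an ABySS tool): count the records whose IDs differ
# between a read 1 file ($1) and a read 2 file ($2).
# Assumes strict 4-line FASTQ records and the same read order in both files.
count_id_mismatches() {
    awk -v mate_file="$2" '
        # Reduce a header line to its ID: keep the first whitespace-separated
        # word, then drop the leading "@" and any trailing /1 or /2.
        function clean(header) {
            sub(/[ \t].*$/, "", header)
            sub(/^@/, "", header)
            sub(/\/[12]$/, "", header)
            return header
        }
        {
            # Read one line of the read 2 file for every line of the read 1 file.
            if ((getline mate_line < mate_file) <= 0) mate_line = ""
        }
        # Header lines are the first line of each 4-line record.
        FNR % 4 == 1 && clean($0) != clean(mate_line) { mismatches++ }
        END { print mismatches + 0 }
    ' "$1"
}
```

For example, `count_id_mismatches SRR960028_1.fastq SRR960028_2.fastq` prints the number of mismatching records; 0 suggests the IDs pair up as described. The sketch does not detect extra records at the end of the read 2 file, and running it on the output of `head` first keeps the check fast.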

Your results up to abyss_test1-3.fa should be fine, because those first steps of the pipeline don't make any use of the read pairing information. But for the sake of cleanliness and reproducibility, you may want to do a complete rerun of the pipeline after fixing the read IDs.


Thanks for clarifying, benv!

The first 10 lines of read 1 file:

@SRR960028.1.1 FCC0H3RACXX:5:1101:1479:2095 length=150
TNAGTGAAGGACACCGCTGGATAAAGAATAAAAGGAGATCTTGACAAAAAGAAAAAAGAGGACCTTCAAAAGTAAGGCAAGAAGAGGAGTGGTGAGTTTTCTTATAAATATTAAAAATGGGAGCTAGCAACAGTCGCTTGCCTTTTGTAT
+SRR960028.1.1 FCC0H3RACXX:5:1101:1479:2095 length=150
@%1=B;DDHHBHFIIIIIIICCGHIGGIIIGIGEH>??+BGIEDEICFGDFCAHIIHHHFFEFCAEEEECCC;@CCCCCCCCCCCBBBB+88+8@<4>CCCCCCCCDECDDEACCCCBCC?A89A>:>A:ABB:>>@<<@BCC@>CA8AC
@SRR960028.2.1 FCC0H3RACXX:5:1101:1388:2095 length=150
TNGTGATGAGATTGCTGTAAAAAAGGACAACATTCTTGTTCGATCTTTCAAAGATGGAAAATTGTAAGTATTATTAATTTAACCTGGAAAATTTCTCTATAAAATGTATGTGCTTTTTTCCCCATACTAGGAAAGTAAATGATCAATTAG
+SRR960028.2.1 FCC0H3RACXX:5:1101:1388:2095 length=150
@%4=BDDEHHHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIHIIIHIIGIIIIIIGIIDIICGHGCEHHHHGFGFFFF@?BEEDEEDCCCDCDDDEDEACCCC@DDEDDDDDCDCB>@>CDDDCC>@?C>)<@A@CCCCDCCCCCCC
@SRR960028.3.1 FCC0H3RACXX:5:1101:1361:2113 length=150
CTCCGGAATCCTCAGCCTAACCCTAACCNTNTNCNNGNNNNNCNNNGNGNGGNNNNTNTTCCTACATCATTCCTACAACTGAAGGTCCAATGCACATGAAAATATGCATTACACTACACACAGAGTCAACTGGGAAAATCACTGCATTCC

And the first 10 lines of the second file:

@SRR960028.1.2 FCC0H3RACXX:5:1101:1479:2095 length=150
TCCGGCGNNNNNNNNNNNNNNATACTTTTCACTTTCAATTTTACAATTATCATTCTCAGGTTCCTCTACACGCAATAAACTTTTAAACATATATACAAAAGGCAAGCGACTGTTGCTAGCTCCCATTTTTAATATTTATAAGAAAACTCA
+SRR960028.1.2 FCC0H3RACXX:5:1101:1479:2095 length=150
@@@FFDF%%%%%%%%%%%%%%11:8?FHIGIEGGIGEHFCHIIIIIIIIIGIIIGIIID97=EHHEHHGHFEFDBBDCCACDDDD=CDDDDEEDFEEDCDCDBDBDDDDDDD@@CDCEDDDC@?<>CA@C<>:@CACAB3:3)439?AAC
@SRR960028.2.2 FCC0H3RACXX:5:1101:1388:2095 length=150
ACACTTANNNNNNNNNNNNNNAATAGTTCTCTGAAACATTCTGTGAGATGAAAAATAAAAATCCACGGTCATTCAAAAAAACCTAATTGATCATTTACTTTCCTAGTATGGGGAAAAAAGCACATACATTTTATAGAGAAATTTTCCAGG
+SRR960028.2.2 FCC0H3RACXX:5:1101:1388:2095 length=150
CCCFFFF%%%%%%%%%%%%%%22@FHFIIIIIIIIIIIIIIIIIIIGIIIIGIIIIIIIIIIIIIIIGAEDEFFFFFEDDDDDDDDDDCDEDEDEEADCDCCDDDDCDDDDDB?BB?A>BBCDCDDC:C3:@CCD>>CC99))43:>>CC
@SRR960028.3.2 FCC0H3RACXX:5:1101:1361:2113 length=150
TGTGAAGNNNNNNNNNNNNGTGGTTTTCCACGGACGTACGCCTTGATGCCTGGATAGAATAGGTAGCGATTGGCGACAGAGTAGTGTTAGAGCTAGATTTAGGGAATGCAGTGATTTTCCCAG
Reply written 2.5 years ago by jozs2019

I see. Those read IDs do not follow the rules I described above. For example, the read IDs for the first pair of reads need to be either:

@SRR960028.1 (read 1 file) and @SRR960028.1 (read 2 file)

OR

@SRR960028.1/1 (read 1 file) and @SRR960028.1/2 (read 2 file)

You will have to fix the IDs yourself with a short Unix script (e.g. using sed, awk, Perl, or Python).
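
One possible way to do this with awk is sketched below. It is an illustration rather than an official ABySS utility: the function name fix_read_ids is hypothetical, and it assumes SRA-style IDs that end in .1 (read 1 file) or .2 (read 2 file), as in the headers posted above, and strict 4-line FASTQ records.

```shell
# Hypothetical helper: rewrite IDs such as "@SRR960028.1.1" to "@SRR960028.1/1".
# Usage: fix_read_ids MATE < in.fastq > out.fastq   (MATE is 1 or 2)
# Assumes strict 4-line FASTQ records. Only the header (line 1) and separator
# (line 3) of each record are examined; sequence and quality lines pass through.
fix_read_ids() {
    awk -v mate="$1" '
        FNR % 4 == 1 || FNR % 4 == 3 {
            n = length($1)
            # Replace a trailing ".MATE" on the first word with "/MATE".
            # Assigning $1 rejoins the fields of that line with single spaces.
            if (substr($1, n - 1) == ("." mate)) {
                $1 = substr($1, 1, n - 2) "/" mate
            }
        }
        { print }
    '
}
```

It would be run once per file, for example `fix_read_ids 1 < SRR960028_1.fastq > SRR960028_fixed_1.fastq` and `fix_read_ids 2 < SRR960028_2.fastq > SRR960028_fixed_2.fastq` (the output file names are placeholders). Because only the first word of each header is touched, the description after the ID cannot be altered by accident.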

Reply written 2.5 years ago by benv

Understood! I wrote this sed script:

sed 's|\.1 |/1 |' <reads_1.fastq >reads_suffixes_1.fastq

... and it seems to have worked (checking the head and tail again). Fingers crossed that assembly goes better this time, and thanks for your help!

Reply written 2.5 years ago by jozs2019