Hello! I have to run abyss-pe on two files and find the parameters (k and so on) that give the best contig coverage (sorry if I'm mixing things up; I'm a programmer who took a short bioinformatics crash course). The files are paired-end reads of the Chlorella variabilis chloroplast, 304MB each.
This is my run command:
abyss-pe k=25 n=10 v=-v c=118 e=51 name=test in='reads1.fastq reads2.fastq'
Since I ran the program in verbose mode and watched the memory load, I don't think this is a RAM issue (the system has 16GB of RAM). I also ran the fastQValidator tool, and it reported that the files seem to be invalid; however, my lecturer assures me they are valid and are in fact paired-end. The files are in FASTQ format (I guess created on an Illumina/Sanger machine), so they contain some strange characters like scopes and non-ACGT characters in the quality strings and in the sequence itself.
This is where the progress stops:
Mapped 1432486 of 1481614 reads (96.7%)
Mapped 1430166 of 1481614 reads uniquely (96.5%)
Read 1481614 alignments
Mateless 1481614 100%
Unaligned 0
Singleton 0
FR 0
RF 0
FF 0
Different 0
Total 1481614
abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
error: `test-3.hist': No such file or directory
The link I'm submitting contains the whole output during the runtime of abyss-pe in verbose mode.
I've tried all I could (or know for that matter - since it is a crash course for software engineers I took as a diversity course during my BSc), I'd appreciate some help. Thanks ahead!
My files are already named as you've suggested: reads1.fastq and reads2.fastq.
Edit: I might have misunderstood you. You mean that all identifiers in the file should have those prefixes.
As I understand it (from the files), they already have the read identifiers, though I'm not sure if that's correct.
These are the first 4 lines of the reads1.fastq file (as I understand it, they are the identifier line, the sequence, the identifier again and the quality string):
These are the corresponding lines in reads2.fastq:
Does this seem correct?
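One way to check this directly against the abyss-fixmate error is to compare the identifiers of the two files record by record. The following is only a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequences); `core_id` and `count_mismatches` are hypothetical helper names, not part of ABySS.

```python
from itertools import islice, zip_longest

def core_id(header):
    """Return the read name without '@', without any comment after
    whitespace, and without a trailing /1 or /2."""
    fields = header.strip()[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def count_mismatches(lines1, lines2):
    """Count record positions where the two files disagree on the read name.

    Assumes 4-line records, so the header is every 4th line. This avoids
    mistaking a quality string that starts with '@' for a header.
    """
    headers1 = islice(lines1, 0, None, 4)
    headers2 = islice(lines2, 0, None, 4)
    mismatches = 0
    for h1, h2 in zip_longest(headers1, headers2):
        # A missing partner (files of different length) counts as a mismatch.
        if h1 is None or h2 is None or core_id(h1) != core_id(h2):
            mismatches += 1
    return mismatches

# Possible usage with the file names from this thread:
# with open("reads1.fastq") as f1, open("reads2.fastq") as f2:
#     print(count_mismatches(f1, f2))
```

A result of 0 would suggest the reads are in the same order and differ only in the mate marker, so the remaining question is how that marker is written.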
The read identifiers in your files are the lines starting with @, so in your case,
and
As you can see, the suffixes of these are not 1, 2, or R1, R2, etc. - so you either have to cut off the
:N:0:NTTTCG
or add "1" to all lines starting with @ in reads1.fastq and "2" to all lines starting with @ in reads2.fastq using python, perl, sed or grep etc.
Unfortunately, that didn't help either.
Just to be on the safe side of things:
reads1.fastq
reads2.fastq
I've removed all the suffixes you've mentioned.
Hm, that looks correct. Could you please try one more thing? This page on the ABySS homepage has some test input data (direct link). Could you compare the test data with yours and see whether your ABySS installation can handle the test data?
Some differences I see that might not be documented:
Please write this down as an answer since it solved my problem (adding /1 and /2 to the corresponding read files)
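For anyone landing here later, the fix (ending the identifiers with /1 and /2) could be scripted roughly like this. It is a sketch under assumptions, not the exact commands used in this thread: it assumes 4-line FASTQ records and rewrites only every 4th line, because a quality string may itself begin with @. `add_mate_suffix` and the output file names are hypothetical.

```python
def add_mate_suffix(lines, mate):
    """Yield FASTQ lines with every header rewritten as '@<name>/<mate>'.

    Anything after the first whitespace in the header (for example a
    comment such as '1:N:0:NTTTCG') is dropped, and an existing /1 or /2
    is replaced rather than duplicated.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            name = line.split()[0]          # keeps the leading '@'
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield f"{name}/{mate}\n"
        else:
            yield line                      # sequence, '+' and quality lines

# Possible usage (output names are made up for illustration):
# with open("reads1.fastq") as src, open("reads1.fixed.fastq", "w") as dst:
#     dst.writelines(add_mate_suffix(src, 1))
# with open("reads2.fastq") as src, open("reads2.fixed.fastq", "w") as dst:
#     dst.writelines(add_mate_suffix(src, 2))
```

The fixed files would then be passed to abyss-pe through `in=` in place of the originals.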
The first thing I did after compiling ABySS on my machine was to get the test data and run it. It runs to the end and presents the table with contig coverage etc., so I presume that ABySS itself is working fine. I have a strong feeling that the data provided to me is not in a standard format and ABySS cannot handle it. I'll try putting /1 and /2 in the suffix instead of only 1 and 2. As for the + identifier line, I'll research how to remove those... I hope some regex could do the trick.
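As for the + line: the FASTQ format allows it to be a bare + or to repeat the identifier, so whether ABySS actually needs this change is not established in this thread. If it is wanted anyway, a regex is not required; a small sketch that reduces the third line of each record to a bare +, again assuming 4-line records (`strip_plus_line` is a hypothetical name):

```python
def strip_plus_line(lines):
    """Yield FASTQ lines with the separator line of each record set to '+'.

    Works by position (third line of every 4-line record) instead of
    matching on a leading '+', since quality strings can start with '+'.
    """
    for i, line in enumerate(lines):
        if i % 4 == 2:
            yield "+\n"
        else:
            yield line

# Possible usage (output name is made up for illustration):
# with open("reads1.fastq") as src, open("reads1.plus.fastq", "w") as dst:
#     dst.writelines(strip_plus_line(src))
```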