I have downloaded Illumina reads from SRA (SRR769347). I want to run a de novo assembly with SPAdes but I am getting an error and I am not sure how to fix it.
Here are all the steps I've performed:
I ran fastq-dump --split-files on the SRA file to extract the 2 files of the paired reads. I then used sed to add /1 and /2 respectively to the read names in each file. I therefore have 2 files: file1.fastq and file2.fastq.
A read from file1.fastq looks like this:
@SRR769347.1/1 1 length=101
CCTGTGTGATAAAATTGGAAAACAAATTCAAACTGATACTGAAATCAAAGCTGCCGTGTTTGACTTAAAACAAATGCTTGATCACTAGCAGAAAAACCTGA
+SRR769347.1 1 length=101
@@@FFFDDAFDHHJIJIJGGIIIIGCHHIIGHIGIGCGHGIJFEE>GHHGEIJIEE@FFHIIGJIJCC@GGIFHHFEHFBDDDFECCC9>CDDD?AB@DB9
A read from file2.fastq looks like this:
@SRR769347.1/2 1 length=101
TGAATACCTTCTTTTTTAGCAAAAAATTGAATGTCATCCACAAGTAATAAGTCACAGGTGCGATACTTTTCTCTGAATTCATCCTGAGTTTTATTTTTGAT
+SRR769347.1 1 length=101
CC@FFDFFHHHHHIJJJJJIJIIIGIGHIIJJJJIIIIJEIGCHBFHIGGGGGGGGIG@FHGGGEHHFFFFFFFEAEEECCADDDC5;>CCAACCCDCDBA
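For reference, the suffixing step can be sketched roughly as below. This is an illustration rather than the exact command I ran: the demo file names and the shortened 8-base record are made up, and the `1~4` address is a GNU sed extension. It only touches the first line of each 4-line record, because quality lines can also start with @ (as in the read from file1.fastq).

```shell
# Hypothetical demo input: one 4-line FASTQ record whose quality line starts with '@'.
printf '%s\n' \
  '@SRR769347.1 1 length=8' \
  'CCTGTGTG' \
  '+SRR769347.1 1 length=8' \
  '@@@FFFDD' > demo_1.fastq

# GNU sed: the address 1~4 selects lines 1, 5, 9, ... (the header of each record),
# so quality lines that happen to start with '@' are left untouched.
# The substitution appends /1 to the read name, i.e. the text before the first space.
sed '1~4 s|^\(@[^ ]*\)|\1/1|' demo_1.fastq > demo_file1.fastq

cat demo_file1.fastq
```

The same command with /2 in the replacement would be used for the second file.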
I then use these files as input to SPAdes 3.1.0 (the version available on our cluster), together with PacBio reads for the assembly, as follows:
spades.py -1 file1.fastq -2 file2.fastq --pacbio pacbio.fastq -o SPAdes_output
Invariably I am getting the following error:
== Error == file not found: file2.fastq (right reads, library number: 1, library type: paired-end)
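One way to rule out a simple path problem would be a check like the one below, run from the same directory as the spades.py command. It is only a sketch: `check_fastq` is a helper name made up for this post, and the read count assumes plain 4-line FASTQ records.

```shell
# Hypothetical helper: report whether a FASTQ file is readable from the
# current directory and, if so, roughly how many reads it holds.
check_fastq() {
  if [ -r "$1" ]; then
    # Assumes 4 lines per record (no wrapped sequence lines).
    echo "$1: found, $(( $(wc -l < "$1") / 4 )) reads"
  else
    echo "$1: missing or unreadable from $(pwd)"
    return 1
  fi
}

# File names as passed to spades.py above.
for f in file1.fastq file2.fastq pacbio.fastq; do
  check_fastq "$f" || true
done
```

If file2.fastq is reported as missing here, the problem is with the path or the working directory of the job rather than with the read names.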
I also tried another fastq-dump option, --split3, for which the reads are correctly labelled with /1 and /2, but strangely each read appears in multiple copies, and it also gives me the same error in SPAdes.
Any help would be great!