Hi,
I have downloaded Illumina reads from SRA (SRR769347). I want to run a de novo assembly with SPAdes but I am getting an error and I am not sure how to fix it.
Here are all the steps I've performed:
I used fastq-dump --split-files on the SRA file to extract the two files of paired reads. I then used sed to append /1 and /2 to the read names in the first and second file respectively. I therefore have 2 files: file1.fastq and file2.fastq
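The exact sed command is not shown above, so here is only a minimal sketch of what that renaming step might look like. It uses awk instead of sed; the function name add_mate_suffix, the toy record, and the SRR769347_1.fastq / SRR769347_2.fastq input names in the usage comment are assumptions for illustration.

```shell
# Append /1 or /2 to the read name on each FASTQ header line.
# Headers are picked by position (lines 1, 5, 9, ...) rather than by a
# leading "@", because a quality string can also start with "@".
add_mate_suffix() {
    # $1 is the mate number (1 or 2); FASTQ records are read from stdin.
    awk -v mate="$1" 'NR % 4 == 1 { sub(/^@[^ ]+/, "&/" mate) } { print }'
}

# Demo on a toy record (not real data); note the quality line starting with "@".
printf '%s\n' '@read.1 1 length=4' 'ACGT' '+read.1 1 length=4' '@III' \
    | add_mate_suffix 1
# First output line: @read.1/1 1 length=4  (the other three lines are unchanged)

# Possible usage on the real files (input names are assumed):
# add_mate_suffix 1 < SRR769347_1.fastq > file1.fastq
# add_mate_suffix 2 < SRR769347_2.fastq > file2.fastq
```

Only the header line is changed, which matches the example reads in this post, where the "+" line keeps the original name.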
A read from file1.fastq looks like this:
@SRR769347.1/1 1 length=101
CCTGTGTGATAAAATTGGAAAACAAATTCAAACTGATACTGAAATCAAAGCTGCCGTGTTTGACTTAAAACAAATGCTTGATCACTAGCAGAAAAACCTGA
+SRR769347.1 1 length=101
@@@FFFDDAFDHHJIJIJGGIIIIGCHHIIGHIGIGCGHGIJFEE>GHHGEIJIEE@FFHIIGJIJCC@GGIFHHFEHFBDDDFECCC9>CDDD?AB@DB9
A read from file2.fastq looks like this:
@SRR769347.1/2 1 length=101
TGAATACCTTCTTTTTTAGCAAAAAATTGAATGTCATCCACAAGTAATAAGTCACAGGTGCGATACTTTTCTCTGAATTCATCCTGAGTTTTATTTTTGAT
+SRR769347.1 1 length=101
CC@FFDFFHHHHHIJJJJJIJIIIGIGHIIJJJJIIIIJEIGCHBFHIGGGGGGGGIG@FHGGGEHHFFFFFFFEAEEECCADDDC5;>CCAACCCDCDBA
I then use these files as input to SPAdes 3.1.0 (the version available on our cluster), together with PacBio reads for the assembly, as follows:
spades.py -1 file1.fastq -2 file2.fastq --pacbio pacbio.fastq -o SPAdes_output
Invariably I am getting the following error:
== Error == file not found: file2.fastq (right reads, library number: 1, library type: paired-end)
I also tried another fastq-dump option, --split-3, with which the reads are correctly labelled as /1 and /2, but strangely each read appears in multiple copies, and SPAdes gives me the same error.
Any help would be great!
thanks
Julien
Try downloading the fastq files from ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA068/SRA068445/SRX247326/
Also use the full path to the fastq files, e.g.
Thanks, it was indeed a PATH issue. It works now! Sorry about that!
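For anyone hitting the same "file not found" error, a small pre-flight check along these lines may help. It is only a sketch: the helper name abs_path is made up for illustration, and the usage comment assumes the file names from the original command and that the reads sit in the current directory.

```shell
# Resolve a file name to an absolute path; fail loudly if the file is missing.
abs_path() {
    if [ ! -f "$1" ]; then
        echo "missing input: $1 (looked from $(pwd))" >&2
        return 1
    fi
    # cd into the file's directory in a subshell and ask for its full path.
    printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")"
}

# Possible usage with the file names from the question:
# r1=$(abs_path file1.fastq) && r2=$(abs_path file2.fastq) &&
#   pb=$(abs_path pacbio.fastq) &&
#   spades.py -1 "$r1" -2 "$r2" --pacbio "$pb" -o SPAdes_output
```

Because each lookup is chained with &&, spades.py is only started when all three inputs were found, and the message on stderr names the file that was not.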