5 months ago
Grace
I am trying to run SPAdes on a multi-FASTQ file of short sequences, but it appears to have been stuck at the same point all night. The SPAdes log has shown the following the whole time and has not changed:
I used this command to run SPAdes:
time spades.py --s1 phage_blast_aligns.fastq --phred-offset 33 -o spades_phage_output
I should also note that I generated the FASTQ file from a FASTA file and gave every base a fake quality score of "I".
Is it normal for SPAdes to be stuck in one spot for this long? How long should SPAdes normally take on a 6 MB file?
What can I do to get SPAdes working on my FASTQ file?
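For reference, the FASTA-to-FASTQ conversion described in the question could look roughly like the Python sketch below. It is an illustration only, not the script actually used: the function name `fasta_to_fastq` is hypothetical, and it assumes a plain FASTA input held in memory. With a Phred offset of 33, the character "I" corresponds to a quality score of 40.

```python
def fasta_to_fastq(fasta_text, quality_char="I"):
    """Convert FASTA text to FASTQ text, giving every base the same quality character.

    Hypothetical helper for illustration; 'I' is Phred 40 at offset 33.
    """
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))

    # Emit one 4-line FASTQ record per sequence
    return "".join(
        f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n" for name, seq in records
    )


if __name__ == "__main__":
    print(fasta_to_fastq(">read1\nACGTAC\n>read2\nGG\n"), end="")
```

The quality string must be exactly as long as the sequence, which is why the sketch joins wrapped sequence lines before building each record.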
Are there 2146 reads in this file? Where did the reads originate from? Are they expected to assemble? How long are the reads?
Yes, there are 2146 reads in the file.
I searched a database with a lambda phage genome and got a series of genomes as hits. I then ran these genomes through a BLAST aligner, extracted the aligned sequences, and put them in FASTQ format.
I am using these alignments as a test to see whether assembling them gives us back the lambda phage genome. So yes, I do roughly expect them to assemble.
What do you think the issue with SPAdes is in this case?
You shouldn't mix more than one genome in a file if you are testing whether you can assemble that genome. You can create fake reads from a single genome (bbfakereads.sh from the BBMap suite, or other tools) and then use those reads with SPAdes.
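As a rough illustration of generating fake reads from one genome, the Python sketch below cuts a sequence into overlapping fixed-length pieces. It is not how bbfakereads.sh works internally; the function name `shred` and the default `read_len` and `step` values are assumptions chosen for the example.

```python
def shred(genome, read_len=150, step=10):
    """Cut a genome string into overlapping reads of length read_len.

    Hypothetical helper: a read starts every `step` bases, so each
    position is covered by roughly read_len / step reads.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")

    # Last start position that still yields a full-length read
    last_start = max(len(genome) - read_len, 0)
    reads = [genome[start:start + read_len]
             for start in range(0, last_start + 1, step)]

    # Add one final read so the end of the genome is not left uncovered
    if last_start % step != 0:
        reads.append(genome[last_start:])
    return reads


if __name__ == "__main__":
    toy_genome = "ACGTACGTAC"
    for read in shred(toy_genome, read_len=4, step=3):
        print(read)
```

Reads made this way cover the genome evenly from end to end, which is the property a set of sequences pulled out of BLAST alignments is not guaranteed to have.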
I don't understand. So SPAdes is not able to assemble multiple sequences if they are from different genomes? All of my sequences are relatively short, maybe 100-200 bp long.
Why would SPAdes not be able to assemble sequences from different genomes if, hypothetically, they should all be segments of the lambda phage genome?
We don't know exactly what you did to create the fragments. Did you start with a genome and then "shred" it, or did you randomly pick out fragments from a BLAST search?
SPAdes should be able to assemble the sequences, but it won't create multiple genomes. If you did not create the fragments by "shredding" the genome to get complete coverage, then coverage may be uneven or parts of the genome may be missing. Oversampling the data can sometimes confuse the assembler as well.
What theoretical fold coverage do those reads give you (total number of bases in the data / lambda genome size)?
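The fold-coverage calculation asked about here (total bases divided by genome size) can be sketched in Python as follows. The function names are hypothetical, the parser assumes a simple 4-line-per-record FASTQ with unwrapped sequences, and the genome size used is 48,502 bp, the length of the lambda phage reference genome.

```python
LAMBDA_GENOME_SIZE = 48502  # bp, lambda phage reference genome


def total_bases(fastq_text):
    """Sum read lengths in FASTQ text (assumes 4 lines per record)."""
    lines = fastq_text.splitlines()
    # The sequence is the second line of every 4-line record
    return sum(len(lines[i].strip()) for i in range(1, len(lines), 4))


def fold_coverage(n_bases, genome_size=LAMBDA_GENOME_SIZE):
    """Theoretical fold coverage = total bases / genome size."""
    return n_bases / genome_size


if __name__ == "__main__":
    toy_fastq = "@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n"
    bases = total_bases(toy_fastq)
    print(bases, fold_coverage(bases))
```

As a back-of-the-envelope check, if the 2146 reads really were 100-200 bp each, that would be about 214,600 to 429,200 bases, or roughly 4x to 9x theoretical coverage of lambda. This number says nothing about how evenly that coverage is spread along the genome.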