seed file link: ERR322446_seed.fasta
I used the first read sequence from ERR322446.filtered.A.fq as the seed in NOVOPlasty. However, the program returns the following error:
I have checked that the seed sequence is in FASTA format with a proper header and matches the first read exactly. My NOVOPlasty configuration file is config.txt.
I used the first read sequence from ERR322446.filtered.A.fq as the seed
ERR322446 appears to be a human sample.
Did you convert the FASTQ header into a FASTA one (e.g. >seed_sequence) when you provided the seed sequence? If that does not work, you could use any other human mitochondrial sequence as the seed, as noted in the NOVOPlasty manual.
I've seen this "INVALID SEED" error a few times with NOVOPlasty. It's frustrating, but it usually means the seed isn't anchoring in your read dataset (i.e., the algorithm can't find enough k-mer overlaps to extend from it). Since you've confirmed the FASTA format and the exact sequence match, the most likely culprit is that your chosen seed (the first read from ERR322446.filtered.A.fq) isn't from the mitochondrial genome. The first read of a WGS file is far more likely to be a nuclear or off-target read, so NOVOPlasty can't extend it into a circular organelle assembly.
ERR322446 looks like a human sample (per the comment above), so mitochondrial reads should be present but potentially sparse after filtering (mtDNA often has uneven coverage in WGS data). Here's a step-by-step way to troubleshoot and get a valid seed:
1. Quick Check: Does Your Dataset Even Have mtDNA Reads?
Before tweaking the seed, verify there are mitochondrial reads in your filtered FASTQs. Map a subset to the human mtDNA reference (rCRS, NC_012920.1):
Download the reference: wget "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_012920.1&rettype=fasta&retmode=text" -O human_mt_ref.fasta (or grab the FASTA directly from NCBI Nucleotide).
Index it (using bowtie2, e.g.): bowtie2-build human_mt_ref.fasta human_mt_index
Map a subsample (to save time): seqtk sample -s100 ERR322446.filtered.A.fq 100000 | bowtie2 -x human_mt_index -U - -S mapped.sam --no-unal
(Adjust subsample size as needed; check for alignments in the SAM.)
If you get zero or only a handful of hits, your filtering might have culled the mt reads, so try loosening the quality thresholds or using the unfiltered data. In human WGS, mtDNA usually reaches a much higher depth than the nuclear genome (often hundreds to thousands of x, depending on tissue), so it should be detectable even in a subsample.
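If you'd rather not eyeball the SAM, here is a minimal Python sketch of the check above. It assumes the file is the mapped.sam from the bowtie2 command (so with --no-unal it only holds aligned reads) and that you subsampled 100000 reads; the function names are just mine for illustration.

```python
def count_mapped(sam_lines):
    """Count primary alignments of mapped reads in an iterable of SAM lines."""
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:    # read is unmapped
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800) alignment
            continue
        mapped += 1
    return mapped


def mapped_fraction(sam_lines, n_sampled):
    """Fraction of the subsampled reads that aligned to the mt reference."""
    return count_mapped(sam_lines) / n_sampled


# Example (assumes mapped.sam exists and 100000 reads were subsampled):
# with open("mapped.sam") as handle:
#     print(mapped_fraction(handle, 100000))
```

A fraction of exactly 0 is the clear red flag; anything above that means there are mt reads to work with.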
2. Generate a Reliable Seed Using the Reference
The easiest fix for human data: Use the full human mtDNA reference as your seed. NOVOPlasty handles this well, even for closely related samples, and it kickstarts the extension from a strong anchor.
Save human_mt_ref.fasta (from step 1) as your new seed file (e.g., human_mt_seed.fasta).
Update your config.txt:
Project name = ERR322446_mito
Type = mito
Genome Range = 15000-17000
K-mer = 33
Seed Input = human_mt_seed.fasta
Read Length = 100
Insert size = 300
Platform = illumina
Single/Paired = PE
Forward reads = ERR322446.filtered.A.fq
Reverse reads = ERR322446.filtered.B.fq
Insert size auto = yes
Use Quality Scores = no
(These key names follow the config.txt template shipped with NOVOPlasty 4.x as far as I remember it, so start from the template in your own download and change only these values. Read Length and Insert size are estimates; set them to match your data. K-mer 33 is the default; lower it toward 23 for short reads or low coverage. If your version lists Insert Range and Insert Range strict, leave them at the template values. Pass the file with -c config.txt.)
Rerun: perl NOVOPlasty4.3.1.pl -c config.txt (substitute the script name of the version you downloaded).
This should get you past the seed error. If it still fails, bump the k-mer to 39 or try a conserved gene fragment as the seed (e.g., COI from human mtDNA: extract ~600 bp from the reference using samtools faidx or a text editor).
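If you don't have samtools handy, cutting a fragment out of the reference is easy to sketch in Python. This assumes human_mt_ref.fasta holds a single record; the function names and the output file name are made up for the example, and the coordinates in the usage line are only my assumption of a window inside COI (annotated at roughly 5904-7445 on NC_012920.1, please check the GenBank record).

```python
def read_single_fasta(path):
    """Read a FASTA file that contains exactly one record."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    raise ValueError("expected exactly one FASTA record")
                header = line[1:]
            else:
                if header is None:
                    raise ValueError("sequence found before a FASTA header")
                chunks.append(line)
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks).upper()


def write_seed_fragment(ref_path, out_path, start, end, name="seed_fragment"):
    """Write reference positions start..end (1-based, inclusive) as a FASTA seed."""
    _, seq = read_single_fasta(ref_path)
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the reference")
    fragment = seq[start - 1:end]
    with open(out_path, "w") as out:
        out.write(f">{name}\n")
        # Wrap the sequence at 70 characters per line.
        for i in range(0, len(fragment), 70):
            out.write(fragment[i:i + 70] + "\n")
    return fragment


# Example (coordinates are an assumed 600 bp window inside COI):
# write_seed_fragment("human_mt_ref.fasta", "coi_seed.fasta", 6000, 6599)
```

Point Seed Input at the new file and leave the rest of the config alone.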
3. Alternative: Mine a Real mt Read as Seed
If you want to stick with a single-read seed:
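From the mapping in step 1, extract a confidently mapped read: samtools view -F 4 -q 30 mapped.sam | head -1 | cut -f10 (prints the read sequence; -F 4 alone only removes unmapped reads, so -q 30 is added to drop ambiguous, low-MAPQ alignments).
Convert to FASTA: samtools view -F 4 -q 30 mapped.sam | head -1 | awk '{print ">mt_seed"; print $10}' > mt_seed.fasta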
Use that in your config. It guarantees the seed exists in your data.
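A slightly pickier version of the same idea, as a Python sketch: instead of taking the first hit, it takes the highest-MAPQ primary alignment with no Ns. The thresholds (MAPQ 30, 50 bp) and the function names are my own assumptions, not anything NOVOPlasty requires.

```python
def pick_seed_read(sam_lines, min_mapq=30, min_len=50):
    """Return (name, mapq, sequence) of the best-mapped read, or None."""
    best = None
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, mapq, seq = int(fields[1]), int(fields[4]), fields[9].upper()
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904:
            continue
        if mapq < min_mapq or len(seq) < min_len or "N" in seq:
            continue
        if best is None or mapq > best[1]:
            best = (fields[0], mapq, seq)
    return best


def to_fasta(name, seq):
    """Format one sequence as a two-line FASTA record."""
    return f">{name}\n{seq}\n"


# Example (assumes mapped.sam from step 1):
# with open("mapped.sam") as handle:
#     hit = pick_seed_read(handle)
# if hit is not None:
#     with open("mt_seed.fasta", "w") as out:
#         out.write(to_fasta("mt_seed", hit[2]))
```

Note that SAM stores reverse-strand alignments (flag 16) on the reference strand, so for those reads the sequence you get is the reverse complement of what is in the FASTQ.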
4. Other Common Pitfalls to Double-Check
K-mer vs. Seed Length: Ensure your seed length > k-mer size (e.g., if k=39, seed should be >39 bp). Your read is likely fine, but confirm.
Orientation: If paired-end, make sure A.fq/B.fq are correctly forward/reverse.
Subsampling: If Max memory is set in the config, NOVOPlasty subsamples the reads to fit that limit, which can thin out the mt reads. If you have enough RAM, leave Max memory blank so all reads are used.
Version: Use the latest NOVOPlasty release from GitHub (4.3.1 when this was written); older versions may contain bugs that have since been fixed.
Log Details: Check the log for what is printed right after the hash table is built. The exact wording differs between versions, but a message saying that no seed k-mers were found in the reads would confirm this diagnosis.
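To test a seed before launching a full run, the length and "is it actually in my reads" checks from this list can be sketched in Python. This only mimics the idea of k-mer anchoring and is not NOVOPlasty's own algorithm; the function names and the read limit are assumptions for the example, and it expects plain 4-line FASTQ records.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent form: the smaller of the k-mer and its reverse complement."""
    rc = revcomp(kmer)
    return kmer if kmer <= rc else rc


def seed_kmer_support(seed, reads, k=33):
    """Fraction of the seed's distinct k-mers seen in the reads (either strand)."""
    seed = seed.upper()
    if len(seed) <= k:
        raise ValueError("seed must be longer than the k-mer size")
    target = {canonical(seed[i:i + k]) for i in range(len(seed) - k + 1)}
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            if kmer in target:
                found.add(kmer)
    return len(found) / len(target)


def fastq_sequences(path, limit=None):
    """Yield read sequences from a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if limit is not None and line_no // 4 >= limit:
                break
            if line_no % 4 == 1:
                yield line.strip()


# Example (seed_seq is the sequence from your seed FASTA):
# reads = fastq_sequences("ERR322446.filtered.A.fq", limit=1000000)
# print(seed_kmer_support(seed_seq, reads, k=33))
```

A single-read seed taken from the data will always score above 0 because the read matches itself, so what matters is whether other reads share its k-mers; a value stuck near the self-match level on a large sample suggests the seed has little to extend from.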