NOVOPlasty assembly fails with “INVALID SEED” using first read as seed
7 weeks ago
Buddha • 0

Hello,

I am trying to assemble a complete mitochondrial genome using NOVOPlasty from filtered paired-end FASTQ files. The files I am using are:

ERR322446.filtered.A.fq

ERR322446.filtered.B.fq

Seed file: ERR322446_seed.fasta

I used the first read sequence from ERR322446.filtered.A.fq as the seed in NOVOPlasty. However, the program returns the following error:

INVALID SEED, PLEASE TRY AGAIN WITH A NEW ONE

NOVOPlasty output log: log file

I have checked that the seed sequence is in FASTA format with a proper header and matches the first read exactly. My NOVOPlasty configuration file is config.txt.

Access to all files: BioStar_ERR322446

I’m stuck and not sure what I’m missing. Has anyone encountered this issue or can suggest a reliable way to generate a valid seed for NOVOPlasty?

Thank you in advance!

mitochondria Assembly Genome Novoplasty • 789 views

I used the first read sequence from ERR322446.filtered.A.fq as the seed

ERR322446 appears to be a human sample.

Did you convert the FASTQ header into a FASTA one (e.g. >seed_sequence) when you provided the seed sequence? If that is not working, you could use any other human mitochondrial sequence, as noted in the NOVOPlasty manual.

22 days ago
Kevin Blighe ★ 90k

Hello,

I've seen this "INVALID SEED" error a few times with NOVOPlasty. It usually means the seed does not anchor in your read dataset, i.e., the algorithm cannot find enough k-mer overlaps to extend from it. Since you've confirmed the FASTA format and the exact sequence match, the most likely culprit is that your chosen seed (the first read from ERR322446.filtered.A.fq) is not from the mitochondrial genome. If this is whole-genome data, most reads are nuclear, so an arbitrary first read is very likely nuclear or off-target, and NOVOPlasty cannot extend it into a circular organelle assembly.

ERR322446 appears to be a human sample (per the comment above; the ENA/SRA run record will confirm this), so mitochondrial reads should be present, although filtering may have thinned them out and mtDNA coverage in WGS data is often uneven. Here's a step-by-step way to troubleshoot and get a valid seed:

1. Quick Check: Does Your Dataset Even Have mtDNA Reads?

Before tweaking the seed, verify there are mitochondrial reads in your filtered FASTQs. Map a subset to the human mtDNA reference (rCRS, NC_012920.1):

  • Download the reference:
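    wget "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_012920.1&rettype=fasta&retmode=text" -O human_mt_ref.fasta (or download the FASTA from the NCBI Nucleotide page for NC_012920.1; the quotes are needed because the URL contains &).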

  • Index it (using bowtie2, e.g.):
    bowtie2-build human_mt_ref.fasta human_mt_index

  • Map a subsample (to save time):
    seqtk sample -s100 ERR322446.filtered.A.fq 100000 | bowtie2 -x human_mt_index -U - -S mapped.sam --no-unal
    (Adjust subsample size as needed; check for alignments in the SAM.)

    If you get zero or very few hits (<0.1% of reads), your filtering might have removed the mt reads. Try loosening the quality thresholds or using the unfiltered data. Human mtDNA typically reaches roughly 100-1000x coverage in WGS, so it should be detectable.
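To put a number on "very few hits", here is a minimal sketch that counts primary mapped records in the SAM file from the step above and reports them as a percentage of the subsample. The function name mt_hit_percent is made up for illustration, and it assumes you pass the same read count you gave to seqtk (100000 in the example).

```shell
# Hypothetical helper: share of subsampled reads that aligned to the mt reference.
# $1 = SAM file from the bowtie2 step, $2 = number of reads that were subsampled.
# Example call: mt_hit_percent mapped.sam 100000
mt_hit_percent() {
    awk -v total="$2" '
        /^@/ { next }                        # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }        # FLAG bit 0x4 set: read is unmapped
        int($2 / 256) % 2 == 1 { next }      # FLAG bit 0x100 set: secondary alignment
        { hits++ }
        END {
            if (total <= 0) { print "total must be greater than 0"; exit 1 }
            printf "%d mapped of %d reads (%.3f%%)\n", hits, total, 100 * hits / total
        }
    ' "$1"
}
```

It only needs awk, so it works even without samtools. If samtools is installed, samtools view -c -F 4 mapped.sam gives the mapped count directly.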

2. Generate a Reliable Seed Using the Reference

The easiest fix for human data: Use the full human mtDNA reference as your seed. NOVOPlasty handles this well, even for closely related samples, and it kickstarts the extension from a strong anchor.

  • Save human_mt_ref.fasta (from step 1) as your new seed file (e.g., human_mt_seed.fasta).
  • Update your config.txt:

    Project name          = ERR322446_mito
    Type                  = mito
    Genome Range          = 15000-17000
    K-mer                 = 21
    Seed Input            = human_mt_seed.fasta
    Read Length           = 100
    Insert size           = 300
    Platform              = illumina
    Single/Paired         = PE
    Forward reads         = ERR322446.filtered.A.fq
    Reverse reads         = ERR322446.filtered.B.fq
    Insert size auto      = yes
    Use Quality Scores    = no

    (Only the fields that matter here are shown, named as in the template config.txt shipped with NOVOPlasty 4.x. Start from the template that came with your version and change only the equivalent fields, leaving the rest as they are. Set Read Length and Insert size from your own data; a K-mer of 25 is also reasonable, and lower values help when coverage is low. Keep comments out of the config values themselves.)

  • Rerun: perl NOVOPlasty4.3.1.pl -c config.txt (use the script name of your installed version)

    This usually resolves the error. If it still fails, bump the k-mer to 39 or try a conserved gene fragment as the seed (e.g., COI from human mtDNA: extract ~600 bp from the reference using samtools faidx or a text editor).
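If you go the gene-fragment route, a rough sketch for cutting a region out of the reference without samtools looks like this. The function name fasta_slice is hypothetical, and the coordinates in the example call (5904-6503, the first 600 bp of MT-CO1 as I read the NC_012920.1 annotation) are an assumption you should check against the GenBank record.

```shell
# Hypothetical helper: print bases start..end (1-based, inclusive) of a
# single-record FASTA file as a new FASTA record.
# $1 = FASTA file, $2 = start, $3 = end
# Example call: fasta_slice human_mt_ref.fasta 5904 6503 > coi_seed.fasta
fasta_slice() {
    awk -v s="$2" -v e="$3" '
        /^>/ { next }                        # drop the header line
        { sub(/\r$/, ""); seq = seq $0 }     # join wrapped sequence lines
        END {
            printf ">seed_%d_%d\n%s\n", s, e, substr(seq, s, e - s + 1)
        }
    ' "$1"
}
```

With samtools available, samtools faidx human_mt_ref.fasta NC_012920.1:5904-6503 should give the same region, provided the record name in your FASTA header is NC_012920.1.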

3. Alternative: Mine a Real mt Read as Seed

If you want to stick with a single-read seed:

  • From the mapping in step 1, extract a confidently mapped read (-F 4 drops unmapped reads, -q 30 drops ambiguously placed ones) and write it out as FASTA:
    (echo ">mt_seed"; samtools view -F 4 -q 30 mapped.sam | head -1 | cut -f10) > mt_seed.fasta
  • Use that in your config. Because the read comes from your own data, the seed is guaranteed to be present in it.
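To double-check that a single-read seed really occurs in the reads, something like the sketch below can help. The function name seed_hits is made up, and it assumes a standard 4-line-per-record FASTQ. It searches for the reverse complement as well, because SAM stores reads that aligned to the reverse strand in reverse-complemented form, so a seed taken from the SAM file may not match the FASTQ text directly.

```shell
# Hypothetical helper: count reads that contain the seed, or its reverse
# complement, as an exact substring.
# $1 = single-record seed FASTA, $2 = FASTQ file (4 lines per record)
# Example call: seed_hits mt_seed.fasta ERR322446.filtered.A.fq
seed_hits() {
    seed_seq=$(grep -v '^>' "$1" | tr -d '\n\r' | tr 'acgtn' 'ACGTN')
    seed_rc=$(printf '%s' "$seed_seq" | rev | tr 'ACGT' 'TGCA')
    awk -v fwd="$seed_seq" -v rc="$seed_rc" '
        # sequence lines are every 4th line, starting at line 2
        NR % 4 == 2 && (index($0, fwd) || index($0, rc)) { n++ }
        END { print n + 0 }
    ' "$2"
}
```

A result of 0 for a read-length seed means the seed is not in that file in either orientation. This check is only meaningful for short seeds; a full reference sequence will never match a single read.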

4. Other Common Pitfalls to Double-Check

  • K-mer vs. Seed Length: Ensure your seed is longer than the k-mer size (e.g., if k=39, the seed should be >39 bp). Your read is likely fine, but confirm.
  • Orientation: If paired-end, make sure A.fq/B.fq are correctly assigned as forward/reverse.
  • Subsampling: If Max memory is set in the config, NOVOPlasty subsamples the reads to fit within it. Leave it empty (or raise it) so the mt reads are not thinned out.
  • Version: Use the latest NOVOPlasty release from GitHub (4.3.1 at the time of writing; check the repository for newer ones), since older versions may handle seeds differently.
  • Log Details: Check where your log stops. A failure at the seed retrieval step right after the hash table build, or a message that no seed k-mers were found, is consistent with the seed being absent from the reads.
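The first check in the list above can be done quickly in the shell. This is only a sketch: check_seed is a hypothetical name, and it tests nothing beyond the FASTA header and the seed length against the k-mer value you pass in.

```shell
# Hypothetical helper: confirm the seed file starts with a FASTA header and
# that the sequence is longer than the k-mer size.
# $1 = seed FASTA, $2 = K-mer value from config.txt
# Example call: check_seed ERR322446_seed.fasta 39
check_seed() {
    awk -v k="$2" '
        NR == 1 && !/^>/ { bad_header = 1 }  # e.g. a FASTQ "@" header left in place
        !/^>/ { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            if (bad_header) { print "FAIL: first line is not a FASTA header"; exit 1 }
            if (len <= k)   { printf "FAIL: seed is %d bp, k-mer is %d\n", len, k; exit 1 }
            printf "OK: seed is %d bp, k-mer is %d\n", len, k
        }
    ' "$1"
}
```

The function returns a non-zero status on failure, so it can sit in front of the NOVOPlasty call in a script.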

Kevin
