Question: Snakemake input error with files from inconsistent naming scheme
skbrimer650 (United States) wrote, 11 weeks ago:

Hello hive brain,

I am trying to make a workflow in Snakemake to process some MinION reads. They are cDNA amplicons of different genotypes of the same virus that were multiplexed together. To make sure that I only use the barcodes I want, I pre-process the reads with porechop so that only reads with both barcode adapters are kept before moving forward. However, since porechop relabels the reads as BC01, BC02, etc., I have added a "barcodes" section to the config.yaml file, but I am having trouble getting past this error:

MissingInputException in line 17 of /home/sean/Desktop/reo/antisera project/20200813/MinIONAmplicon.smk:
Missing input files for rule minimap2:
8413_19_strict/BC01.fastq.gz

I know what it is telling me; however, the rule right before it in my workflow makes that directory, so I'm not sure why it is not trying to run all the jobs.

Any help is greatly appreciated!

Here is my Snakefile:

configfile: "config.yaml"

rule all:
    input:
        expand("{sample}.bam", sample = config["samples"])

rule porechop_strict:
    input:
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        "{sample}_strict/"
    shell:
        "porechop -i {input} -b {output} --barcode_threshold 85 --threads 8 --require_two_barcodes"

rule minimap2:
    input:
        lambda wildcards: "{sample}_strict/" + config["barcodes"][wildcards.sample]
    output:
        "{sample}.bam"
    shell:
        "minimap2 -ax map-ont -t8 ../consensus.fasta {input} | samtools sort -o {output}"

And my config file (config.yaml):

samples: {
  '8413_19': relabeled_reads/8413_19.raw.fastq.gz,
  '8417_19': relabeled_reads/8417_19.raw.fastq.gz,
  '8445_19': relabeled_reads/8445_19.raw.fastq.gz,
  '8466_19_104': relabeled_reads/8466_19_104.raw.fastq.gz,
  '8466_19_105': relabeled_reads/8466_19_105.raw.fastq.gz,
  '8467_20': relabeled_reads/8467_20.raw.fastq.gz,
  }
barcodes: {
      '8413_19': BC01.fastq.gz,
      '8417_19': BC02.fastq.gz,
      '8445_19': BC03.fastq.gz,
      '8466_19_104': BC04.fastq.gz,
      '8466_19_105': BC05.fastq.gz,
      '8467_20': BC06.fastq.gz,
    }
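To see where the path in the error message comes from, here is a plain-Python sketch of what the `minimap2` input lambda asks for, sample by sample. It assumes the config has already been parsed into a nested dict (as Snakemake does with config.yaml); `resolve_minimap2_input` is a hypothetical helper that only mirrors the lambda.

```python
# Plain-Python copy of config.yaml (assumption: Snakemake has already
# parsed the YAML into this nested dict, which it exposes as `config`).
config = {
    "samples": {
        "8413_19": "relabeled_reads/8413_19.raw.fastq.gz",
        "8417_19": "relabeled_reads/8417_19.raw.fastq.gz",
        "8445_19": "relabeled_reads/8445_19.raw.fastq.gz",
        "8466_19_104": "relabeled_reads/8466_19_104.raw.fastq.gz",
        "8466_19_105": "relabeled_reads/8466_19_105.raw.fastq.gz",
        "8467_20": "relabeled_reads/8467_20.raw.fastq.gz",
    },
    "barcodes": {
        "8413_19": "BC01.fastq.gz",
        "8417_19": "BC02.fastq.gz",
        "8445_19": "BC03.fastq.gz",
        "8466_19_104": "BC04.fastq.gz",
        "8466_19_105": "BC05.fastq.gz",
        "8467_20": "BC06.fastq.gz",
    },
}


def resolve_minimap2_input(sample):
    """Hypothetical helper: the file path the minimap2 input lambda requests."""
    return f"{sample}_strict/" + config["barcodes"][sample]


# One requested file per sample; none of these paths matches the
# declared output "{sample}_strict/" of porechop_strict.
for sample in config["samples"]:
    print(f"{sample} -> {resolve_minimap2_input(sample)}")
```

The first line printed is `8413_19 -> 8413_19_strict/BC01.fastq.gz`, the same file the MissingInputException reports.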
Tags: nanopore, snakemake, minion, cdna
Answer by skbrimer650 (United States), 11 weeks ago:

So here is the solution I figured out.

rule minimap2:
    input:
        "{sample}_strict"
    params:
        suffix=lambda wildcards: config["barcodes"][wildcards.sample]
    output:
        "{sample}.bam"
    shell:
        "minimap2 -ax map-ont -t8 ../consensus.fasta "
        "{input}/{params.suffix} | samtools sort -o {output}"
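For sample 8413_19 this command still reads exactly the file that was reported missing. The difference is that `params` are not treated as input files, so Snakemake only has to find a rule that produces the `8413_19_strict` directory; the file name inside it is filled in when the command is built. The sketch below approximates that substitution with Python's `str.format`; using `SimpleNamespace` as a stand-in for Snakemake's job object is an assumption, not how Snakemake is implemented.

```python
from types import SimpleNamespace

# Rough approximation (assumption): Snakemake's real shell formatting does more,
# e.g. joining lists of files with spaces; plain str.format is enough here.
template = (
    "minimap2 -ax map-ont -t8 ../consensus.fasta "
    "{input}/{params.suffix} | samtools sort -o {output}"
)

# Values for the job with sample=8413_19, taken from the question's config.
job = SimpleNamespace(
    input="8413_19_strict",                          # directory made by porechop_strict
    params=SimpleNamespace(suffix="BC01.fastq.gz"),  # config["barcodes"]["8413_19"]
    output="8413_19.bam",
)

# "{params.suffix}" uses str.format's attribute lookup on the params object.
command = template.format(input=job.input, params=job.params, output=job.output)
print(command)
# minimap2 -ax map-ont -t8 ../consensus.fasta 8413_19_strict/BC01.fastq.gz | samtools sort -o 8413_19.bam
```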

I am not sure exactly why it has to run this way; I suspect it has to do with how Snakemake works out what it still needs to create. However, I found that I could use the params feature to supply the barcode file name produced by porechop, so that the input of this rule is the same as the output of the previous rule, and now it runs as I want.
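One way to see why this works is a simplified model of how Snakemake decides which rule can produce a requested file: it matches the path against each rule's output pattern, where every `{wildcard}` matches `.+` by default. This is only a sketch, not Snakemake's actual code; `RULE_OUTPUTS`, `pattern_to_regex` and `producers` are hypothetical names, and ignoring a trailing `/` in the output pattern is an assumption.

```python
import re

# Output patterns of the question's two file-producing rules.
# Assumption: a trailing "/" is ignored, so "{sample}_strict/" acts as "{sample}_strict".
RULE_OUTPUTS = {
    "porechop_strict": "{sample}_strict/",
    "minimap2": "{sample}.bam",
}


def pattern_to_regex(pattern):
    """Turn '{name}' wildcards into named groups matching '.+' (Snakemake's default)."""
    parts = re.split(r"\{(\w+)\}", pattern.rstrip("/"))
    regex = ""
    for i, part in enumerate(parts):
        # Even indices are literal text, odd indices are wildcard names.
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex)


def producers(path):
    """Hypothetical helper: rules whose output pattern matches `path`, with wildcard values."""
    found = {}
    for rule, pattern in RULE_OUTPUTS.items():
        match = pattern_to_regex(pattern).fullmatch(path.rstrip("/"))
        if match:
            found[rule] = match.groupdict()
    return found


# Question's minimap2 input: no rule claims it, hence MissingInputException.
print(producers("8413_19_strict/BC01.fastq.gz"))  # {}
# Answer's minimap2 input: the porechop_strict directory, with sample=8413_19.
print(producers("8413_19_strict"))  # {'porechop_strict': {'sample': '8413_19'}}
```

In this model the fix works because the requested path now matches an output pattern. Snakemake also provides `directory()` for rules whose output is a folder (for example `output: directory("{sample}_strict")` in porechop_strict); it is worth checking the documentation for your Snakemake version.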
