Question: Using snakemake with multiple output for calling SNP using clairvoyante
0
gravatar for Medhat
13 months ago by
Medhat8.6k
Texas
Medhat8.6k wrote:

I am trying to run Clairvoyante using snakemake for multiple chromosome. the script looks like this:

chr_list = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY', 'chrM']

rule clairvoyante:
    """
    Identify SNPs using Clairvoyante.
    """
    input:
        datain="{dataset}/{sample}.{ref}.{align}.{case}.bam"
    output:
        dataout= ["{{dataset}}/{{sample}}.{{ref}}.{{align}}.{{case}}.clarivoyante.{}.vcf".format(i) for i in chr_list]
    params:
        reference=my_ref,
        training=TRAINING_DATA,
        clairvoyante=CLARIVOYANTE,
        minCoverage=1,
    message:
        "Running callvarbam from clairvoyante, sample is: {wildcards.sample}"
    shell:
        """
            clairvoyante.py callVarBam \
            --chkpnt_fn "{params.training}" \
            --bam_fn "{input.datain}" \
            --ref_fn "{params.reference}" \
            --call_fn "{output.dataout}" \
            --minCoverage "{params.minCoverage}" \
            --ctgName 1  \
        """

My issue is the tool calls variant on one chromosome only per time using --ctgName parameter: the easy way is to loop through the chr_list in shell and uses parallel or xargs to call snp on all of them ()I think this is not a convenient way) or maybe write a python function and use map to run them in parallel.

but I think there should be a better way to do that using snakemake to pass the chromosome to --ctgName for each output.

any idea?

snp snakemake • 1.1k views
ADD COMMENTlink modified 13 months ago • written 13 months ago by Medhat8.6k
1
gravatar for WouterDeCoster
13 months ago by
Belgium
WouterDeCoster42k wrote:

FYI: AFAIK the preferred support forum for snakemake is on StackOverflow.

That said, you'll need another function which is going to cat the vcf's together, and which has as input all separate vcf's like:

expand("myvariants_{{sample}}-{chromosome}.vcf", chromosome=chr_list)

You would then adapt your clairvoyante rule to write output in the form of "myvariants_{{sample}}-{chromosome}.vcf".

ADD COMMENTlink written 13 months ago by WouterDeCoster42k

Thanks, I will try your answer. Regarding the question I was doubting where to put it, but I thought it may fit here more.

ADD REPLYlink written 13 months ago by Medhat8.6k

Yeah I agree that it would make sense to use biostars, but see also the support section in the snakemake documentation.

ADD REPLYlink written 13 months ago by WouterDeCoster42k

Aha, now I understand, thanks that makes more sense for me now.

ADD REPLYlink written 13 months ago by Medhat8.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1336 users visited in the last hour