RNA-seq pipeline
0
0
Entering edit mode
3 months ago
Javier • 0

Hello, I am learning bioinformatics and my professor assigned me the task of creating a pipeline for RNA-seq in Snakemake using the following software: fastqc, cutadapt. fastqc, STAR and rna-seqc.

My problem is that when I try to run my code the following warning appears:

MissingOutputException on line 39 of /home/vdi/Trancryptomica/pipeline/snakefile:
Working Files missing after 5 seconds:

for each output of the first rule, and when I try to fix it I make a new error:

Incomplete Files Exception:
The following files appear to be incomplete. If you are sure that certain files are not incomplete

again, for the outputs of the first rule. Here is my code:

work_dir="/home/vdi/Transcriptomica/pipeline"
GENOME=work_dir + "/data/references/chr10.fa"
TRANSCRIPTOME=work_dir + "/data/references/gencode.v38.annotation.chr10.gtf"

samples=["test-sample1", "test-sample2"]
ids=["_1", "_2"]

rule all:
    input:
        #fastqc_raw output:
        expand(work_dir + "/results/QC/reads/{sample}{id}_fastqc.html", sample=samples, id=ids),

        #cutadapt output:
        expand(work_dir + "/data/reads/{sample}_1_trimm.fastq", sample=samples),
        expand(work_dir + "/data/reads/{sample}_2_trimm.fastq", sample=samples),

        #fastqc_trimm output:
        expand(work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.html", sample=samples, id=ids),

        #star index output:
        directory(work_dir + "/data/references/index_star/"),

        #star alig output:
        expand(work_dir + "/results/alignment/{sample}Unmapped.out.mate1", sample=samples),
        expand(work_dir + "/results/alignment/{sample}Unmapped.out.mate2", sample=samples),
        expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples),
        expand(work_dir + "/results/alignment/{sample}Log.final.out", sample=samples),
        expand(work_dir + "/results/cuantification/{sample}read_counts.out", sample=samples),
        expand(work_dir + "/results/cuantification/{sample}read_counts.txt", sample=samples),

        #rna-seqc output:
        directory(work_dir + "/results/QC/alignment/")


ruleorder: fastqc_raw > fastqc_trimm

rule fastqc_raw:
    input:
        work_dir + "/data/reads/{sample}{id}.fastq"
    output:
        zip=work_dir + "/results/QC/reads/{sample}{id}_fastqc.zip",
        html=work_dir + "/results/QC/reads/{sample}{id}_fastqc.html"
    params:
        path="results/QC/reads/"
    shell:
        """
        fastqc {input} -o {params}
        """

rule cutadapt:
    input:
        work_dir + "/data/reads/{sample}_1.fastq",
        work_dir + "/data/reads/{sample}_2.fastq"
    output:
        work_dir + "/data/reads/{sample}_1_trimm.fastq",
        work_dir + "/data/reads/{sample}_2_trimm.fastq"
    shell:
        """
        cutadapt -a CGCCTTGGCCGT  -A CGCCTTGGCCGT  -o {output[0]} -p {output[1]} {input[0]} {input[1]}
        """

rule fastqc_trimm:
    input:
        work_dir + "/data/reads/{sample}{id}_trimm.fastq"
    output:
        zip=work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.zip",
        html=work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.html"
    params:
        path="results/QC/reads/"
    shell:
        """
        fastqc {input} -o {params}
        """

rule STAR_index:
    input:
        GENOME,
        TRANSCRIPTOME
    output:
        directory(work_dir + "/data/references/index_star")
    shell:
        """
        mkdir -p  data/references/index_star
        STAR --runThreadN 3 --runMode genomeGenerate --genomeDir {output} --genomeFastaFiles {input[0]} --sjdbGTFfile {input[1]} -- sjdbOverhang 100 
        """

rule STAR_alig:
    input:
        ref=work_dir + "/data/references/index_star/",
        fw=work_dir + "/data/reads/{sample}_1_trimm.fastq",
        rv=work_dir + "/data/reads/{sample}_2_trimm.fastq"
    output:
        work_dir + "/results/alignment/{sample}Unmapped.out.mate1",
        work_dir + "/results/alignment/{sample}Unmapped.out.mate2",
        work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam",
        work_dir + "/results/alignment/{sample}Log.final.out"
    params:
        path=work_dir + "/results/alignment/"
    shell:
        """
        mkdir {params}
        STAR --genomeDir {input.ref} --quantMode TranscriptomeSAM --readFilesIn {input.fw} {input.rv} --outFileNamePrefix {params} --readFilesCommand zcat --outFilterIntronMotifs RemoveNoncanonical --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx 
        """ 

rule FeatureCounts:
    input:
        TRANSCRIPTOME,
        BAMS=expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples)

    output:
        TPG=work_dir + "/results/cuantification/{sample}read_counts.out",
        PG=work_dir + "/results/cuantification/{sample}read_counts.txt"
    shell:
        """
        featureCounts -Q 20 -p -O -T 5 -a {input[0]} -o {output.TPG} {input[1]}
        cut -f1,7- {output.TPG} | sed 1d > {output.PG}
        """

rule RNASEQC:
    input:
        TRANSCRIPTOME,
        BAMS=expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples)
    output:
        directory(work_dir + "/results/QC/alignment/")
    shell:
        """
        rnaseqc {input[0]} {input[1]} {output}                                                                                                                                      
                """

I would like to know how to exit this bucle were i go from a error to other error

snakemake Ubuntu • 708 views
ADD COMMENT
1
Entering edit mode

My guess would be that your work_dir is not your actual working directory. Did you launched your workflow from /home/vdi/Transcriptomica/pipeline ?

ADD REPLY
0
Entering edit mode

yes theres were my snakefile is

ADD REPLY
0
Entering edit mode

So, it wasn't exactly the problem, but your repetition kept me thinking and I changed the directions in the snakefile "adding / to the end of work_dir and removing one at the beginning of the others" and it works. thank you

ADD REPLY
0
Entering edit mode

How are you executing the pipeline? Please show us all your commands.

ADD REPLY
0
Entering edit mode

snakemake --cores all

Thats the only comand, initially I used :

snakemake --cores all -np

to make dry runs and fix the errors, once everything was solved I moved on to the real run

ADD REPLY

Login before adding your answer.

Traffic: 2348 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6