Question

RNA-seq pipeline

0

Entering edit mode

3 months ago

Javier • 0

Hello, I am learning bioinformatics and my professor assigned me the task of creating a pipeline for RNA-seq in Snakemake using the following software: fastqc, cutadapt. fastqc, STAR and rna-seqc.

My problem is that when I try to run my code the following warning appears:

MissingOutputException on line 39 of /home/vdi/Trancryptomica/pipeline/snakefile:
Working Files missing after 5 seconds:

for each output of the first rule, and when I try to fix it I make a new error:

Incomplete Files Exception:
The following files appear to be incomplete. If you are sure that certain files are not incomplete

again, for the outputs of the first rule. Here is my code:

work_dir="/home/vdi/Transcriptomica/pipeline"
GENOME=work_dir + "/data/references/chr10.fa"
TRANSCRIPTOME=work_dir + "/data/references/gencode.v38.annotation.chr10.gtf"

samples=["test-sample1", "test-sample2"]
ids=["_1", "_2"]

rule all:
    input:
        #fastqc_raw output:
        expand(work_dir + "/results/QC/reads/{sample}{id}_fastqc.html", sample=samples, id=ids),

        #cutadapt output:
        expand(work_dir + "/data/reads/{sample}_1_trimm.fastq", sample=samples),
        expand(work_dir + "/data/reads/{sample}_2_trimm.fastq", sample=samples),

        #fastqc_trimm output:
        expand(work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.html", sample=samples, id=ids),

        #star index output:
        directory(work_dir + "/data/references/index_star/"),

        #star alig output:
        expand(work_dir + "/results/alignment/{sample}Unmapped.out.mate1", sample=samples),
        expand(work_dir + "/results/alignment/{sample}Unmapped.out.mate2", sample=samples),
        expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples),
        expand(work_dir + "/results/alignment/{sample}Log.final.out", sample=samples),
        expand(work_dir + "/results/cuantification/{sample}read_counts.out", sample=samples),
        expand(work_dir + "/results/cuantification/{sample}read_counts.txt", sample=samples),

        #rna-seqc output:
        directory(work_dir + "/results/QC/alignment/")


ruleorder: fastqc_raw > fastqc_trimm

rule fastqc_raw:
    input:
        work_dir + "/data/reads/{sample}{id}.fastq"
    output:
        zip=work_dir + "/results/QC/reads/{sample}{id}_fastqc.zip",
        html=work_dir + "/results/QC/reads/{sample}{id}_fastqc.html"
    params:
        path="results/QC/reads/"
    shell:
        """
        fastqc {input} -o {params}
        """

rule cutadapt:
    input:
        work_dir + "/data/reads/{sample}_1.fastq",
        work_dir + "/data/reads/{sample}_2.fastq"
    output:
        work_dir + "/data/reads/{sample}_1_trimm.fastq",
        work_dir + "/data/reads/{sample}_2_trimm.fastq"
    shell:
        """
        cutadapt -a CGCCTTGGCCGT  -A CGCCTTGGCCGT  -o {output[0]} -p {output[1]} {input[0]} {input[1]}
        """

rule fastqc_trimm:
    input:
        work_dir + "/data/reads/{sample}{id}_trimm.fastq"
    output:
        zip=work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.zip",
        html=work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.html"
    params:
        path="results/QC/reads/"
    shell:
        """
        fastqc {input} -o {params}
        """

rule STAR_index:
    input:
        GENOME,
        TRANSCRIPTOME
    output:
        directory(work_dir + "/data/references/index_star")
    shell:
        """
        mkdir -p  data/references/index_star
        STAR --runThreadN 3 --runMode genomeGenerate --genomeDir {output} --genomeFastaFiles {input[0]} --sjdbGTFfile {input[1]} -- sjdbOverhang 100 
        """

rule STAR_alig:
    input:
        ref=work_dir + "/data/references/index_star/",
        fw=work_dir + "/data/reads/{sample}_1_trimm.fastq",
        rv=work_dir + "/data/reads/{sample}_2_trimm.fastq"
    output:
        work_dir + "/results/alignment/{sample}Unmapped.out.mate1",
        work_dir + "/results/alignment/{sample}Unmapped.out.mate2",
        work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam",
        work_dir + "/results/alignment/{sample}Log.final.out"
    params:
        path=work_dir + "/results/alignment/"
    shell:
        """
        mkdir {params}
        STAR --genomeDir {input.ref} --quantMode TranscriptomeSAM --readFilesIn {input.fw} {input.rv} --outFileNamePrefix {params} --readFilesCommand zcat --outFilterIntronMotifs RemoveNoncanonical --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx 
        """ 

rule FeatureCounts:
    input:
        TRANSCRIPTOME,
        BAMS=expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples)

    output:
        TPG=work_dir + "/results/cuantification/{sample}read_counts.out",
        PG=work_dir + "/results/cuantification/{sample}read_counts.txt"
    shell:
        """
        featureCounts -Q 20 -p -O -T 5 -a {input[0]} -o {output.TPG} {input[1]}
        cut -f1,7- {output.TPG} | sed 1d > {output.PG}
        """

rule RNASEQC:
    input:
        TRANSCRIPTOME,
        BAMS=expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples)
    output:
        directory(work_dir + "/results/QC/alignment/")
    shell:
        """
        rnaseqc {input[0]} {input[1]} {output}                                                                                                                                      
                """

I would like to know how to exit this bucle were i go from a error to other error

snakemake Ubuntu • 708 views

ADD COMMENT • link 3 months ago by Javier • 0

1

Entering edit mode

My guess would be that your work_dir is not your actual working directory. Did you launched your workflow from /home/vdi/Transcriptomica/pipeline ?

ADD REPLY • link 3 months ago by raphael.B ▴ 520

0

Entering edit mode

yes theres were my snakefile is

ADD REPLY • link 3 months ago by Javier • 0

0

Entering edit mode

So, it wasn't exactly the problem, but your repetition kept me thinking and I changed the directions in the snakefile "adding / to the end of work_dir and removing one at the beginning of the others" and it works. thank you

ADD REPLY • link 3 months ago by Javier • 0

0

Entering edit mode

How are you executing the pipeline? Please show us all your commands.

ADD REPLY • link 3 months ago by Ram 43k

0

Entering edit mode

snakemake --cores all

Thats the only comand, initially I used :

snakemake --cores all -np

to make dry runs and fix the errors, once everything was solved I moved on to the real run

ADD REPLY • link 3 months ago by Javier • 0