Question: Create snakefile based on R script
0
gravatar for r.tor
5 months ago by
r.tor40
r.tor40 wrote:

I have extracted the Transcription Start Sites (TSS) from hsapiens gene ensembl database and TSS flnaking regions through biomaRt and GenomicRanges. Then all targeted data has been stored in a bed format file. The code looks like this:

library(biomaRt)
library(regioneR)
library(BSgenome.Hsapiens.UCSC.hg38)

mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
att <- listAttributes(mart)

transcripts <- getBM(attributes=c("chromosome_name", "transcript_start", "transcript_end","strand",        "ensembl_gene_id","gene_biotype", "ensembl_transcript_id"),
  filters ="biotype",
  values  =c("protein_coding"),
  mart    =mart)

transcripts[transcripts$strand == 1,]$strand <- "+"
transcripts[transcripts$strand == -1,]$strand <- "-"

trans.range <- makeGRangesFromDataFrame(transcripts, keep.extra.columns = TRUE, ignore.strand =  FALSE, start.field = "transcript_start", end.field = "transcript_end", strand.field = "strand")

 #flank regions
 flank.regions <- promoters(trans.range, upstream=1500, downstream=500, use.names=TRUE)

 #to just keeping the interval ranges (IR) of standard chromosomes:
 stand.flank <-  flank.regions[seqnames(flank.regions) ==     c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22"),]

#change the levels to the only autosomal chr!
seq(1:22)
seqlevels(stand.flank) <- as.character(seq(1:22))

#split granges based on metadata elements (ensembl_gene_id)

gen.splitted <- split(stand.flank, stand.flank$ensembl_gene_id)

#reduce gen.splitted to get reduced intervals regions for each ensembl_gene_id in one list seperately
gen.reduced <- reduce(gen.splitted)

#to create bed.format file;
gen.bed <- unlist(gen.reduced)

#sort GRanges based on seqnames
gen.bed <- sortSeqlevels(gen.bed)
gen.bed <- sort(gen.bed)

#i want the "ensembl_gene_id" values to be stored in the "names" column of the BED file
names=elementMetadata(gen.bed)$ensembl_gene_id

 #To write data in a Granges object to a bed File.
 gen.bedfile <- data.frame(seqnames=seqnames(gen.bed),
 starts=start(gen.bed)-1,
 ends=end(gen.bed),
 names=names(gen.bed),
 scores=c(rep(".", length(gen.bed))), 

 #To put a "." instead of score value in each score column!
 strands=strand(gen.bed))
 write.table(df, file="foo.bed", quote=F, sep="\t", row.names=F, col.names=F)

#split bed file based on seqnames(chromosome number)
gen.bed.splitted <- split(gen.bedfile, gen.bedfile$seqnames)

#Remove empty zero length rows in a list
gen.bedfile <- gen.bed.splitted[sapply(gen.bed.splitted, nrow)>0]

Now, I'd like to create a snakefile following snakemake mannual to create a small pipeline. Since, I am new in snakemake, I'd appreciate if someone could guide me how would be the input of the first rule where I think it needs to call the hsapiens gene ensembl database. And what about the shell section! because I heard it is not a good idea to transform R script into Snakemake directly, and the Python code would be a better choice. Thanks a lot.

snakefile snakemake python R • 578 views
ADD COMMENTlink modified 5 months ago by WouterDeCoster39k • written 5 months ago by r.tor40

your script didn't take any input paramater (correct ?). So I don't understand the point of doing a snakemake file of that ? Is this R script one step in your pipeline ? FYI a simple call Rscript your_script.R is ok to call the script

ADD REPLYlink written 5 months ago by Nicolas Rosewick7.7k

Thank you for the reply. I want to make a workflow as Snakefile looking at my R scripts and yes it would be a begining part of workflow. I have just extracted the target regions considering the applied attributes and filters by getBM function in Biomart package. The source of all, is "hsapiens gene ensembl database". I actually don't know what could be supposed as input in the Snakefile's first rule in this case!! Also, I have tried to put my R scripts directly into the "shell" section of the rule using:

rule tss_ranges:
output:
    "transcripts.txt"
run:
robjects.r(("""

    mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
    att <- listAttributes(mart)

    {output} = getBM(attributes=c("chromosome_name", "transcript_start",
    "transcript_end","strand", "ensembl_gene_id","gene_biotype",
    "ensembl_transcript_id"),
    filters ="biotype",
    values  =c("protein_coding"),
    mart    =mart)

    """))

but it does not work. The corresponding error is about "Unexpected keyword Rscript in rule definition" and even "SyntaxError".

ADD REPLYlink modified 5 months ago • written 5 months ago by r.tor40
2

What about something like

rule tss_ranges:
    output:
        "transcripts.txt"
    shell:
        "Rscript your_script.R"
ADD REPLYlink written 5 months ago by Jean-Karim Heriche19k
2
gravatar for WouterDeCoster
5 months ago by
Belgium
WouterDeCoster39k wrote:

The easiest would probably be to just save your code as an Rscript (mycode.R), and in the shell part run Rscript mycode.R, optionally with arguments.

ADD COMMENTlink written 5 months ago by WouterDeCoster39k

Hi, you mean is to put all R scripts in one rule? and what about the input section of the rule?

ADD REPLYlink modified 5 months ago • written 5 months ago by r.tor40
1

I think you serisouly need to read & understand snakemake tutorial

ADD REPLYlink modified 5 months ago • written 5 months ago by H.Hasani760
1

From https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html:

the syntax follows that of S4 classes with attributes that are R lists, e.g. we can access the first input file with snakemake@input[[1]] (note that the first file does not have index 0 here, because R starts counting from 1). Named input and output files can be accessed in the same way, by just providing the name instead of an index, e.g. snakemake@input[["myfile"]].

Input would be the file you would supply to R for loading and output is expected output.

ADD REPLYlink modified 5 months ago • written 5 months ago by cpad011211k

Which input does that code take?

ADD REPLYlink written 5 months ago by WouterDeCoster39k

just downloading the ordered regions from database!

ADD REPLYlink written 5 months ago by r.tor40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1077 users visited in the last hour