STARsolo
13 months ago
Nodilan ▴ 10

Good morning everybody, I need your expertise. I'm trying to write a Python script, using object-oriented programming, that runs STARsolo to generate a count matrix of differentially expressed genes (exons only, which is why I used --quantMode GeneCounts), but I get an error that I cannot solve (the bash command line works).

this is the bash command line:

/opt/conda/envs/scRNA_seq_env/bin/STAR --outSAMattributes All \
     --outSAMtype BAM Unsorted \
     --quantMode GeneCounts \
     --readFilesCommand gunzip -c \
     --runThreadN 7 \
     --outReadsUnmapped Fastx \
     --outMultimapperOrder Random \
     --genomeDir /home/output/genome_index \
     --readFilesIn home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz \
     --outFileNamePrefix home/output/testou_testou_sh \
     --soloType CB_UMI_Simple \
     --soloCBwhitelist home/cellranger-7.1.0/lib/python/cellranger/barcodes/3M-february-2018.txt \
     --soloUMIlen 12 \
     --soloCBlen 16 \
     --soloUMIstart 17 \
     --soloCBstart 1 \
     --soloBarcodeReadLength 28 \
     --soloUMIfiltering MultiGeneUMI_CR \
     --soloUMIdedup 1MM_CR \
     --clipAdapterType CellRanger4 \
     --outFilterScoreMin 30 \
     --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
     --soloCellFilter EmptyDrops_CR

OOP version where I get the error:

import subprocess

class StarCommand:
    def __init__(self, genome_dir, output_prefix, read_files, threads=7):
        self.genome_dir = genome_dir
        self.output_prefix = output_prefix
        self.read_files = read_files
        self.threads = threads
        self.command = ["/opt/conda/envs/scRNA_seq_env/bin/STAR",
                        "--outSAMattributes", "All",
                        "--outSAMtype", "BAM", "Unsorted",
                        "--quantMode", "GeneCounts",
                        "--readFilesCommand", "gunzip -c",
                        "--runThreadN", str(self.threads),
                        "--outReadsUnmapped", "Fastx",
                        "--outMultimapperOrder", "Random",
                        "--genomeDir", self.genome_dir,
                        "--readFilesIn", self.read_files,
                        "--outFileNamePrefix", self.output_prefix,
                        "--soloType", "CB_UMI_Simple",
                        "--soloCBwhitelist", "home/cellranger-7.1.0/lib/python/cellranger/barcodes/3M-february-2018.txt",
                        "--soloUMIlen", "12",
                        "--soloCBlen", "16",
                        "--soloUMIstart", "17",
                        "--soloCBstart", "1",
                        "--soloBarcodeReadLength", "28",
                        "--soloUMIfiltering", "MultiGeneUMI_CR",
                        "--soloUMIdedup", "1MM_CR",
                        "--clipAdapterType", "CellRanger4",
                        "--outFilterScoreMin", "30",
                        "--soloCBmatchWLtype", "1MM_multi_Nbase_pseudocounts",
                        "--soloCellFilter", "EmptyDrops_CR"]

    def run_command(self):
        try:
            subprocess.run(self.command, check=True)
            print("STAR command finished successfully!")
        except subprocess.CalledProcessError as e:
            print(f"STAR command failed with exit code {e.returncode}:")
            print(e.output)

genome_dir = "/home/output/genome_index"
output_prefix = "/home/output/testou_testou_sh"
read_files = "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"

star_command = StarCommand(genome_dir, output_prefix, read_files)
star_command.run_command()

the error:

EXITING: because of fatal INPUT file error: could not open read file: /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_S1_L001_R1_001.fastq.gz
SOLUTION: check that this file exists and has read permision.

Mar 16 09:22:35 ...... FATAL ERROR, exiting
STAR command failed with exit code 102:
None

thank you in advance
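
The `None` at the end of the log above is what `e.output` holds when `subprocess.run` is called without capturing output. A minimal sketch of capturing it instead follows; the helper name `run_and_report` is hypothetical, and a small child Python process stands in for STAR so the snippet runs anywhere.

```python
import subprocess
import sys

def run_and_report(command):
    """Run a command and return (exit code, stdout, stderr) as text."""
    try:
        result = subprocess.run(command, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        # Because capture_output=True was set, e.stdout and e.stderr hold the
        # captured text instead of None.
        print(f"Command failed with exit code {e.returncode}:")
        print(e.stderr)
        return e.returncode, e.stdout, e.stderr
    return result.returncode, result.stdout, result.stderr

# Stand-in for STAR (assumption for illustration): writes to stderr, exits with 102.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('could not open read file\\n'); sys.exit(102)"]
code, out, err = run_and_report(failing)
```

Passing the real STAR argument list to the same helper would print whatever STAR wrote to its standard error stream rather than `None`.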

STARsolo scRNAseq • 1.4k views

My files are readable, since the bash command line works perfectly :/


/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_S1_L001_R1_001.fastq.gz

Are those files in two separate directories? That is what the error message shows. In the command-line version they appear to be in the same directory.

In the command-line version you also seem to be missing a leading / before home, but that may be a copy-paste error since you say that line works.

--readFilesIn home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz  etc.

and

--soloCBwhitelist home/cellranger-7.1.0/lib/python/cellranger etc

No, they are in the same directory. I just made a mistake copying the script here; in my terminal I have it right, and I keep getting the same error :/


This is exactly what my code looks like, and the error it generates (the bash script works):

this is the bash command line:

/opt/conda/envs/scRNA_seq_env/bin/STAR --outSAMattributes All \
     --outSAMtype BAM Unsorted \
     --quantMode GeneCounts \
     --readFilesCommand gunzip -c \
     --runThreadN 7 \
     --outReadsUnmapped Fastx \
     --outMultimapperOrder Random \
     --genomeDir /home/output/genome_index \
     --readFilesIn /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz \
     --outFileNamePrefix home/output/testou_testou_sh \
     --soloType CB_UMI_Simple \
     --soloCBwhitelist /home/cellranger-7.1.0/lib/python/cellranger/barcodes/3M-february-2018.txt \
     --soloUMIlen 12 \
     --soloCBlen 16 \
     --soloUMIstart 17 \
     --soloCBstart 1 \
     --soloBarcodeReadLength 28 \
     --soloUMIfiltering MultiGeneUMI_CR \
     --soloUMIdedup 1MM_CR \
     --clipAdapterType CellRanger4 \
     --outFilterScoreMin 30 \
     --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
     --soloCellFilter EmptyDrops_CR

OOP version where I get the error:

import subprocess

class StarCommand:
    def __init__(self, genome_dir, output_prefix, read_files, threads=7):
        self.genome_dir = genome_dir
        self.output_prefix = output_prefix
        self.read_files = read_files
        self.threads = threads
        self.command = ["/opt/conda/envs/scRNA_seq_env/bin/STAR",
                        "--outSAMattributes", "All",
                        "--outSAMtype", "BAM", "Unsorted",
                        "--quantMode", "GeneCounts",
                        "--readFilesCommand", "gunzip -c",
                        "--runThreadN", str(self.threads),
                        "--outReadsUnmapped", "Fastx",
                        "--outMultimapperOrder", "Random",
                        "--genomeDir", self.genome_dir,
                        "--readFilesIn", self.read_files,
                        "--outFileNamePrefix", self.output_prefix,
                        "--soloType", "CB_UMI_Simple",
                        "--soloCBwhitelist", "/home/cellranger-7.1.0/lib/python/cellranger/barcodes/3M-february-2018.txt",
                        "--soloUMIlen", "12",
                        "--soloCBlen", "16",
                        "--soloUMIstart", "17",
                        "--soloCBstart", "1",
                        "--soloBarcodeReadLength", "28",
                        "--soloUMIfiltering", "MultiGeneUMI_CR",
                        "--soloUMIdedup", "1MM_CR",
                        "--clipAdapterType", "CellRanger4",
                        "--outFilterScoreMin", "30",
                        "--soloCBmatchWLtype", "1MM_multi_Nbase_pseudocounts",
                        "--soloCellFilter", "EmptyDrops_CR"]

    def run_command(self):
        try:
            subprocess.run(self.command, check=True)
            print("STAR command finished successfully!")
        except subprocess.CalledProcessError as e:
            print(f"STAR command failed with exit code {e.returncode}:")
            print(e.output)

genome_dir = "/home/output/genome_index"
output_prefix = "/home/output/testou_testou_sh"
read_files = "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"

star_command = StarCommand(genome_dir, output_prefix, read_files)
star_command.run_command()

the error:

EXITING: because of fatal INPUT file error: could not open read file: /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz
SOLUTION: check that this file exists and has read permision.

Mar 16 09:22:35 ...... FATAL ERROR, exiting
STAR command failed with exit code 102:
None

thank you in advance

13 months ago
dsull ★ 5.8k

Can you try putting quotes around it?

read_files = "\"/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz /home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz\""

If that doesn't work, can you try shell=True in your subprocess.run()?


thank you for your help


Does this mean the problem was solved? If so I will move this comment to an answer so you can accept it.


What I did was create two lists, one for the R1 reads and one for the R2 reads, and join each with commas, so that each read group is passed as its own argument: "--readFilesIn", ",".join(self.r2_files), ",".join(self.r1_files),
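
Why the two-list version behaves differently can be checked without STAR. When `subprocess.run` receives a list (and `shell=True` is not set), each list element reaches the program as exactly one argument, with no word splitting on spaces. The sketch below uses a child Python process that simply prints the arguments it received; the short file names are placeholders, not the real paths.

```python
import subprocess
import sys

# Child process that prints the arguments it received.
show_args = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

r2 = ["L001_R2.fastq.gz", "L002_R2.fastq.gz"]  # placeholder names
r1 = ["L001_R1.fastq.gz", "L002_R1.fastq.gz"]  # placeholder names

# One list element containing a space: the child sees a single argument.
joined = ",".join(r2) + " " + ",".join(r1)
one = subprocess.run(show_args + ["--readFilesIn", joined],
                     capture_output=True, text=True, check=True).stdout.strip()

# Two list elements: the child sees two arguments, as a shell would produce
# from the space-separated form on the command line.
two = subprocess.run(show_args + ["--readFilesIn", ",".join(r2), ",".join(r1)],
                     capture_output=True, text=True, check=True).stdout.strip()

print(one)
print(two)
```

The first call reports two arguments (the flag plus one long string containing a space), the second reports three, which matches the file name with an embedded space shown in the STAR error message.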

13 months ago
Nodilan ▴ 10

yes sure:

import re
import subprocess

class STAR:
    def __init__(self, r1_files, r2_files, output_prefix, genome_index, whitelist):
        # Validate the file names up front so a wrong file fails before STAR starts.
        # The dots are escaped and the pattern is anchored at the end of the name.
        self.r1_files = self._filter_files(r1_files, r"_S\d+_L\d+_R1_\d+\.fastq\.gz$")
        self.r2_files = self._filter_files(r2_files, r"_S\d+_L\d+_R2_\d+\.fastq\.gz$")
        self.output_prefix = output_prefix
        self.genome_index = genome_index
        self.whitelist = whitelist

    def _filter_files(self, file_list, pattern):
        # Return the files unchanged, raising if any name does not match the pattern.
        filtered_files = []
        for file in file_list:
            if re.search(pattern, file):
                filtered_files.append(file)
            else:
                raise ValueError(f"Invalid filename format: {file}")
        return filtered_files

    def run(self):
        command = [
            "/opt/conda/envs/scRNA_seq_env/bin/STAR",
            "--outSAMattributes", "All",
            "--outSAMtype", "BAM", "Unsorted",
            "--quantMode", "GeneCounts",
            "--readFilesCommand", "gunzip -c",
            "--runThreadN", "8",
            "--outReadsUnmapped", "Fastx",
            "--outMultimapperOrder", "Random",
            "--genomeDir", self.genome_index,
            "--readFilesIn", ",".join(self.r2_files), ",".join(self.r1_files),
            "--outFileNamePrefix", self.output_prefix,
            "--soloType", "CB_UMI_Simple",
            "--soloCBwhitelist", self.whitelist,
            "--soloUMIlen", "12",
            "--soloCBlen", "16",
            "--soloUMIstart", "17",
            "--soloCBstart", "1",
            "--soloBarcodeReadLength", "28",
            "--soloUMIfiltering", "MultiGeneUMI_CR",
            "--soloUMIdedup", "1MM_CR",
            "--clipAdapterType", "CellRanger4",
            "--outFilterScoreMin", "30",
            "--soloCBmatchWLtype", "1MM_multi_Nbase_pseudocounts",
            "--soloCellFilter", "EmptyDrops_CR"
        ]

        # check=True raises CalledProcessError if STAR exits with a non-zero status.
        subprocess.run(command, check=True)

r1_files = [
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz",
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"
]

r2_files = [
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz",
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz"
]

output_prefix = "/home/output/STARsolo/"
genome_index = "/home/output/genome_index"
whitelist = "/home/cellranger-7.1.0/lib/python/cellranger/barcodes/3M-february-2018.txt"

star = STAR(r1_files, r2_files, output_prefix, genome_index, whitelist)
star.run()
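
With more lanes, typing the two lists by hand gets error-prone. The sketch below is one possible way to build them from a flat list of paths, assuming the lanes should appear in the same order in both comma-separated lists. The helper `pair_reads` and the `PATTERN` regex are hypothetical, not part of STAR or of the code above.

```python
import re

# Hypothetical pattern for Illumina-style names: <stem>_S<n>_L<lane>_R<1|2>_<chunk>.fastq.gz
PATTERN = re.compile(r"^(?P<stem>.+_S\d+_L\d+)_R(?P<read>[12])_(?P<chunk>\d+)\.fastq\.gz$")

def pair_reads(paths):
    """Return (r1_files, r2_files), ordered so index i in both lists is the same lane."""
    r1, r2 = {}, {}
    for path in paths:
        m = PATTERN.match(path)
        if m is None:
            raise ValueError(f"Invalid filename format: {path}")
        key = (m.group("stem"), m.group("chunk"))
        target = r1 if m.group("read") == "1" else r2
        target[key] = path
    # Every lane must have both mates, otherwise the two lists would not line up.
    if r1.keys() != r2.keys():
        raise ValueError("Every R1 file needs a matching R2 file")
    keys = sorted(r1)
    return [r1[k] for k in keys], [r2[k] for k in keys]

paths = [
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz",
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz",
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz",
    "/home/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz",
]
r1_files, r2_files = pair_reads(paths)
```

The resulting `r1_files` and `r2_files` could then be passed to the `STAR` class in place of the hand-written lists.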