Hello, I am a student who recently started studying bioinformatics. Since my understanding is still limited, I would appreciate an explanation even if the question seems basic. I am currently working with RNA-seq data that were processed with different pipelines and workflows, and I am facing batch effects that are not reduced even with ComBat. Therefore, I would like to standardize the analysis using the workflow from the GDC portal. The pipeline commands are documented at https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/.
I have already downloaded the reference sequence file (GRCh.38.d1.vd1.fa.tar.gz) and the annotation file (gencode.v36.annotation.gtf.gz) from https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files.
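The Step 1 command passes GRCh.38.d1.vd1.fa and gencode.v36.annotation.gtf, i.e. the unpacked files, because STAR reads the FASTA and GTF as plain text. A minimal sketch of that preparation, assuming both downloads sit in the working directory; `prepare_reference` is a hypothetical helper name, and the FASTA file name inside the tarball is not assumed (the function lists the archive members so it can be checked).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: unpack the GDC reference downloads into the current
# directory so that STAR can read them as plain-text FASTA and GTF.
prepare_reference() {
  local fasta_tar=$1   # e.g. GRCh.38.d1.vd1.fa.tar.gz
  local gtf_gz=$2      # e.g. gencode.v36.annotation.gtf.gz

  # Show the archive members: the FASTA name inside is not assumed here.
  tar -tzf "$fasta_tar"
  tar -xzf "$fasta_tar"

  # Decompress the GTF next to the original, keeping the .gz file.
  gunzip -c "$gtf_gz" > "${gtf_gz%.gz}"
}

# Usage: prepare_reference GRCh.38.d1.vd1.fa.tar.gz gencode.v36.annotation.gtf.gz
```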
### Step 1: Building the STAR index

```shell
apps/STAR \
  --runMode genomeGenerate \
  --genomeDir STAR_genomeGenerate \
  --genomeFastaFiles GRCh.38.d1.vd1.fa \
  --sjdbOverhang 100 \
  --sjdbGTFfile gencode.v36.annotation.gtf \
  --runThreadN 8
```
This creates the STAR_genomeGenerate/ directory and GenomeDir.
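The index is built with --sjdbOverhang 100, and the same value appears in Step 2. The STAR manual describes the ideal value as the read (mate) length minus 1, so 100 corresponds to 101 bp reads. A hedged sketch for checking what a given FASTQ implies; `suggest_sjdb_overhang` is a hypothetical helper, and it only inspects the first reads of the file.

```shell
#!/usr/bin/env bash
# pipefail is deliberately not set: head stops reading early, so gzip may be
# killed by SIGPIPE on large files, which is harmless here.
set -eu

# Hypothetical helper: print (longest read length - 1) for a gzipped FASTQ,
# the value the STAR manual describes as the ideal --sjdbOverhang.
# Only the first n_reads records are inspected (default 10000).
suggest_sjdb_overhang() {
  local fastq_gz=$1
  local n_reads=${2:-10000}
  # Each FASTQ record spans 4 lines; line 2 of every record is the sequence.
  gzip -dc "$fastq_gz" \
    | head -n "$(( 4 * n_reads ))" \
    | awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
           END { print max - 1 }'
}

# Usage: suggest_sjdb_overhang a_1.fastq.gz
```

The manual also notes that in most cases the default of 100 works about as well as the ideal value, so a small mismatch is not necessarily a reason to depart from the GDC setting.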
### Step 2: Alignment 1st Pass

```shell
apps/STAR \
  --genomeDir STAR_genomeGenerate \
  --readFilesIn a_1.fastq.gz b_1.fastq.gz c_1.fastq.gz a_2.fastq.gz b_2.fastq.gz c_2.fastq.gz \
  --runThreadN 8 \
  --outFilterMultimapScoreRange 1 \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 10 \
  --alignIntronMax 500000 \
  --alignMatesGapMax 1000000 \
  --sjdbScore 2 \
  --alignSJDBoverhangMin 1 \
  --genomeLoad NoSharedMemory \
  --readFilesCommand zcat \
  --outFilterMatchNminOverLread 0.33 \
  --outFilterScoreMinOverLread 0.33 \
  --sjdbOverhang 100 \
  --outSAMstrandField intronMotif \
  --outSAMtype None \
  --outSAMmode None
```
However, when I passed multiple fastq.gz files to --readFilesIn as in the command above, STAR failed with `Segmentation fault (core dumped)`, so I had to run the samples one at a time. Each run produces SJ.out.tab, Log.out, Log.progress.out, and Log.final.out, and SJ.out.tab is the input for the next step.
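The segmentation fault may come from the file list itself. STAR's --readFilesIn expects the read-1 files and the read-2 files as two arguments separated by a space, with several files for the same mate joined by commas (no spaces) and listed in the same order; the command above instead passes six space-separated names. A minimal sketch of building the two lists, assuming the `<prefix>_1.fastq.gz` / `<prefix>_2.fastq.gz` naming used in the post; `join_by_comma`, `build_readfiles_lists`, `R1_LIST` and `R2_LIST` are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Join arguments with commas: join_by_comma x y z -> x,y,z
join_by_comma() {
  local IFS=,
  echo "$*"
}

# Hypothetical helper: build the read-1 and read-2 lists for --readFilesIn
# from file prefixes (<prefix>_1.fastq.gz / <prefix>_2.fastq.gz).
# Results are stored in the global variables R1_LIST and R2_LIST.
build_readfiles_lists() {
  local r1=() r2=() p
  for p in "$@"; do
    r1+=("${p}_1.fastq.gz")
    r2+=("${p}_2.fastq.gz")
  done
  R1_LIST=$(join_by_comma "${r1[@]}")
  R2_LIST=$(join_by_comma "${r2[@]}")
}
```

For example, `build_readfiles_lists a b c` sets R1_LIST to `a_1.fastq.gz,b_1.fastq.gz,c_1.fastq.gz` and R2_LIST to `a_2.fastq.gz,b_2.fastq.gz,c_2.fastq.gz`, to be passed as `--readFilesIn "$R1_LIST" "$R2_LIST"`. Because STAR then aligns a, b and c as one sample, this form only suits files from the same library (for example several lanes); independent samples each need their own run.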
However, each time I repeat Step 2, a new SJ.out.tab file is written and the previous one is overwritten. Step 3 then builds an intermediate index, but I am not sure how to incorporate the SJ.out.tab files into it.
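One way to keep the junction files apart is STAR's --outFileNamePrefix, which places each run's outputs (SJ.out.tab, Log.out, and so on) under its own prefix. The sketch below assumes each sample is processed on its own: the 1st pass writes pass1/<sample>/SJ.out.tab, and the intermediate index for that sample is built from it with --sjdbFileChrStartEnd. The Step 3 options are modeled on Step 1 and are an assumption to check against the GDC page; `STAR_BIN`, `THREADS`, `PASS1_OPTS`, `run_pass1`, `run_intermediate_index` and the pass1/ and index_pass2/ directory names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# STAR executable; set STAR_BIN=echo to print the commands instead (dry run).
STAR_BIN=${STAR_BIN:-apps/STAR}
THREADS=${THREADS:-8}

# Step 2 (1st pass) options copied from the post, minus --genomeDir/--readFilesIn.
PASS1_OPTS=(
  --runThreadN "$THREADS"
  --outFilterMultimapScoreRange 1
  --outFilterMultimapNmax 20
  --outFilterMismatchNmax 10
  --alignIntronMax 500000
  --alignMatesGapMax 1000000
  --sjdbScore 2
  --alignSJDBoverhangMin 1
  --genomeLoad NoSharedMemory
  --readFilesCommand zcat
  --outFilterMatchNminOverLread 0.33
  --outFilterScoreMinOverLread 0.33
  --sjdbOverhang 100
  --outSAMstrandField intronMotif
  --outSAMtype None
  --outSAMmode None
)

# 1st pass for one sample; outputs go to pass1/<sample>/ so SJ.out.tab
# is not overwritten. Assumes <sample>_1.fastq.gz and <sample>_2.fastq.gz.
run_pass1() {
  local s=$1
  mkdir -p "pass1/${s}"
  "$STAR_BIN" \
    --genomeDir STAR_genomeGenerate \
    --readFilesIn "${s}_1.fastq.gz" "${s}_2.fastq.gz" \
    "${PASS1_OPTS[@]}" \
    --outFileNamePrefix "pass1/${s}/"
}

# Step 3 for one sample: intermediate index built from that sample's junctions.
# Options modeled on Step 1 plus --sjdbFileChrStartEnd (assumption; verify on GDC page).
run_intermediate_index() {
  local s=$1
  mkdir -p "index_pass2/${s}"
  "$STAR_BIN" \
    --runMode genomeGenerate \
    --genomeDir "index_pass2/${s}" \
    --genomeFastaFiles GRCh.38.d1.vd1.fa \
    --sjdbOverhang 100 \
    --sjdbFileChrStartEnd "pass1/${s}/SJ.out.tab" \
    --runThreadN "$THREADS" \
    --outFileNamePrefix "index_pass2/${s}/"
}

# Usage: for s in a b c; do run_pass1 "$s"; run_intermediate_index "$s"; done
```

This gives one SJ.out.tab and one intermediate index per sample. If a single shared index is preferred instead, --sjdbFileChrStartEnd also accepts several files, which STAR concatenates, so the SJ.out.tab files of all samples could be passed at once.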
I would greatly appreciate an explanation of this issue.