Hi there,
you can do two-pass mapping manually by splitting up the process. I read about it in a Google group post by Alex (the STAR developer).
The following is an example for single-end data with an average read length of 150 nt.
First, you do the first STAR alignment:
#Enter the right directory
cd /path_to_project/gap_table/
#For every sample file listed in the text file
while read -r SAMPLE; do
#Get the base file name without the .fastq extension
FILEBASE=$(basename "${SAMPLE%.fastq}")
#Make a new directory for the sample
mkdir "/path_to_project/gap_table/$FILEBASE.STAR"
#Enter the new directory
cd "/path_to_project/gap_table/$FILEBASE.STAR"
#Align with the STAR aligner
/path_to/STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --genomeDir /path_to_STAR_index_directory/star_index_hg38_r149/ --sjdbOverhang 149 --sjdbGTFfile /path_to_genome_GTF_file/Homo_sapiens.GRCh38.91.gtf --outFileNamePrefix "/path_to_project/gap_table/$FILEBASE.STAR/" --readFilesIn "$SAMPLE"
done < Textfile_with_sample_names.txt
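For reference, Textfile_with_sample_names.txt is assumed here to hold one full path to a fastq file per line; a minimal sketch with made-up paths:
/path_to_project/raw_data/sample_A.fastq
/path_to_project/raw_data/sample_B.fastq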
After a STAR alignment, a file called SJ.out.tab is generated. This file holds information about every gap (splice junction) that passed the criteria of the first STAR alignment to be counted as such, with at least one read overlapping it. During 2-pass mapping, you want to map the reads of all samples against all gaps that were found in the first alignments of all samples together. Therefore, in the next step, you gather all SJ.out.tab files and merge them into a single unique one. This could be done, for example, like this:
Generating fused unique SJ.out.tab
#Collect all SJ.out.tab files in a new directory SJ_tabs
cd /path_to_project/gap_table/
mkdir SJ_tabs
#For every per-sample STAR output directory
for i in /path_to_project/gap_table/*.STAR
#Do the following
do
#Enter the sample directory
cd "$i"
#Get the base name of the current sample
b=$(basename "${i%.STAR}")
#Copy the SJ.out.tab file to the new directory under a sample-specific name
cp SJ.out.tab "/path_to_project/gap_table/SJ_tabs/$b.sj_out1.tab"
done
#Keep only unique entries
cd /path_to_project/gap_table/SJ_tabs/
cat *.tab > all_SJ_tabs.tab
awk '!x[$0]++' all_SJ_tabs.tab > Unique_SJ_all.tab
#Transform the merged SJ tab into a chromosome/start/end/strand file with all found junctions
#(column 4 of SJ.out.tab encodes the strand as 0/1/2, column 5 the intron motif; $5>0 keeps only canonical junctions)
cd /path_to_project/gap_table/SJ_tabs/
awk 'BEGIN {OFS="\t"; strChar[0]="."; strChar[1]="+"; strChar[2]="-";} {if($5>0){print $1,$2,$3,strChar[$4]}}' /path_to_project/gap_table/SJ_tabs/Unique_SJ_all.tab > SJin.tab
awk '!x[$0]++' SJin.tab > Unique_SJin.tabs
#Clean up unwanted files
cd /path_to_project/gap_table/SJ_tabs/
rm *.tab
mv Unique_SJin.tabs Unique_SJin.tab
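As a quick sanity check, Unique_SJin.tab should now contain four tab-separated columns (chromosome, intron start, intron end, strand), which is the format --sjdbFileChrStartEnd expects; the coordinates below are made up for illustration:
#Inspect the first lines of the merged junction file
head -n 3 Unique_SJin.tab
#1	14830	14970	-
#1	15039	15796	-
#1	187755	187890	+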
We now generate a new STAR index for the second STAR run, which holds all gaps found in the previous STAR run:
Generate new STAR index
#Go to the right directory
cd /path_to_project/gap_table/
#Make a new directory with the new genome for the second STAR pass
mkdir Genome2
#Create Genome2
/path_to/STAR --runMode genomeGenerate --genomeDir Genome2 --genomeFastaFiles /path_to_genome_fasta_file/GRCh38_r91.fa --runThreadN 8 --sjdbGTFfile /path_to_genome_GTF_file/Homo_sapiens.GRCh38.91.gtf --sjdbOverhang 149 --sjdbFileChrStartEnd /path_to_project/gap_table/SJ_tabs/Unique_SJin.tab --limitSjdbInsertNsj 2000000
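It can also be worth checking that the number of merged junctions stays below the --limitSjdbInsertNsj value used above; a minimal check, assuming the file from the previous step:
#Count the merged junctions; this should stay below --limitSjdbInsertNsj (2000000 here)
wc -l /path_to_project/gap_table/SJ_tabs/Unique_SJin.tab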
Using the newly generated STAR index, we now redo the STAR alignment a second time:
Second STAR alignment:
#Go to the right directory
cd /path_to_project/gap_table/
while read -r SAMPLE; do
#Get the base file name without the .fastq extension
FILEBASE=$(basename "${SAMPLE%.fastq}")
#Make a new directory for the sample
mkdir "/path_to_project/gap_table/$FILEBASE.STAR_second_pass"
#Enter the new directory
cd "/path_to_project/gap_table/$FILEBASE.STAR_second_pass"
#Align with the STAR aligner against the new index
/path_to_STAR/STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --genomeDir /path_to_project/gap_table/Genome2/ --sjdbOverhang 149 --quantMode GeneCounts --sjdbGTFfile /path_to_genome_GTF_file/Homo_sapiens.GRCh38.91.gtf --limitSjdbInsertNsj 2000000 --readFilesIn "$SAMPLE"
done < Textfile_with_sample_names.txt
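Since the second pass uses --quantMode GeneCounts, each sample directory will also contain a ReadsPerGene.out.tab file with per-gene read counts. A minimal sketch for collecting them in one place, using the directory names assumed above:
#Collect the per-sample gene count tables produced by --quantMode GeneCounts
mkdir /path_to_project/gap_table/gene_counts
for i in /path_to_project/gap_table/*.STAR_second_pass; do
b=$(basename "${i%.STAR_second_pass}")
cp "$i/ReadsPerGene.out.tab" "/path_to_project/gap_table/gene_counts/$b.ReadsPerGene.out.tab"
done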
And that's it :)
I assume you are referring to a physical node on the cluster that has those specs (hopefully you have many more nodes like this one). That does not automatically mean that all those resources are available to your job, especially if you are using a job scheduler. Are you using one? If so, which one, and what parameters are you using for the job submission?
Hi, I am using the interactive mode on the cluster and not a batch script (I do not know if that answers your question). To initiate the interactive mode I request the resources this way:
srun --pty -p interactive --mem 250G -c 20 -t 0-12:00:00 /bin/bash
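As a quick check, SLURM exposes the granted resources inside the session; a minimal sketch using standard SLURM environment variables and scontrol:
#Inside the interactive session: show what SLURM actually allocated to this job
echo "CPUs on node: $SLURM_CPUS_ON_NODE"
scontrol show job "$SLURM_JOB_ID" | grep -E 'NumCPUs|TRES'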
Let me ask you this: do you see any other user processes running on that node (if you run top or htop)? Since you are not using the --exclusive[=user] option, my assumption is that there may be some other things running on that node. BTW, that srun seems to indicate that your cluster uses the SLURM job scheduler, but that you are running an interactive job rather than submitting a non-interactive (batch) one.

Exactly, my cluster uses SLURM. When I ran top and htop it seemed that I am the only one using that node. I am going to try to open a new session with --exclusive and see what I get.

Try to lower the number of threads to 16, or even lower. STAR creates a lot of temporary files for each thread, and you may be hitting the limit of ulimit -n.

Does "no success" mean that trying to increase the limits resulted in an error message? Or no error message, but the limits were left unchanged? Or the limits were changed, but STAR crashed anyway? You can only increase the limits to what is defined in /etc/security/limits.conf, so maybe you can't increase them anyway - you may need to contact the system administrators.
Hi! Thanks for your reply. By "no success" I meant that the limits were changed and STAR crashed anyway. I will try to lower the number of threads.
Hi guys, just an update: I tried both suggestions. I ran the SLURM job with the --exclusive flag, and I also tried running with a reduced number of cores. I still get the same Log.out error, and when I tried to use ulimit -v and ulimit -n, STAR still crashed. If someone has any insight into this problem, I will be happy to read it. Anyway, thank you guys for helping me.

Can you post the actual STAR command you are using? What genome are you trying to align against (size)?