Need help changing RNA-seq code from histat aligner into Star aligner please.
0
0
Entering edit mode
2.7 years ago

I need help converting histat aligner into star aligner for this script. Please let me know what I could do. Link to the full code is here: https://github.com/JustinGibbons/USF_Omics_Hub/blob/master/RNA-Seq_Pipeline/rna_seq_pipeline.py

def write_out_hisat_array_slurm_file(arguments_file,outfile="hisat_array_commands.sh"):
"""Uses data from input file to write out the hisat commands as an array job"""

#hisat_array_name="hisat_array"
#sam_array="sam_array" I don't think I need this for right now
samtools_sort_array_name="samtools_sort_array"
#bam_array="bam_array" I don't think I need this for anything right now
workdir=arguments_dic["main_output_dir"]
print workdir
#Create the directory for the alignment files
alignment_dir="Alignments"
os.mkdir(workdir+"/"+alignment_dir)
#Dictionary pointing from sample names to reads
if "Data_Paired_End" not in arguments_dic:
data_paired_end=True
else:
data_paired_end=arguments_dic["Data_Paired_End"]
if data_paired_end=="TRUE" or data_paired_end=="true" or data_paired_end=="True" or data_paired_end=="T":
data_paired_end=True
else:
data_paired_end=False

#List of slurm options to be written on their own lines
time=":".join([arguments_dic["hours"],arguments_dic["minutes"],"00"]),
mem=arguments_dic["mem"],
mail_user=arguments_dic["email_to_send_output_to"],submit_to_rra=arguments_dic["submit_to_rra"])

time=":".join([arguments_dic["hours"],arguments_dic["minutes"],"00"]),
mem=arguments_dic["mem"],
mail_user=arguments_dic["email_to_send_output_to"],submit_to_rra=arguments_dic["submit_to_rra"])

hisat_command_input,sam_files=construct_hisat_array_command(index_files_path_and_basename=arguments_dic["genome_index"],
input_dict=arguments_dic,alignment_dir=alignment_dir,data_paired_end=data_paired_end)

if data_paired_end==True:
elif data_paired_end==False:
#print hisat_command_input
arguments_dic["alignment_files"]=bam_files
gff_file=arguments_dic["gff"]
if "gff_annotation" in arguments_dic:
id_value=arguments_dic["gff_annotation"]
else:
id_value="gene_id"
featurecounts_command_list=construct_feature_counts_command(gff_file,bam_files,id_value=id_value)
featurecounts_command=" ".join(featurecounts_command_list)
normalization_groups=arguments_dic["normalization_groups"]
normalization_method=arguments_dic["normalization_method"]
cuffnorm_command_list=construct_cuffnorm_command(gff_file,bam_files,alignment_dir+"/",groupings=normalization_groups,normalization_method=normalization_method)
cuffnorm_command=" ".join(cuffnorm_command_list)
narray=len(sam_files)
##May want to include a warning here if only one job is allowed to run at a time
narray_arg="#SBATCH --array=0-"+str(narray-1)+"%"+str(narray)
slurm_commands.append(narray_arg)
slurm_commands.append("#SBATCH --output=rna-seq_pipeline.out")
slurm_commands.append("module purge")

slurm_commands2.append("#SBATCH --output=rna-seq_pipeline2.out")

RNA-Seq • 988 views
2
Entering edit mode

This is a huge mess of a python script that writes a script to be submitted to slurm based on input parameters. You're better off using the sample script file in the github repo and replacing hisat2 with STAR in that. Changing this script is going to take a TON of time as it is insanely specific to the hisat2 pipeline and the environment it's running. It's spaghetti code at its best (worst?).

2
Entering edit mode

most of this script has nothing to do with HISAT2 or STAR and is just the developer reinventing what pipeline frameworks do

2
Entering edit mode

Exactly. This is what obliterates any sense of portability in the scripts - they're tied too closely to the tools AND the environment. There's no plug and play here.

0
Entering edit mode

Thank you very much for the reply. Could you give me some insight into how to go about replacing the hisat2 with STAR in the sample script file? I'd greatly appreciate it. Thank you again.

0
Entering edit mode

Sorry, that is not a level of help I'm willing to offer. Please read the hisat2 and STAR manual and see what changes need to be made.

1
Entering edit mode

Just use nf-core RNAseq - it can do both

0
Entering edit mode

The only issue is for the lab we need to use the pipeline provided above.

1
Entering edit mode

Dare I ask why? If you are replacing the core program, it is not the same pipeline anymore.

0
Entering edit mode

I honestly don't know it's one of my colleagues issue and I'm just trying to help her figure it out.

1
Entering edit mode

I would try to figure out why she is doing this in the first place. If change one step, downstream steps may break. You could be troubleshooting for a while and it may be irrelevant in the end.

1
Entering edit mode

If that's the case, your colleague should be the one making the post here. Any effort put into making this script work with STAR is wasted, as you'll spend 10x the time understanding this script as you will into switching to STAR.

1
Entering edit mode

then you'll need to look at the construct_hisat_array_command and write a corresponding construct_star_array_command and obviously change the module command