Question: SPARTA (RNA-seq) workflow issue
rchapari0 wrote:

Hello,

I am a bioinformatics novice and am trying to learn how to analyze my RNA-seq reads. I came across this program called SPARTA (Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis) which is designed for novices like myself. The program is Python based and functions via command prompt in Windows.

The first process in SPARTA's workflow appears to be quality control using Trimmomatic - and this is where my issues arise. When I run SPARTA, my sequencing read files are processed appropriately and it looks like Trimmomatic runs successfully. Then I am left with the following error message:

C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz ILLUMINACLIP:\Users\Ryan\Desktop\SPARTA_Windows-master\QC_analysis\Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 2574357 Surviving: 2553622 (99.19%) Dropped: 20735 (0.81%)
TrimmomaticSE: Completed successfully
Traceback (most recent call last):
   File "SPARTA.py", line 155, in <module>
      qc.trimmomatic(rawdatapath, subfolderpath, options)
   File "C:\Users\Ryan\Desktop\SPARTA_Windows-master\qc_analysis.py", line 103, in trimmomatic
      extension = file.split(".")[1]
IndexError: list index out of range

I'm confused about what the 'IndexError' refers to, and I'm unsure how to resolve this issue.

Any feedback would be greatly appreciated. Thank you!

-Ryan

python rna-seq sparta • 318 views
modified 10 months ago by swbarnes27.8k • written 10 months ago by rchapari0

I have never used it, and I am on a Linux environment. Still, judging from the line that raises the error, I suppose this program accepts only fastq or fq files as input: it splits the filename on the . character and tries to identify the extension. In your case, splitting trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz on the . character gives [trimmedGSF1906-Delta-hns-3_S66_R1_001] [fastq] [gz]. The last element is therefore gz, which is neither fastq nor fq. So I suggest that you first extract your fastq files using gunzip or any other extractor available on Windows, and then run the pipeline on those fastq files. Good luck.

modified 10 months ago by RamRS27k • written 10 months ago by prince2612199170
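
For anyone who wants to try the decompression step without installing gunzip on Windows, here is a minimal sketch using only the Python standard library. The helper name `gunzip_file` is made up for illustration and is not part of SPARTA; it assumes the input path ends in `.gz` and writes the decompressed file next to the original.

```python
import gzip
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path next to the original and return the new path.

    Hypothetical helper for illustration; not part of SPARTA.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[:-3]  # drop the trailing ".gz"
    # Stream the data so large FASTQ files are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The original `.gz` file is left in place, so it may need to be moved out of the input folder before rerunning the pipeline.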

It's file.split(".")[1], not file.split(".")[-1], so it's looking at the second part, not the last part. Splitting by . is not a great way to find an extension though, so your point might still be applicable.

written 10 months ago by RamRS27k
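
To illustrate why splitting on . is fragile, here is a small sketch of a more defensive extension check built on `os.path.splitext`. This is not SPARTA's code; the function name `fastq_extension` and the rule of stripping a trailing `.gz` first are assumptions made for the example.

```python
import os


def fastq_extension(filename):
    """Return 'fastq' or 'fq' if filename looks like a FASTQ file, else None.

    Hypothetical helper: a trailing '.gz' is stripped first, then only the
    last extension is inspected, so extra dots in the name do not matter.
    """
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    ext = os.path.splitext(name)[1].lstrip(".")
    return ext if ext in ("fastq", "fq") else None


print(fastq_extension("trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"))  # fastq
print(fastq_extension("sample.v2.fq"))  # fq
print(fastq_extension("README"))  # None
```

With `split(".")[1]`, the second name would yield `v2` and the third would raise an IndexError, because there is no second element to index.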

Yes, you are absolutely right; I didn't read it carefully. But I am still wondering why this gives a list index out of range error, because the same line runs normally in a plain Python console.

>>> file = r"C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"
>>> extension = file.split(".")[1]
>>> extension
'fastq'
written 10 months ago by prince2612199170

Probably because the file parameter is different in their script.

written 10 months ago by RamRS27k
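
A quick way to see how a different `file` value produces this exact error: any name without a . splits into a one-element list, so index 1 does not exist. The sketch below mimics the failing line from the traceback; the candidate names are invented for illustration and are not taken from the poster's folder.

```python
def second_dot_field(name):
    """Mimic the failing line from the traceback: name.split('.')[1]."""
    return name.split(".")[1]


# Hypothetical directory entries: one FASTQ file, one extensionless file,
# and one subfolder name.
for candidate in ["sample_R1_001.fastq.gz", "sample_R1_001", "QC"]:
    try:
        print(candidate, "->", second_dot_field(candidate))
    except IndexError as err:
        print(candidate, "-> IndexError:", err)
```

Only the first name survives; the other two raise `IndexError: list index out of range`, which matches the message in the question.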

I'm looking through their scripts for the file parameter but have not found it yet.

written 10 months ago by rchapari0

What is the exact command line you're using for sparta.py? I'd also recommend using the --verbose option if you're not using it already.

modified 10 months ago • written 10 months ago by RamRS27k

To execute SPARTA.py? Following the tutorial, I run the program with python SPARTA.py.

written 10 months ago by rchapari0

And no options? What OS are you on?

Can you please give us all the details so your question is reproducible?

written 10 months ago by RamRS27k

Also, I added the --verbose option, but it did not add any text to the error messages.

written 10 months ago by rchapari0

Can you share screenshots covering everything from the start of the run up to this error?

written 10 months ago by prince2612199170

That fastq name is bog standard Illumina naming. Gzipping fastq files is bog standard too; software designed to handle fastqs would have to be extremely stupid not to handle gzipped ones.

written 10 months ago by swbarnes27.8k

Thank you for the feedback! I have tried both ways, zipped and unzipped, to no avail :(

written 10 months ago by rchapari0
swbarnes27.8k wrote:

It looks like SPARTA is trying to be clever and assumes that every file it sees in the target directory is a fastq. Is that true for your directory? Because to me it looks like it successfully trims the first file it sees, but the second file is not a fastq, and that halts the program.

It also looks like SPARTA adds "trimmed" to the names of the Trimmomatic output files, but are you giving it one of those files as input?

modified 10 months ago • written 10 months ago by swbarnes27.8k
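
If the input folder really does contain non-FASTQ files or leftover trimmed output, one option is to check what a stricter filter would keep before running the pipeline. The sketch below is an assumption about how such a filter could look, not SPARTA's actual logic; the function name, the suffix list, and the "trimmed" prefix rule are all illustrative.

```python
# Suffixes treated as FASTQ in this sketch (an assumption, not SPARTA's list).
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def select_raw_fastqs(filenames, trimmed_prefix="trimmed"):
    """Keep FASTQ-looking names and drop ones that look like trimmed output."""
    selected = []
    for name in sorted(filenames):
        lower = name.lower()
        if not lower.endswith(FASTQ_SUFFIXES):
            continue  # e.g. a subfolder, log, or spreadsheet in the same folder
        if lower.startswith(trimmed_prefix):
            continue  # looks like output from an earlier trimming run
        selected.append(name)
    return selected
```

Calling it as `select_raw_fastqs(os.listdir(raw_dir))`, where `raw_dir` is a placeholder for the raw data folder, lists the files that would be safe to feed in; anything it leaves out is a candidate for causing the crash.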