Question

SPARTA (RNA-seq) workflow issue

0

4.6 years ago

rchapari • 0

Hello,

I am a bioinformatics novice and am trying to learn how to analyze my RNA-seq reads. I came across a program called SPARTA (Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis), which is designed for novices like myself. The program is Python-based and runs from the command prompt in Windows.

The first process in SPARTA's workflow appears to be quality control using Trimmomatic - and this is where my issues arise. When I run SPARTA, my sequencing read files are processed appropriately and it looks like Trimmomatic runs successfully. Then I am left with the following error message:

C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz ILLUMINACLIP:\Users\Ryan\Desktop\SPARTA_Windows-master\QC_analysis\Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 2574357 Surviving: 2553622 (99.19%) Dropped: 20735 (0.81%)
TrimmomaticSE: Completed successfully
Traceback (most recent call last):
   File "SPARTA.py", line 155, in <module>
      qc.trimmomatic(rawdatapath, subfolderpath, options)
   File "C:\Users\Ryan\Desktop\SPARTA_Windows-master\qc_analysis.py", line 103, in trimmomatic
      extension = file.split(".")[1]
IndexError: list index out of range

I'm confused about what the 'IndexError' is referring to, and I'm unsure how to resolve this issue.

Any feedback would be greatly appreciated. Thank you!

-Ryan

RNA-Seq SPARTA Python • 1.4k views

ADD COMMENT • link updated 4.6 years ago by swbarnes2 14k • written 4.6 years ago by rchapari • 0

0

I have never used it, and I work in a Linux environment. Still, judging from the failing line, I suppose this program only accepts fastq or fq files as input: it splits the file name on the . character and tries to identify the extension. In your case, splitting trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz on . gives [trimmedGSF1906-Delta-hns-3_S66_R1_001] [fastq] [gz]. The last index is therefore gz, which is neither fastq nor fq. So I suggest you first extract your fastq files using gunzip or any other extractor available on Windows, and then run the pipeline on those fastq files. Good luck.

ADD REPLY • link updated 4.6 years ago by Ram 43k • written 4.6 years ago by prince26121991 ▴ 70

1

It's file.split(".")[1], not file.split(".")[-1], so it's looking at the second part, not the last part. Splitting on . is not a great way to find an extension though, so your point might still be applicable.
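To make the difference concrete, here is a minimal sketch (not SPARTA's actual code) that compares index 1 and index -1 on the file name from the question. It also tries a name with no dot at all, which is the only kind of name for which index 1 raises IndexError. The name `no_dot_name` is a made-up example.

```python
# File name taken from the question's Trimmomatic output.
name = "trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"

parts = name.split(".")
second_part = parts[1]   # what file.split(".")[1] looks at
last_part = parts[-1]    # what file.split(".")[-1] would look at

# Hypothetical name without any "." in it: split() returns a
# one-element list, so index 1 does not exist.
no_dot_name = "README"
try:
    no_dot_name.split(".")[1]
    raised_index_error = False
except IndexError:
    raised_index_error = True

print(second_part, last_part, raised_index_error)
```

So the traceback suggests that whatever string reached line 103 contained no dot, rather than that it ended in gz.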

ADD REPLY • link 4.6 years ago by Ram 43k

0

Yes, you are absolutely right, I didn't read it carefully. But I am still wondering why this gives a list index out of range error, because if I run this line in a normal console it works fine.

>>> file = r"C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"
>>> extension = file.split(".")[1]
>>> extension
'fastq'

ADD REPLY • link 4.6 years ago by prince26121991 ▴ 70

0

Probably because the file parameter is different in their script.

ADD REPLY • link 4.6 years ago by Ram 43k

0

I'm looking through their scripts for the file parameter but have not found it yet

ADD REPLY • link 4.6 years ago by rchapari • 0

1

What is the exact command line you're using for sparta.py? I'd also recommend using the --verbose option if you're not using it already.

ADD REPLY • link 4.6 years ago by Ram 43k

0

To execute sparta.py? Following the tutorial, I execute the program using python SPARTA.py

ADD REPLY • link 4.6 years ago by rchapari • 0

0

And no options? What OS are you on?

Can you please give us all the details so your question is reproducible?

ADD REPLY • link 4.6 years ago by Ram 43k

0

Also, I added verbose but it did not add any additional text to the error messages

ADD REPLY • link 4.6 years ago by rchapari • 0

0

Can you show screenshots of everything from the start up to this error?

ADD REPLY • link 4.6 years ago by prince26121991 ▴ 70

0

That fastq name is bog-standard Illumina naming. Gzipping fastq files is bog standard too; software designed to handle fastqs would have to be extremely stupid not to handle gzipped ones.

ADD REPLY • link 4.6 years ago by swbarnes2 14k

0

Thank you for the feedback! I have tried both ways, zipped and unzipped, to no avail :(

ADD REPLY • link 4.6 years ago by rchapari • 0

score 0 · Answer 1 · 2019-08-28

It looks like SPARTA is trying to be clever and assumes that every file it sees in the target directory is a fastq. Is that true of your directory? To me it looks like it successfully trims the first file it sees, but the second file is not a fastq, and that halts the program.
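One way to check this hypothesis is to list the entries in the raw data folder that would break the failing line. The sketch below assumes SPARTA walks the folder with something like os.listdir, which I have not confirmed; the function names and the example names in the usage comment are hypothetical.

```python
import os

def names_that_break_split(names):
    """Return the names for which name.split(".")[1] raises IndexError,
    i.e. names that contain no "." at all."""
    return [name for name in names if len(name.split(".")) < 2]

def check_folder(folder):
    """Apply the check to every entry (files and subfolders) in `folder`."""
    return names_that_break_split(os.listdir(folder))

# Usage: pass the folder that holds the raw reads, for example
#   print(check_folder(r"C:\path\to\your\raw_data"))
# Any name printed here (an extensionless file, or a subfolder whose
# name has no dot) would trigger the IndexError from the traceback.
```

If this prints anything, moving those entries out of the folder and rerunning would show whether they are the cause.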

It also looks like SPARTA adds "trimmed" to the names of the Trimmomatic output files, but you are giving it one of those as input?
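For comparison, a more defensive way to pick input files could look like the sketch below. This is not how SPARTA does it; `fastq_suffix`, `select_inputs` and the list of accepted suffixes are my own assumptions. It matches on the end of the name instead of splitting on ".", and it skips files that already carry the "trimmed" prefix.

```python
# Assumed set of suffixes that count as FASTQ input.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

def fastq_suffix(filename):
    """Return the matching FASTQ suffix, or None if the name is not a FASTQ."""
    lowered = filename.lower()
    for suffix in FASTQ_SUFFIXES:
        if lowered.endswith(suffix):
            return suffix
    return None

def select_inputs(filenames, trimmed_prefix="trimmed"):
    """Keep FASTQ names only, ignoring earlier Trimmomatic output."""
    return [
        name for name in filenames
        if fastq_suffix(name) is not None
        and not name.startswith(trimmed_prefix)
    ]

# Hypothetical folder contents, reusing the file name from the question.
names = [
    "GSF1906-Delta-hns-3_S66_R1_001.fastq.gz",
    "trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz",
    "QC",
    "notes.txt",
]
print(select_inputs(names))
```

With this kind of check, a stray folder or text file is ignored instead of crashing the run.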