Question: SolexaQA++ lengthsort -c output
gtho123 (New Zealand) wrote, 2.0 years ago:

I am looking for advice on the output of SolexaQA++ lengthsort -c, which I am using to preprocess my RNA-Seq data.

Having already used SolexaQA++ dynamictrim to trim reads by quality, I then used SolexaQA++ lengthsort to remove any short reads left after trimming. Because the data are paired-end, I used the -c flag.

Here are the relevant lines from my bash script:

path1="/PATH/TO/trim/Sample1_R1"
path2="/PATH/TO/trim/Sample1_R2"
SolexaQA++ lengthsort -c -l 36 -d "/PATH/TO/sort" $path1$".fastq.trimmed.gz", $path2$".fastq.trimmed.gz"
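One quick way to check exactly what the shell hands to SolexaQA++ is to print each argument on its own line. The sketch below reuses the variables from the script with a small helper, show_args (a hypothetical name, not part of SolexaQA++), that wraps every argument in <...> so stray characters stand out.

```bash
#!/usr/bin/env bash
# show_args ARG...: print each argument on its own line, wrapped in <...>,
# so stray characters such as a trailing comma become visible.
show_args() {
  printf '<%s>\n' "$@"
}

path1="/PATH/TO/trim/Sample1_R1"
path2="/PATH/TO/trim/Sample1_R2"
# Same arguments as the lengthsort line above, without the program name.
show_args lengthsort -c -l 36 -d "/PATH/TO/sort" $path1$".fastq.trimmed.gz", $path2$".fastq.trimmed.gz"
```

Run as-is, the seventh line printed is </PATH/TO/trim/Sample1_R1.fastq.trimmed.gz,>, i.e. the comma after the first file name is passed to the program as part of that file name. Whether that alone explains the missing R1 output isn't established in this thread, but writing the inputs as "${path1}.fastq.trimmed.gz" "${path2}.fastq.trimmed.gz" (quoted, separated only by a space) removes it as a possible cause.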

I expected six output files: paired-end, singleton and discard files for each input (R1 and R2). However, only two were produced: Sample1_R2.trimmed.gz.clean and Sample1_R2.trimmed.gz.paired. What happened to R1?

Has something gone wrong? If so, how? If not, what do these files contain?
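The six-file split described above (paired, singleton and discard output for R1 and R2) can be sketched as a per-pair decision. This is a minimal illustration, not SolexaQA++'s actual code: it assumes uncompressed FASTQ with 4 lines per record, mates in the same order in both files, and that a read is kept when its length is at least the -l cutoff. classify_pairs is a hypothetical helper that only labels each pair rather than writing files.

```bash
#!/usr/bin/env bash
# classify_pairs R1.fastq R2.fastq MINLEN
# For each read pair, print "paired", "single_R1", "single_R2" or "discard".
# Assumes uncompressed FASTQ, 4 lines per record, mates in the same order,
# and that a read survives when length >= MINLEN (assumed cutoff rule).
classify_pairs() {
  awk -v min="$3" -v r2="$2" '
    NR % 4 == 2 {                       # sequence line of an R1 record
      len1 = length($0)
      # consume the matching 4-line record from R2, keeping its sequence length
      for (i = 1; i <= 4; i++) {
        if ((getline line < r2) <= 0) { print "R2 ended early" > "/dev/stderr"; exit 1 }
        if (i == 2) len2 = length(line)
      }
      if (len1 >= min && len2 >= min) print "paired"
      else if (len1 >= min)           print "single_R1"
      else if (len2 >= min)           print "single_R2"
      else                            print "discard"
    }
  ' "$1"
}
```

For example, classify_pairs Sample1_R1.fastq Sample1_R2.fastq 36 | sort | uniq -c gives a rough count of how many pairs would land in each category under these assumptions, which can then be compared with what lengthsort actually wrote.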

EDIT:

In case it helps, the input files are trimmed FASTQ files. Here are the first 8 lines of Sample1_R1.fastq.trimmed after unzipping:

@HWI-7001326F:29:C732HANXX:8:1101:1258:1926 1:N:0:ATCACGAT
TCATGAGAAAAGGAACTCCGTCTCATCTGGCATTGCCAATAAAC
+
FFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@HWI-7001326F:29:C732HANXX:8:1101:1457:1985 1:N:0:ATCACGAT
CAACAACTTTGAAGGGTCTTGAAAGGGCAGGTAGTCCTCTAACTGAAGATTTCTCAACTCTAAAAGGAGTTGGTTTCAAACTCACAGAAGCCATAACTGAAGAGATCGGAAGAGCACACG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB
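Since the reads in these trimmed files have different lengths, it can be useful to know how many fall below the -l 36 cutoff before running lengthsort. A minimal sketch, assuming GNU gzip (whose -cdf options decompress .gz input and pass plain text through unchanged) and a standard awk; count_short is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# count_short FILE MINLEN
# Prints the total number of reads and how many are shorter than MINLEN.
# Works on gzip-compressed or plain FASTQ (GNU gzip -cdf passes plain text through).
count_short() {
  gzip -cdf "$1" | awk -v min="$2" '
    NR % 4 == 2 { total++; if (length($0) < min) short++ }   # sequence lines only
    END { printf "total=%d short=%d\n", total, short }
  '
}
```

For example, count_short /PATH/TO/trim/Sample1_R1.fastq.trimmed.gz 36 prints the total read count and the number of reads shorter than 36 bases, which gives a baseline to compare against the lengthsort output.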

The output files both appear to be empty.

This was the end of the terminal output:

...    
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20703:101335
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20624:101360
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20904:101266
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20879:101307
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20776:101348
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20940:101369
Paired reads were written to:
/PATH/TO/sort/.clean
/PATH/TO/sort/C732HANXX-1721-01-01-01_L008_R2.fastq.trimmed.gz.clean

100% [==================================================]
Writing files...

Why has this happened?

Tags: sequencing, next-gen • 891 views • last modified 4 months ago by Biostar

Perhaps you could tell us what they contain? In particular, the first 8 lines of each file would be helpful, as would the number of input reads and the number of reads in each output file... and, of course, anything the program printed to the screen.

(Comment written 2.0 years ago by Brian Bushnell)
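The information requested above (the first 8 lines and the read count of each file) can be gathered with a small helper. This is a sketch assuming FASTQ with exactly 4 lines per record and GNU gzip, whose gzip -cdf handles both compressed and uncompressed files; that matters here because the .clean names don't say whether the outputs are gzipped. fastq_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# fastq_summary FILE
# Prints a header line, the first 8 lines (2 records) and the read count of FILE.
# gzip -cdf decompresses .gz input and passes plain text through unchanged (GNU gzip).
fastq_summary() {
  local lines
  echo "== $1"
  gzip -cdf "$1" | head -n 8
  lines=$(gzip -cdf "$1" | wc -l)
  # a FASTQ record is 4 lines, so reads = lines / 4
  echo "reads: $(( lines / 4 ))"
  if [ $(( lines % 4 )) -ne 0 ]; then
    echo "warning: $lines lines is not a multiple of 4" >&2
  fi
}
```

For example: for f in /PATH/TO/trim/Sample1_R1.fastq.trimmed.gz /PATH/TO/trim/Sample1_R2.fastq.trimmed.gz /PATH/TO/sort/* /PATH/TO/sort/.clean; do fastq_summary "$f"; done. The glob * does not match hidden files, so a file literally named .clean (like the first path in the log above) has to be listed explicitly.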
Brian Bushnell (Walnut Creek, USA) wrote, 2.0 years ago:

Well, I'm not really sure what SolexaQA++ is doing or why it's producing blank output files. But I would suggest that you try BBDuk for quality-trimming and removing short reads, like this (adjusting parameters as desired):

bbduk.sh in1=read1.fq.gz in2=read2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz qtrim=r trimq=10 minlen=36

You should adapter-trim prior to quality-trimming, though.
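To apply the BBDuk command above to paired files that follow the SampleN_R1 / SampleN_R2 naming from the question, a small wrapper that keeps every path quoted avoids stray characters creeping into the arguments. The input and output file names are assumptions, the flags are exactly those in the command above, and it does not do the adapter trimming recommended first. trim_pair is a hypothetical helper; setting BBDUK=echo prints the command instead of running it.

```bash
#!/usr/bin/env bash
# trim_pair PREFIX OUTDIR
# Runs the BBDuk command above on PREFIX_R1.fastq.gz and PREFIX_R2.fastq.gz
# (hypothetical file names) and writes the trimmed pair into OUTDIR.
# Set BBDUK=echo for a dry run that only prints the command.
trim_pair() {
  local prefix="$1" outdir="$2" name
  name=$(basename "$prefix")
  "${BBDUK:-bbduk.sh}" \
    in1="${prefix}_R1.fastq.gz" in2="${prefix}_R2.fastq.gz" \
    out1="${outdir}/${name}_R1.trimmed.fq.gz" \
    out2="${outdir}/${name}_R2.trimmed.fq.gz" \
    qtrim=r trimq=10 minlen=36
}
```

For example, BBDUK=echo trim_pair /PATH/TO/raw/Sample1 /PATH/TO/trim shows the exact command line for one sample, and dropping BBDUK=echo runs it.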
