Prinseq lite data preprocessing
17 months ago

Hello everyone,

I am learning RNA-seq analysis. As a first step, I am using Prinseq lite to preprocess my data.

I used this command:

perl prinseq-lite.pl -fastq read_1.fastq -fastq2 read_2.fastq -out_format 5 -min_len 50 -min_qual_mean 25


I got three output files in the same folder for each input file: _prinseq_good_singletons_, _prinseq_good_ and _prinseq_bad_.

Also, the _prinseq_good_ file is larger than the input file. Is that OK?

Which file should I use for downstream analysis?


File sizes are not a good measure of anything by themselves. Does prinseq print a log file or a stats file of some sort? That would be useful in understanding what happens in the run. Also, read the manual - that should describe each output file.

17 months ago
GenoMax

I am not a prinseq user, but based on the names, _prinseq_good_singletons_ and _prinseq_good_ would be the files you want. The _prinseq_good_ files contain reads where both mates (R1 and R2) survived the trimming. Be cautious about using the singleton files: most aligners will not let you mix paired and singleton reads in the same alignment.
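If you feed the two _prinseq_good_ files to an aligner as a pair, they should list the same reads in the same order. Below is a minimal sketch for checking that, assuming plain uncompressed FASTQ with strict 4-line records and mate names that differ at most by a trailing /1 or /2. The function names are hypothetical helpers, not part of prinseq.

```python
from itertools import zip_longest
from typing import Iterable, Iterator


def fastq_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID of each 4-line FASTQ record, minus any trailing /1 or /2."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # first line of each record is the '@' header
            read_id = line[1:].split()[0]  # drop '@' and any comment after whitespace
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def pairs_in_sync(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> bool:
    """True if both files contain the same read IDs in the same order."""
    for id1, id2 in zip_longest(fastq_ids(r1_lines), fastq_ids(r2_lines)):
        if id1 != id2:  # mismatched name, or one file ran out of records first
            return False
    return True
```

Usage would be something like `with open(r1_path) as a, open(r2_path) as b: print(pairs_in_sync(a, b))`, where the paths point at your two good files. A singleton file would normally fail this check against either mate file, which is one reason to keep it out of a paired alignment.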

File sizes are never a good metric for anything (unless you are just making sure the file produced is not empty). Since your files don't appear to be compressed, hopefully the size difference is negligible. With compressed files, size changes generally reflect how well the remaining data compresses, for example after data is removed by trimming/filtering, so they do not track the amount of data removed.
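A more meaningful comparison than file size is the number of reads before and after filtering. Here is a minimal sketch, assuming FASTQ files with strict 4-line records (no wrapped sequence lines); it will not give correct counts for FASTA or QUAL output. The function names are hypothetical.

```python
import gzip
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Number of FASTQ records, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; is this really 4-line FASTQ?")
    return n_lines // 4


def count_fastq_reads(path: str) -> int:
    """Count reads in a plain or gzip-compressed (.gz) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)
```

Running `count_fastq_reads` on the input and on each output file tells you how many reads were kept, turned into singletons, or discarded, regardless of compression or header formatting.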

13 months ago
nzulapa

_prinseq_good_singletons_: contains the reads whose mate was removed

_prinseq_good_: contains the pairs that remain after removing duplicates, low-complexity reads,...
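To illustrate how one filtering pass produces these three kinds of output, here is a toy sketch. It is not prinseq's implementation. It assumes Phred+33 quality encoding, and the `passes` filter only loosely mirrors the `-min_len 50 -min_qual_mean 25` options from the question. Putting the failed mate of a split pair into "bad" is a choice made for this sketch.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


def passes(read: Read, min_len: int = 50, min_qual_mean: float = 25.0) -> bool:
    """Hypothetical filter: minimum length plus minimum mean base quality."""
    if len(read.seq) < min_len or not read.qual:
        return False
    mean_q = sum(ord(c) - 33 for c in read.qual) / len(read.qual)
    return mean_q >= min_qual_mean


def split_pairs(
    pairs: Iterable[Tuple[Read, Read]],
) -> Tuple[List[Tuple[Read, Read]], List[Read], List[Read]]:
    """Sort mate pairs into good pairs, singletons and bad reads."""
    good: List[Tuple[Read, Read]] = []
    singletons: List[Read] = []
    bad: List[Read] = []
    for r1, r2 in pairs:
        ok1, ok2 = passes(r1), passes(r2)
        if ok1 and ok2:
            good.append((r1, r2))  # both mates survived: still a pair
        elif ok1 or ok2:
            singletons.append(r1 if ok1 else r2)  # survivor lost its mate
            bad.append(r2 if ok1 else r1)
        else:
            bad.extend([r1, r2])  # neither mate survived
    return good, singletons, bad
```

Because the good pairs are always appended together, the two good files stay in sync, while the singletons come from whichever side survived. That is why they end up in separate files.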