Question: Biopython script to preprocess raw RNA-seq reads (quality filtering, polyA and adapter trimming)
shanasabri wrote, 2.5 years ago:

Hello,

I have a raw, unaligned fastq.gz file that I am trying to preprocess using Biopython before alignment. Ultimately I would like to remove low-quality reads, trim polyA tails, trim adapters using fuzzy matching, and finally remove reads that do not satisfy a length requirement after all of that preprocessing. It would also be neat to report how many reads satisfy the filtering criteria at each step. I have been playing around with a Biopython script but have had little success. I believe the quality filter and polyA trimming work correctly, but I cannot seem to get the adapters to cut. I have also written a function called get_stats that is supposed to return the average length and total number of reads. I would appreciate any help!
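For reference, the steps described above could be sketched roughly as follows. This is only a minimal, standard-library illustration and not the script from the post: apart from get_stats, every function name (read_fastq, mean_quality, find_adapter, trim_polya, preprocess, preprocess_file), every threshold, and the Phred+33 encoding are assumptions. "Fuzzy" adapter matching is simplified here to a mismatch count (Hamming distance, no indels), and the adapter is trimmed before the polyA tail on the assumption that a 3' adapter sits downstream of the tail. With Biopython, Bio.SeqIO.parse(handle, "fastq") could supply the records instead of read_fastq.

```python
import gzip
from collections import Counter


def read_fastq(handle):
    """Yield (header, seq, qual) from a text handle with 4 lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 is an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def find_adapter(seq, adapter, max_mismatch=2, min_overlap=5):
    """Index where a 3' adapter starts, tolerating mismatches; len(seq) if absent."""
    for start in range(len(seq) - min_overlap + 1):
        window = seq[start:start + len(adapter)]  # may be a partial adapter at the read end
        mismatches = sum(a != b for a, b in zip(window, adapter))
        # Scale the mismatch budget down when only part of the adapter fits.
        allowed = max_mismatch * len(window) // len(adapter)
        if mismatches <= allowed:
            return start
    return len(seq)


def trim_polya(seq, min_run=10):
    """Length to keep after dropping a trailing run of at least min_run A's."""
    stripped = seq.rstrip("A")
    return len(stripped) if len(seq) - len(stripped) >= min_run else len(seq)


def preprocess(records, adapter, min_mean_q=20, min_len=20):
    """Return (kept records, Counter of how many reads reached each step)."""
    counts, kept = Counter(), []
    for header, seq, qual in records:
        counts["total"] += 1
        if mean_quality(qual) < min_mean_q:
            continue
        counts["passed_quality"] += 1
        end = find_adapter(seq, adapter)
        if end < len(seq):
            counts["adapter_trimmed"] += 1
        seq, qual = seq[:end], qual[:end]
        end = trim_polya(seq)
        if end < len(seq):
            counts["polya_trimmed"] += 1
        seq, qual = seq[:end], qual[:end]
        if len(seq) < min_len:
            continue
        counts["passed_length"] += 1
        kept.append((header, seq, qual))
    return kept, counts


def get_stats(records):
    """Return (total reads, average read length) for (header, seq, qual) tuples."""
    lengths = [len(seq) for _, seq, _ in records]
    return len(lengths), (sum(lengths) / len(lengths) if lengths else 0.0)


def preprocess_file(path, adapter):
    """Run the sketch on a gzipped FASTQ file (path is supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return preprocess(read_fastq(handle), adapter)
```

Something like preprocess_file("reads.fastq.gz", "AGATCGGAAGAGC") would then return the surviving reads plus the per-step counts; the file name and adapter sequence are placeholders. Keeping the counts in one Counter makes it easy to see at which step reads are being lost, which may help narrow down why the adapters are not being cut.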

Tags: rna-seq, biopython, sequence

Why do you want to reinvent the wheel? http://prinseq.sourceforge.net/

Reply written 2.5 years ago by WouterDeCoster
dr_bantz wrote, 2.5 years ago:

I'm not sure why you would want to do this in Python (if nothing else, it would take ages). The BBDuk utility from the BBMap suite would do everything you need. Here's a thread with some info:

http://seqanswers.com/forums/showthread.php?t=42776

ablanchetcohen (Canada) wrote, 2.5 years ago:

I don't understand why you feel the need to write your own tool either. If you're doing this as a programming exercise, your question should be more precise.

Here is a partial list of existing trimming tools, taken from Wikipedia: https://en.wikipedia.org/wiki/List_of_RNA-Seq_bioinformatics_tools#Trimming_and_adapters_removal

BBDuk, clean_reads, condetri, cutadapt, Deconseq, Erne-Filter, FastqMcf, FASTX, Flexbar, FreClu, htSeqTools, NxTrim, PRINSEQ, Sabre, Scythe, SEECER, Sickle, SnoWhite, ShortRead, TagCleaner, Trimmomatic


This is an exercise and I'd like to build my own toolkit for my own analysis so that I know exactly what is happening behind the scenes.

Reply written 2.5 years ago by shanasabri