Question

biopython script to preprocess raw RNA-seq reads (quality filtering, polyA and adapter trimming)

0

8.1 years ago

shanasabri ▴ 40

Hello,

I have a raw, unaligned fastq.gz file that I am trying to preprocess with Biopython before alignment. Ultimately, I would like to remove low-quality reads, trim polyA tails, trim adapters using fuzzy matching, and finally remove reads that do not satisfy a length requirement after all of that preprocessing. It would also be neat to report how many reads satisfy the filtering criteria at each step. I have been playing around with these Biopython scripts but have had little success. I believe the quality filter and polyA trimming work correctly, but I cannot seem to get the adapters to cut. I have also written a function called get_stats that is supposed to return the average length and the total number of reads. I would appreciate any help!

RNA-Seq Biopython sequence • 2.7k views

ADD COMMENT • link updated 8.1 years ago by ablanchetcohen ★ 1.2k • written 8.1 years ago by shanasabri ▴ 40

0

Why do you want to reinvent the wheel? http://prinseq.sourceforge.net/

ADD REPLY • link 8.1 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2016-04-04

1

8.1 years ago

dr_bantz ▴ 110

I'm not sure why you would want to do this in Python (if nothing else, it would take ages). The BBDuk utility from the BBMap suite would do all you need. Here's a thread with some info:

http://seqanswers.com/forums/showthread.php?t=42776

ADD COMMENT • link 8.1 years ago by dr_bantz ▴ 110

score 1 · Answer 2 · 2016-04-04

1

8.1 years ago

ablanchetcohen ★ 1.2k

I don't understand why you feel the need to write your own tool either. If you're doing this as a programming exercise, your question should be more precise.

Here is a partial list of the existing trimming tools provided by Wikipedia. https://en.wikipedia.org/wiki/List_of_RNA-Seq_bioinformatics_tools#Trimming_and_adapters_removal

BBDuk, clean_reads, condetri, cutadapt, Deconseq, Erne-Filter, FastqMcf, FASTX, Flexbar, FreClu, htSeqTools, NxTrim, PRINSEQ, Sabre, Scythe, SEECER, Sickle, SnoWhite, ShortRead, TagCleaner, Trimmomatic

ADD COMMENT • link 8.1 years ago by ablanchetcohen ★ 1.2k

1

This is an exercise and I'd like to build my own toolkit for my own analysis so that I know exactly what is happening behind the scenes.

ADD REPLY • link 8.1 years ago by shanasabri ▴ 40
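
For anyone treating this as an exercise, here is a minimal sketch of the steps described in the question (quality filter, fuzzy adapter trimming, polyA trimming, length filter, per-step counts). It is not the original poster's script, which is not shown in the thread. Apart from get_stats, which the question mentions, every function name, threshold, and default here is an assumption made for illustration. The fuzzy matching only tolerates substitutions (no indels), and the adapter is trimmed before the polyA tail on the assumption that a 3' adapter sits downstream of the tail; swap the two steps if your library is built differently.

```python
from collections import Counter


def passes_quality(quals, min_mean_q=20):
    """True if the read's mean Phred score is at least min_mean_q."""
    return bool(quals) and sum(quals) / len(quals) >= min_mean_q


def find_adapter(seq, adapter, max_error_rate=0.1, min_overlap=5):
    """Leftmost position where the adapter (or a prefix of it running off
    the 3' end) matches within the mismatch budget; len(seq) if none.
    Substitutions only: this is a Hamming-distance scan, not an alignment."""
    seq, adapter = seq.upper(), adapter.upper()
    for start in range(len(seq) - min_overlap + 1):
        window = seq[start:start + len(adapter)]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= int(max_error_rate * len(window)):
            return start
    return len(seq)


def polya_end(seq, min_run=10):
    """Length to keep after removing a 3' run of at least min_run A's."""
    stripped = seq.rstrip("Aa")
    return len(stripped) if len(seq) - len(stripped) >= min_run else len(seq)


def preprocess(reads, adapter, counts, min_mean_q=20, min_len=25):
    """Yield surviving (sequence, qualities) pairs and tally each step
    in counts (a collections.Counter supplied by the caller)."""
    for seq, quals in reads:
        counts["input"] += 1
        if not passes_quality(quals, min_mean_q):
            continue
        counts["passed_quality"] += 1
        end = find_adapter(seq, adapter)
        if end < len(seq):
            counts["adapter_trimmed"] += 1
        seq, quals = seq[:end], quals[:end]
        end = polya_end(seq)
        if end < len(seq):
            counts["polya_trimmed"] += 1
        seq, quals = seq[:end], quals[:end]
        if len(seq) < min_len:
            continue
        counts["passed_length"] += 1
        yield seq, quals


def get_stats(reads):
    """Return (total reads, average read length)."""
    lengths = [len(seq) for seq, _ in reads]
    total = len(lengths)
    return total, (sum(lengths) / total if total else 0.0)


def read_fastq_gz(path):
    """Yield (sequence, Phred scores) from a gzipped FASTQ via Biopython."""
    import gzip
    from Bio import SeqIO  # requires Biopython to be installed
    with gzip.open(path, "rt") as handle:
        for rec in SeqIO.parse(handle, "fastq"):
            yield str(rec.seq), rec.letter_annotations["phred_quality"]
```

To use it, create `counts = Counter()`, pass `read_fastq_gz("reads.fastq.gz")` (a placeholder file name) and your adapter sequence to `preprocess`, then inspect `counts` and call `get_stats` on the surviving reads. As the other answers point out, a pure-Python scan like this will be slow on a full RNA-seq run, so it is best suited to learning or to small subsets.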