Question: How to trim Illumina reads for poly-A tail and mini-exon presence
4.6 years ago, Buffo (1.8k) wrote:

I want to assemble Illumina reads from an RNA-seq project. I have a reference genome, so I want to assemble the transcripts against the reference using Cufflinks. However, my reads come from a parasite with a complicated transcript maturation process that includes the insertion of 35 nucleotides (the mini-exon) and poly-A tailing before mRNA translation, so I want to remove the 35 mini-exon nucleotides and the poly-A tail before assembly. How can I do that? I tried trimming with a simple grep, but I have 34 million reads in which the mini-exon may be present with insertions or deletions, and grep does not work in those cases. Does anybody know of a Perl or Python script that does this? Thanks :-)

rna-seq assembly • 1.4k views
modified 7 months ago by Biostar • written 4.6 years ago by Buffo (1.8k)

Are the 35 nucleotides the same or are they somehow individually derived from every mRNA?

written 4.6 years ago by genomax (92k)

Thanks for the answer. They are the same 35 nucleotides in every mature mRNA; the sequence is called the mini-exon and it is not present in the genome.

written 4.6 years ago by Buffo (1.8k)

That's why I want to remove it from the reads, in order to improve the assembly.

written 4.6 years ago by Buffo (1.8k)
4.6 years ago, genomax (92k), United States, wrote:

You could use BBDuk from the BBMap tools. Add the 35 nucleotides as a separate FASTA entry to the "adapters.fa" file in the "resources" directory. If you expect runs of A's (poly-A) to show up, you could add an entry for that as well. With ktrim=r, BBDuk trims a read to the right when it encounters one of the sequences from the adapter file in that read.
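
A minimal sketch of the setup described above, written in Python only so the pieces can be scripted. The mini-exon sequence is a made-up placeholder (the real 35 nt are not given in this thread), the file names are hypothetical, and the k and mink values are illustrative rather than tuned. in=, out=, ref=, ktrim=, k= and mink= are standard BBDuk parameters; the command is only built and printed here, not run.

```python
# Placeholder only: substitute the real 35-nt mini-exon of your organism.
MINI_EXON = "ACGTTGCAAGCTTAGGCATCGATCCGATAGCTAGC"  # hypothetical sequence, 35 nt
POLY_A = "A" * 25  # assumed length for an illustrative poly-A entry


def fasta_text(entries):
    """Format (name, sequence) pairs as FASTA records."""
    lines = []
    for name, seq in entries:
        lines.append(f">{name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


def bbduk_command(reads_in, reads_out, ref, k=23, mink=11):
    """Build (but do not run) a bbduk.sh command line for right-trimming."""
    return [
        "bbduk.sh",
        f"in={reads_in}",
        f"out={reads_out}",
        f"ref={ref}",
        "ktrim=r",
        f"k={k}",
        f"mink={mink}",
    ]


# Entries that could be appended to adapters.fa or kept in a separate file.
print(fasta_text([("mini_exon", MINI_EXON), ("polyA", POLY_A)]), end="")

# Hypothetical file names; adjust to your own data.
print(" ".join(bbduk_command("reads.fq", "trimmed.fq", "miniexon_polyA.fa")))
```

Keeping the extra entries in their own FASTA file and pointing ref= at it is one way to avoid editing the bundled adapters.fa; either approach should give BBDuk the same sequences to look for.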

written 4.6 years ago by genomax (92k)

Thank you for the answer and recommendation, @genomax2. I actually did that with grep, redirecting the output, and I found some hits (about 1,000 reads with these adapters), searching in both orientations (5'-3' and 3'-5'). But I have a little more than 34 million reads and, as we expected, the mini-exon is not exempt from insertions or SNPs, so those commands only trim adapters that carry no insertions or deletions. I'm pretty sure it can be solved with a Python or R script :-( but I will keep trying.
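
To illustrate why an exact grep misses these reads and what a script would have to do instead, here is a minimal sketch of approximate matching that tolerates substitutions, insertions and deletions (edit distance, with the match allowed to start anywhere in the read). The function names, toy sequences and error threshold are hypothetical, and it assumes the mini-exon sits at the 5' end of the read. Pure Python like this would be far too slow for 34 million reads; it only shows the idea.

```python
def best_match_end(pattern, read):
    """Return (edits, end) for the best approximate occurrence of pattern in read.

    Dynamic programming in which the match may start anywhere in the read
    (the first row is all zeros). edits counts substitutions, insertions and
    deletions; end is the exclusive end position of the best match in the read.
    """
    prev = [0] * (len(read) + 1)       # empty pattern matches anywhere at cost 0
    for i, p in enumerate(pattern, start=1):
        cur = [i] + [0] * len(read)    # pattern prefix vs. empty read: i edits
        for j, r in enumerate(read, start=1):
            cur[j] = min(
                prev[j - 1] + (p != r),  # match or substitution
                prev[j] + 1,             # pattern base missing from the read
                cur[j - 1] + 1,          # extra base in the read
            )
        prev = cur
    end = min(range(len(prev)), key=prev.__getitem__)  # first best end position
    return prev[end], end


def trim_mini_exon(read, mini_exon, max_edits=2):
    """Drop everything up to the end of the best match if it is close enough."""
    edits, end = best_match_end(mini_exon, read)
    return read[end:] if edits <= max_edits else read


leader = "ACGTTGCA"          # toy stand-in for the mini-exon
read = "ACGTGCA" + "GGGCCC"  # leader with one base deleted, then transcript
print(leader in read)                # exact search fails
print(trim_mini_exon(read, leader))  # approximate search still trims
```

The exact substring test finds nothing because of the single deleted base, while the edit-distance search locates the damaged leader and removes it.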

written 4.6 years ago by Buffo (1.8k)

That is the reason you want to use BBDuk, with an entry like the following appended to the "adapters.fa" file.


You can allow for errors by using the hdist= option (this is the Hamming distance). BBDuk automatically searches both strand orientations of the sequence and also looks for partial matches.
You could always remove the poly-A tails/adapters first and then deal with the mini-exon down the road.
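
As a rough illustration of what a Hamming-distance tolerance and a both-strand search mean, the sketch below scans a read for a sequence or its reverse complement while allowing a fixed number of substitutions. It is not how BBDuk is implemented (BBDuk works on k-mers); the function names and toy sequences are hypothetical. Note that Hamming distance counts substitutions only, not insertions or deletions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(read, query, hdist=1):
    """Return (start, strand) of the first window within hdist of query, else None.

    At each position the forward orientation is checked before the
    reverse complement.
    """
    candidates = (("+", query), ("-", reverse_complement(query)))
    for start in range(len(read) - len(query) + 1):
        window = read[start:start + len(query)]
        for strand, seq in candidates:
            if hamming(window, seq) <= hdist:
                return start, strand
    return None


query = "ACGTTGCA"  # toy stand-in for the mini-exon
print(find_with_mismatches("GG" + "ACGTAGCA" + "TT", query, hdist=1))
print(find_with_mismatches("CC" + reverse_complement(query) + "AA", query, hdist=0))
```

The first call finds the forward copy despite one substituted base; the second finds an exact copy on the opposite strand.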

modified 4.6 years ago • written 4.6 years ago by genomax (92k)

I hadn't read carefully enough, my fault. Thank you!!

written 4.6 years ago by Buffo (1.8k)