How to trim poly-A/T sequences before or after trimming with Trimmomatic
2.3 years ago
Apprentice ▴ 90

Hi.

I'm analyzing RNA-seq data.

My RNA-seq pipeline is as follows: (1) Trimmomatic removes adapter sequences and low-quality bases from my FASTQ files. (2) TopHat2 maps the reads to the reference genome (hg19).

Trimmomatic didn't remove the poly-A sequences from my FASTQ files.

I would like to know how to trim poly-A sequences before or after trimming adapter sequences and low-quality bases with Trimmomatic.

rna-seq • 2.1k views

The Trimmomatic directory should contain FASTA files with the adapter sequences to be trimmed. From what I understand, you can simply add new sequences to these files, in your case poly-A and poly-T.


Thank you for your comment. I don't know how the poly-A and poly-T sequences should be added to the file.

Could you show me an example of the file?


Open the adapter file in a text editor and add:

>p-A
AAAAAAAAAAAAAAAAAAAAAAAAA
>p-T
TTTTTTTTTTTTTTTTTTTTTTTTTTT

I am not familiar with Trimmomatic, so I cannot tell you where the files are; you will have to find that out yourself. This has probably been asked before, so check with the search function.
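One way to do this without modifying the files shipped with Trimmomatic is to append the two entries to a copy of an adapter file. The sketch below is only an illustration of that idea: the function name add_poly_entries and the file names are hypothetical, and the default homopolymer length of 25 is an arbitrary choice, not a recommended value.

```shell
#!/usr/bin/env bash
# Sketch: copy an adapter FASTA and append poly-A / poly-T entries to the copy.
# add_poly_entries, the file names, and the default length are assumptions
# made for illustration; they are not part of Trimmomatic.

add_poly_entries() {
    local src="$1"      # existing adapter FASTA (left untouched)
    local dest="$2"     # new adapter FASTA to create
    local n="${3:-25}"  # length of the homopolymer entries

    cp "$src" "$dest" || return 1

    # If the copy does not end with a newline, add one before appending.
    if [ -s "$dest" ] && [ -n "$(tail -c 1 "$dest")" ]; then
        printf '\n' >> "$dest"
    fi

    # Build a run of n A's and a run of n T's.
    local poly_a poly_t
    poly_a=$(printf 'A%.0s' $(seq 1 "$n"))
    poly_t=$(printf 'T%.0s' $(seq 1 "$n"))

    # Append the two FASTA records.
    printf '>p-A\n%s\n>p-T\n%s\n' "$poly_a" "$poly_t" >> "$dest"
}
```

The new file would then be given to Trimmomatic's ILLUMINACLIP step in place of the original adapter file, for example ILLUMINACLIP:adapters_with_poly.fa:2:30:10 (the file name is hypothetical and the numeric settings are just commonly shown example values). Whether this removes poly-A stretches well enough should be checked on the trimmed output.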


Thank you for your advice.

2.3 years ago

For removing polyA/T sequences I have been using prinseq with -trim_tail_right and -trim_tail_left, but that's probably just one of the tools which can do that.


Thank you for your advice. I'll try it.


I would like to remove polyA tails using prinseq.

My RNA-seq data are paired-end, and my FASTQ files are gzipped.

It seems that prinseq can't read gzipped fastq files. I don't want to decompress the fastq files. Could you tell me how to use prinseq for paired-end gzipped fastq files?


Hmmm, I had been using Prinseq for single-end (SE) data, in which case I could use piping:

zcat reads.fastq.gz | prinseq <other arguments> | gzip > trimmed_reads.fastq.gz

But that's probably not an option for you. In that case the solution from ATpoint is probably best: modify the FASTA file of adapter sequences so that it also includes AAAAAAAAAAAAAAAA and TTTTTTTTTTTTTTTTTTT.
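For completeness, the same decompress, filter, recompress pattern can in principle be applied to each mate file separately, provided the filter only shortens reads and never drops them, so both files keep the same read order. The sketch below is such a filter written with awk. It is not Prinseq and does not reproduce its algorithm; the function name trim_poly_tail and the default minimum run length of 5 are assumptions for illustration. It assumes plain four-line FASTQ records and only trims a run of A or T at the 3' end.

```shell
#!/usr/bin/env bash
# Sketch: trim a trailing poly-A or poly-T run from every read on stdin.
# trim_poly_tail and the default minimum length are illustrative assumptions.
# Assumes 4-line FASTQ records (sequence and quality are not line-wrapped).

trim_poly_tail() {
    local min="${1:-5}"  # minimum run length that counts as a tail
    awk -v min="$min" '
        NR % 4 == 2 {                      # sequence line
            seq = $0
            n = length(seq)
            last = substr(seq, n, 1)
            run = 0
            if (last == "A" || last == "T") {
                # Count how many identical bases end the read.
                while (run < n && substr(seq, n - run, 1) == last) run++
            }
            keep = (run >= min) ? n - run : n
            print substr(seq, 1, keep)
            next
        }
        NR % 4 == 0 {                      # quality line: cut to same length
            print substr($0, 1, keep)
            next
        }
        { print }                          # header and "+" lines unchanged
    '
}
```

With hypothetical file names, each mate would be processed as gzip -dc reads_1.fastq.gz | trim_poly_tail 5 | gzip > trimmed_1.fastq.gz, and likewise for reads_2.fastq.gz. A read that consists only of A or T becomes empty rather than being removed, which keeps the pairs in order but may need handling before mapping.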


Thank you for your reply.

I'll try to use the solution from ATpoint.
