How to remove adaptors if you don't know their names/sequences? (e.g. from public data)
5.3 years ago
rioualen ▴ 620

Hello,

I would like to use cutadapt and/or Trimmomatic to trim my sequences. I usually analyze data from GEO or ArrayExpress, and it's not always possible to get information about the adaptors. Also, the information provided there can't always be trusted...

I can see that FastQC detects adaptor sequences, including ones that differ slightly from the reference sequence. So I'm thinking there should be a way to trim adaptors automatically, without knowing their sequence or name?

Thanks for any tips.

Edit: I just found out that FastQC uses a file called "contaminant_list.txt". I guess I could write a script to look for these adapters in my data, but I'm not sure that's the best way to go.
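For what it's worth, such a script could look roughly like the sketch below. It assumes an uncompressed FASTQ and a list in "name, tab(s), sequence" form with "#" comment lines; the function names, the `min_len` and `max_reads` parameters, and the demo reads are made up for illustration and are not part of FastQC. It only looks for exact matches to the start of each adapter, so it is a quick screen, not a replacement for a real trimmer.

```python
from collections import Counter
from itertools import islice


def load_adapters(lines):
    """Parse 'name<TAB(s)>sequence' lines, skipping blanks and '#' comments."""
    adapters = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate several tabs between the name and the sequence.
        fields = [f for f in line.split("\t") if f.strip()]
        if len(fields) >= 2:
            adapters[fields[0].strip()] = fields[-1].strip().upper()
    return adapters


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of a FASTQ stream."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()


def count_adapter_hits(fastq_lines, adapters, max_reads=100_000, min_len=12):
    """Count reads that contain the first `min_len` bases of each adapter."""
    hits = Counter()
    for seq in islice(read_sequences(fastq_lines), max_reads):
        for name, adapter in adapters.items():
            if adapter[:min_len] in seq:
                hits[name] += 1
    return hits


if __name__ == "__main__":
    # Illustrative data only; in practice pass open file handles instead.
    adapter_text = [
        "# name\tsequence",
        "Demo adapter A\t\tAGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
        "Demo adapter B\tCTGTCTCTTATACACATCT",
    ]
    fastq = [
        "@r1", "ACGTACGTAGATCGGAAGAGCACACG", "+", "I" * 26,
        "@r2", "TTTTGGGGCCCCAAAATTTTGGGG", "+", "I" * 24,
        "@r3", "GGGAGATCGGAAGAGCTTTT", "+", "I" * 20,
    ]
    hits = count_adapter_hits(fastq, load_adapters(adapter_text))
    for name, n in hits.most_common():
        print(f"{name}\t{n}")
```

The adapter with the most hits in the first reads would then be the one to pass to cutadapt or Trimmomatic.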

5.3 years ago
ivivek_ngs ★ 5.1k

Use cutadapt, Trimmomatic, or BBMap (BBMap includes an adapters.fa file of common adapter sequences that you can use). If the data are Illumina reads, you can ideally put the relevant adapter sequences in a custom file and use that for removal. Another option is to look at the over-represented sequences in the reads and remove those.
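As a rough illustration of the over-represented-sequence idea, the sketch below counts in how many reads each k-mer occurs and reports the ones present in a large fraction of reads. This is a crude heuristic written for this thread, not how any of the tools above work; the function name and the `k` and `min_fraction` defaults are assumptions.

```python
from collections import Counter


def overrepresented_kmers(reads, k=12, min_fraction=0.1):
    """Return (kmer, fraction_of_reads) pairs for k-mers found in at least
    `min_fraction` of the reads, most frequent first."""
    counts = Counter()
    n_reads = 0
    for read in reads:
        n_reads += 1
        # Use a set so each k-mer is counted at most once per read.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    if n_reads == 0:
        return []
    return [
        (kmer, n / n_reads)
        for kmer, n in counts.most_common()
        if n / n_reads >= min_fraction
    ]


if __name__ == "__main__":
    # Illustrative reads only; in practice feed the sequence lines of a
    # subsample (for example the first 100,000 reads) of the FASTQ file.
    demo_reads = [
        "AAAACCCCGGGGTTTT",
        "TTAGATCGGAAGAGCA",
        "CCCAGATCGGAAGAGT",
        "ACGTACGTACGTACGT",
    ]
    for kmer, fraction in overrepresented_kmers(demo_reads, min_fraction=0.5):
        print(f"{kmer}\t{fraction:.2f}")
```

K-mers that come out on top can be compared against a known adapter list before trimming, since genuinely abundant biological sequences (rRNA, highly expressed transcripts) will show up here too.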

Thanks for the links, I'll look into it.

I need to integrate that into my Snakemake pipelines and I'd like it to be compatible with MultiQC. Trim Galore should do the trick!