Question

Identification of adapters from a FASTQ file

8.3 years ago

Sara ▴ 280

I have a FASTQ file from a Ribo-seq experiment. I got it from a paper, but no information about the adapter or linker needed to process the data is available. I asked the authors, and they told me to "try NEB small RNA kit or illumina small RNA kit". I did, but the processed data do not align very well, so I think I did not use the right adapter. Does anyone know how to get the adapters from a FASTQ file?

RNA-Seq • 4.3k views

updated 8.3 years ago by John 13k • written 8.3 years ago by Sara ▴ 280

WouterDeCoster · Answer 1 · 2017-03-26

This question has come up a lot before, and the answer is basically "if you really have no idea what adapters were used, you're screwed", because we don't yet have any tools to identify custom adapter sequences. You can detect over-represented sequences, but that is really a different sort of problem. The exception is BBMap, which can remove custom adapters if you have very short inserts and paired-end sequencing: the reads then routinely run into the adapter, and the paired-end overlap lets you distinguish insert from adapter very clearly. But again, your insert length has to be consistently shorter than the read length.
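To illustrate why paired-end data makes this tractable, here is a minimal Python sketch of the overlap idea. It is not how BBMap does it internally; it assumes error-free reads and an insert shorter than the read, and the function names (`adapter_tail`, `consensus_adapter_prefix`) are made up for this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_tail(read1, read2, min_insert=15):
    """Return the part of read1 that lies past the insert, or None.

    If the insert is shorter than the read, read1 starts with the insert and
    read2 starts with its reverse complement, so the two prefixes agree
    exactly at the insert length (assumption: no sequencing errors).
    """
    longest = min(len(read1), len(read2))
    for insert_len in range(longest - 1, min_insert - 1, -1):
        if read1[:insert_len] == reverse_complement(read2[:insert_len]):
            return read1[insert_len:]
    return None


def consensus_adapter_prefix(pairs, prefix_len=12, min_insert=15):
    """Tally the first prefix_len bases of every tail and return the most common."""
    tails = Counter()
    for read1, read2 in pairs:
        tail = adapter_tail(read1, read2, min_insert)
        if tail is not None and len(tail) >= prefix_len:
            tails[tail[:prefix_len]] += 1
    return tails.most_common(1)
```

Feeding it `(read1, read2)` sequence pairs from the two FASTQ files gives the most frequent read-through prefix, which is a candidate for the start of the 3' adapter on read 1. Real data would need mismatch tolerance, which this sketch deliberately leaves out.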

If you know that your adapter is a standard adapter but just don't know exactly which one, that's a different (and much simpler) problem than finding a custom adapter. For this you just need to point a trimming tool like Picard's MarkIlluminaAdapters at your reads and provide it with a list of all possible adapters (built in to MarkIlluminaAdapters if you're using Illumina). It will remove all adapter sequences and count the number of times each adapter sequence was removed, from which you can determine the adapters used. You might have to re-run the adapter marking tool once you know the actual adapter sequence, although perhaps Picard does this for you automatically. Not sure.
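The counting idea can be sketched in a few lines of Python. This is not Picard, just an exact-match tally over a list of candidates. The `CANDIDATES` names are my own labels, and the sequences are the commonly quoted 3' adapter prefixes, so check them against the vendor documentation before relying on them.

```python
from collections import Counter

# Assumption: commonly quoted adapter prefixes; verify against vendor docs.
CANDIDATES = {
    "illumina_small_rna_3p": "TGGAATTCTCGGGTGCCAAGG",
    "truseq_or_neb_3p": "AGATCGGAAGAGC",
    "nextera": "CTGTCTCTTATACACATCT",
}


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_adapter_hits(sequences, adapters, seed_len=12):
    """Count reads containing the first seed_len bases of each candidate adapter.

    Returns (hits, total): a Counter keyed by adapter name and the read count.
    Exact matching only, so sequencing errors in the seed are missed.
    """
    seeds = {name: seq[:seed_len] for name, seq in adapters.items()}
    hits = Counter()
    total = 0
    for read in sequences:
        total += 1
        for name, seed in seeds.items():
            if seed in read:
                hits[name] += 1
    return hits, total
```

Run it on the first few hundred thousand reads, e.g. `count_adapter_hits(read_fastq_sequences(open("reads.fastq")), CANDIDATES)` with `reads.fastq` standing in for your file. If one candidate turns up in most reads and the others in almost none, that is probably your adapter.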

There are enough tough questions to answer in bioinformatics, and solving man-made problems like figuring out which adapter sequence was used isn't a top priority for many tool designers, unfortunately. Given the funding (or lack of it), though, that's not surprising.

score 1 · Answer 2 · 2017-03-26

I find it disconcerting that the authors can't tell you what adapters to use. It makes no sense: they did the experiment, so who other than them would know what they did?

That said, problems with the alignment could stem from some other issue. Are you trying to follow a pipeline that they used and are not able to make it work? Or are you following an alternate pipeline of your own?

If you know for sure that the inserts in this case are shorter than the read length, then you could follow the solution for finding unknown primers using a tool from the BBMap suite in this post. But before you go to that trouble, see whether scanning/trimming with the adapters.fa file included in the BBMap distribution (resources folder; it includes most standard adapter sequences) improves things. You can use the entire file for that scan.
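As a quick sanity check before any of that: when the inserts are shorter than the reads, nearly every read runs into the adapter, so k-mers from the adapter should be among the most common in the file. The sketch below is plain Python, not a BBMap tool, and `top_kmers` is a name I made up. It counts each k-mer once per read and reports the most widespread ones.

```python
from collections import Counter


def top_kmers(sequences, k=12, n=5):
    """Return the n k-mers that occur in the largest number of reads.

    Each k-mer is counted at most once per read, so a k-mer that appears in
    every read gets a count equal to the number of reads.
    """
    counts = Counter()
    for read in sequences:
        kmers_in_read = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers_in_read)
    return counts.most_common(n)
```

Be careful with the interpretation. In Ribo-seq data, abundant rRNA fragments can also produce very common k-mers, so treat the top hits as candidates and compare them against adapters.fa or the kit documentation rather than trimming on them blindly.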