Hello Biostars! I downloaded FASTA files from http://www.ncbi.nlm.nih.gov/Traces/trace.cgi (mouse genome traces). There are also files with a 'clip' prefix. I'm not sure, but do they describe primers/adapters? I can't find any documentation about these files. I want to clip (trim) my traces according to the coordinates in the 'clip' files. After googling, I didn't find any tool for that; all the tools I found are for trimming NGS data. My question: is there a tool for this kind of clipping, or do I need to write my own script? I'm a newbie in programming (a beginner in Python) and have absolutely no idea how to write such a script.
Summary: I have a 'clip' file, which looks like this:
TI          CLIP_LEFT  CLIP_RIGHT
1101188317  0          576
1101188318  19         734
1101188319  6          742
1101188320  16         809
And a 'trace' file, which looks like a simple FASTA file:
>1101188317
ATGCAT... (all reads are ~1660 bp long)
>1101188318
...
>1101188319
...
>1101188320
...
The problem is the following:
- I don't understand the numbers in the clip file (e.g. CLIP_RIGHT is the right coordinate of what?).
- It's not clear to me what 'clip' means.
- If the numbers are something like the coordinates of adapters, I need to trim the sequences in the FASTA file accordingly.
- All the tools are for NGS data, but these datasets are from Sanger sequencing, so I don't know the adapter sequence; I only know the coordinates (if these numbers are coordinates).
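To make the intended trimming concrete, here is a minimal Python sketch of such a script. It rests on assumptions that are not confirmed anywhere above: that CLIP_LEFT and CLIP_RIGHT are 1-based, inclusive coordinates of the part of the read to keep, that a value of 0 means "no clip point on that side", and that each FASTA header starts with the TI number exactly as in the example. The function names (read_clips, read_fasta, clip_sequence, clip_fasta) are made up for illustration.

```python
def read_clips(handle):
    """Parse a whitespace-separated clip table: TI CLIP_LEFT CLIP_RIGHT."""
    clips = {}
    for line in handle:
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) != 3 or fields[0] == "TI":
            continue
        ti, left, right = fields
        clips[ti] = (int(left), int(right))
    return clips


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file, joining wrapped lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the first word of the header is the TI number.
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def clip_sequence(seq, left, right):
    """Keep bases left..right (assumed 1-based, inclusive; 0 = not set)."""
    start = left - 1 if left > 0 else 0
    end = right if right > 0 else len(seq)
    return seq[start:end]


def clip_fasta(fasta_handle, clip_handle, out_handle):
    """Write a trimmed copy of the FASTA; reads without clip info are kept whole."""
    clips = read_clips(clip_handle)
    for name, seq in read_fasta(fasta_handle):
        left, right = clips.get(name, (0, 0))
        out_handle.write(">%s\n%s\n" % (name, clip_sequence(seq, left, right)))
```

It could be called with three open file handles (the trace FASTA, the clip file, and an output file opened for writing). If the coordinates turn out to be 0-based, or to mark the region to remove rather than the region to keep, only clip_sequence would need to change.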