I'm writing a software tool that will extract gene sequences from the chromosome. Can anyone help me with this?
For each (non-header) line in the gene file, I need to create an entry in the FASTA file as follows. For a gene line like:
NM_00100344 chr11 + 5925152 5926098 5925152 5926098 2 5925152,5925652, 5925404,5926098,
I want an information line that looks like the following (unspliced version):
>NM_00100343|chr11(+):5925152Z5926098
or (spliced version):
>NM_00100343|chr11(+):5925152Z5926098|5925151Z5925404,5925652Z5926098
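As a starting point, building the information line could be sketched in Python roughly as below. This is only a sketch under assumptions: the column order (name, chromosome, strand, transcript start/end, coding start/end, exon count, exon starts, exon ends) is inferred from the example line, the function names `parse_gene_line` and `make_header` are made up for illustration, and the separator between coordinates is copied literally from the examples above. Coordinates are printed exactly as they appear in the gene line, so any ID or offset convention would still need adjusting.

```python
def parse_gene_line(line):
    """Split one whitespace-separated gene line into named fields.

    The column layout is an assumption based on the example line.
    """
    f = line.split()
    return {
        "name": f[0],
        "chrom": f[1],
        "strand": f[2],
        "tx_start": int(f[3]),
        "tx_end": int(f[4]),
        # Exon lists carry a trailing comma, so strip it before splitting.
        "exon_starts": [int(x) for x in f[8].rstrip(",").split(",")],
        "exon_ends": [int(x) for x in f[9].rstrip(",").split(",")],
    }


def make_header(gene, spliced=False, sep="Z"):
    """Build the information line; `sep` is the coordinate separator."""
    header = (
        f">{gene['name']}|{gene['chrom']}({gene['strand']}):"
        f"{gene['tx_start']}{sep}{gene['tx_end']}"
    )
    if spliced:
        exons = ",".join(
            f"{s}{sep}{e}"
            for s, e in zip(gene["exon_starts"], gene["exon_ends"])
        )
        header += "|" + exons
    return header


line = ("NM_00100344 chr11 + 5925152 5926098 5925152 5926098 2 "
        "5925152,5925652, 5925404,5926098,")
gene = parse_gene_line(line)
print(make_header(gene))
print(make_header(gene, spliced=True))
```

Keeping the parsing separate from the formatting means the same parsed record can later drive the sequence extraction as well.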
Following the information line should be the actual sequence (extracted from the chromosome file, and reverse-complemented if necessary), using a column width of 70.
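The sequence part could be sketched along these lines in Python. The assumptions here are that the chromosome has already been read into one string with its header and line breaks removed, that coordinates are 0-based with an exclusive end (this depends on the gene file's convention and should be checked), and that "reverse-complemented if necessary" means genes on the "-" strand. The helper names `reverse_complement`, `extract` and `wrap` are hypothetical.

```python
# Translation table for complementing DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(chrom_seq, intervals, strand):
    """Concatenate the given (start, end) intervals from the chromosome.

    Assumes 0-based starts and exclusive ends. For the unspliced version
    pass one interval (transcript start, transcript end); for the spliced
    version pass one interval per exon, in ascending order.
    """
    piece = "".join(chrom_seq[start:end] for start, end in intervals)
    return reverse_complement(piece) if strand == "-" else piece


def wrap(seq, width=70):
    """Break a sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


chrom = "ACGTACGTAC"  # toy chromosome, just for illustration
print(wrap(extract(chrom, [(2, 5)], "+")))
print(wrap(extract(chrom, [(2, 5)], "-")))
```

Joining the exons before reverse-complementing keeps the spliced sequence in transcript order on the "-" strand.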
You don't have to show me exactly how to do it; I just want a way to get started.