I have a FASTA file in which each header line gives the coordinates and strand of a region, followed by the sequence of that region:
$ head my.fasta
>16:23107820-23108019(+)
GTACGGCGCTCCCGGGGCGGCCGGTGGCCTGTAGTCAAGGTCACTAGGACCCGCGTTGAGGTGGGTTGCTTGGCGGCCACACTGCAGGTATGCGGGCTTTTTCTTAGGGCACACACTTCTCCTTGTGCCCTTCGAGAAGCTTCCATGATGGTAAGACTCCAGATGTTGGGGAGACAGGACGGATACAAGAACGGAGTAT
>14:54909471-54909670(-)
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
>7:127020805-127021004(-)
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
>X:20848619-20848818(+)
GTGAGGGCAGGCCCGGTAGGGTTCGGGTTTTGGAGCGGCTGCGGGACCCGGGTATGAAGTCCAGACCGAAAGCTCAGCTCCAAGATGCTTCCGTCTGAATCTCAGCGTTCTCCCGCCCGGAACCAAAGGAGTGGTTTGACCAGGGCGAGACCGTCGTCATCGACCGTGGGAGTGGATGGAGGAGTCGGCCTGCAGGCTG
>1:75547398-75547597(+)
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG
I extracted the coordinates with
$ grep "^>" my.fasta > mycoord.csv
(and, equivalently, grep "^>" my.fasta > mycoord.bed). The output file now looks like this:
$ head mycoord.csv
>16:23107820-23108019(+)
>14:54909471-54909670(-)
>7:127020805-127021004(-)
>X:20848619-20848818(+)
>1:75547398-75547597(+)
>11:102777648-102777847(+)
>7:25314905-25315104(+)
>2:180025312-180025511(+)
>7:30533903-30534102(-)
>X:8128769-8128968(-)
My question has two parts. First, how do I remove the ">" in front of each coordinate? Second, how can I split each coordinate into separate columns (chromosome, start, end, strand) so that I can look up the gene name from the coordinates and strand information? (I did not know how to phrase the ">" part, so when I searched for how to remove ">" from coordinates I found almost nothing.)
Thanks,
I found the answer to the first part. I used
$ sed 's/>//' mycoord.csv > mycoord_1.csv
to remove the ">".
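For the second part (splitting each header into columns), one possible approach is a small awk script that reads the FASTA file directly, so the intermediate grep and sed steps are not needed. This is only a sketch: the function name fasta_headers_to_bed is made up for illustration, it assumes every header has exactly the form >chrom:start-end(strand) shown above, and the "." and "0" in columns 4 and 5 are placeholder name and score values added so the output has a six-column BED-like layout. The start value is copied unchanged; whether it is already 0-based, as BED expects, depends on the tool that wrote the headers.

```shell
# Hypothetical helper: turn ">chrom:start-end(strand)" FASTA headers into
# six tab-separated BED-style columns: chrom, start, end, name, score, strand.
# The "." name and "0" score are placeholders (an assumption), since the
# headers carry no such information.
fasta_headers_to_bed() {
    awk '
    BEGIN { OFS = "\t" }
    /^>/ {
        hdr    = substr($0, 2)                     # drop the leading ">"
        strand = substr(hdr, length(hdr) - 1, 1)   # character inside "(...)"
        locus  = substr(hdr, 1, length(hdr) - 3)   # strip the trailing "(+)" or "(-)"
        colon  = index(locus, ":")                 # first colon separates chrom from range
        chrom  = substr(locus, 1, colon - 1)
        split(substr(locus, colon + 1), pos, "-")  # pos[1] = start, pos[2] = end
        print chrom, pos[1], pos[2], ".", "0", strand
    }' "$1"
}

# Usage: fasta_headers_to_bed my.fasta > mycoord.bed
```

With the file above, the first header would become the tab-separated fields 16, 23107820, 23108019, ., 0 and +. A file in this layout could then be compared against a gene annotation in a strand-aware way (for example with bedtools intersect and its -s option) to attach gene names, provided the chromosome naming and genome build match the annotation.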