How to remove part of the header in a FASTA file?
CHINMAYA, 3.7 years ago

Is there any way to remove part of the header, like _ORF.1 85-439 type:complete length:354 frame:2 start:CTG stop:TAA, from the records in orfs.fa?

>MSTRG.942.2_j_ORF.1 [85-439](+) type:complete length:354 frame:2 start:CTG stop:TAA
CTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT
ATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAA
GCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTT
CCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTT
TTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCT

This depends on what your headers look like in general. For example, this removes everything from the first _ onward in each header line:

$ sed '/^>/ s/_.*//' my.fasta

>MSTRG.942.2
CTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT
ATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAA
GCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTT
CCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTT
TTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCT

To adapt the regex to your needs, we need to know more about how the headers are structured.
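The sed command above only prints the result to standard output; my.fasta itself is left unchanged. To keep the result you can redirect it to a new file, or edit the file in place with a backup copy. A minimal sketch, assuming GNU or BSD sed (both accept -i with an attached backup suffix such as -i.bak; -i itself is not part of POSIX sed). The function name trim_headers_in_place and the output name orfs.trimmed.fa are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: apply the same substitution as the answer (cut each
# header line from the first '_' onward) to a file in place, keeping the
# untouched original as FILE.bak. Sequence lines are not modified.
trim_headers_in_place() {
    sed -i.bak '/^>/ s/_.*//' "$1"
}

# Usage with the file from the question (creates orfs.fa.bak):
#   trim_headers_in_place orfs.fa
# Or leave orfs.fa untouched and write to a new file instead:
#   sed '/^>/ s/_.*//' orfs.fa > orfs.trimmed.fa
```

Keeping the .bak copy makes it easy to go back and try a different regex once the header structure is clearer.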

>MSTRG.942.2_j_ORF.1 [85-439](+) type:complete length:354 frame:2 start:CTG stop:TAA
CTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT
ATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAA
GCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTT
CCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTT
TTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCT

>MSTRG.944.3_j_ORF.1 [162-489](+) type:complete length:327 frame:1 start:ATG stop:TGA
ATGCCGACGTACAAGATTAGGGGAATCGACGTAGATTTTCCCTACGAAGCCTATGATTCCCA
ACTCGTTTACATGGACAAAGTCATGCAATCGCTTCAGGAGGTAGCGATTGACTCACTCAATC
ATTGCACTTTTGATTATTTAAGctacttttgatgtatttttattttattttatggtagTTCC
GCTGTGGTTGTTGTAATAATcgactaataattaattataaacatgATTTTGGATCAATTGGA
AGTGATCacaaaatgttaatatttaCTTGTTGTCAGGCAATTTGAAATTGATGTTGTTAAGA
TCATGATTGATCAGCAG

>MSTRG.944.3_j_ORF.2 [3141-3549](+) type:complete length:408 frame:1 start:TTG stop:TAG
TTGAAAATTTGTGCTGTATATTTTCTGCTATCTCGGGTACAGGatgatattttctttattgc
agCACTTCTTCTAAAACTTGAAAAGCGCATTGCTGAGGTGCATATTGAATCTAAGGAGTTGG
GGTTTACTAAACCCGGGCCCTATATGTTTGAACTGCTTGCTGATCTTAATATCACTCACAAG
ACTGCTTCTAAGCTTAAGAGTATAATAGCTGAAGCTTCAACTCTCATTGAGGAAAATAATCA
GGAGAAATCAACTGGCACCATCTGCAGATTGGATACTATCAAGGATATTCTTGACATTGTTT
TCAGGGATGGAAGAACTTCTCATGCTAAATACTATCGTGTAAGTTTTGAATTATCGTTTACA
CTTCAGTGGATTGATTTTGTTTGTCTTGTTGCTTCC

>MSTRG.944.3_j_ORF.8 [9923-10301](-) type:complete length:378 frame:-1 start:TTG stop:TGA
TTGACTCAGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCA
GGAAGATTGTCTTTTTGTCCAACAATCTCAAGATTTCTACTTTGAGAGGTCACATTATCTTT
ACATTGCAGATCTATATCACCTTGAGAACTTTGAAGAGCTATATGACAAATCTAATGTTTAT
CATGAT

>MSTRG.944.3_j_ORF.9 [11413-11737](-) type:complete length:324 frame:-2 start:ATG stop:TAA
ATGGATATATATTATCATAGGACGGTAATCTGGGGAACATACCAAGGAGCACGGTTCCGTTC
CCTAATGACTCGCCGGAGCCAAAGAAATTCCCTGACCAGTGACCATTCTCTTGAAGTATTTC
CAGagatagaaaatttatataccTCTGCTGATCCACCCAGGTCTCAATCACAACATCAAATC
TCCGGAAGACCCTTTCCAACATATGCTTTGTGTGACACTAAGCCACAAAGCATACCTTCTGT
AACGAGGTTCCCAAATAATTATCTGTTTGAATTAAAGCCCAGAACCAGACATAATGAGATCA
TTTTCTCTATCAAA

>MSTRG.1194.1_u_ORF.1 [2308-2830](+) type:complete length:522 frame:2 start:CTG stop:TAG
CTGATGGAGGCGAACCTGAATGTTATGATGAAGCCATGGAGAGTGATGAAAAGAAGAAGTGG
TTGGATGCTATGCAAGAgtcccgtggttgtttacttctcacattgagaaggtttttccacgt
taaaaattattgtgtcatttgtgattggtgattttagttgctgtgattatttgttgattgct
cctcacagattcttgcaagtttgggaaattgattatccgctgcatattgctctgttagagtt
gttattatatttgttgttgaatttcccatcagagtggcatcagagctctttggttaaggggc
tgtttgacttgtttgaatgatggaggcaaatacaaatagaatgatttgtttgaatggcacta
attatcacttatggaagggaaaaatgaaggatctgttatttgtgaagaatttGCATCTTCCT
GTGTTTGCTACTGAGAAGCCAGAATCCAAGACTGATGAGGATTGGAGCTTTGAACATCAGCA
GGTCTGTGGTTTTATTCGGCAATATG

>MSTRG.1194.1_u_ORF.2 [2333-2771](-) type:complete length:438 frame:-1 start:TTG stop:TAA
TTGGATTCTGGCTTCTCAGTAGCAAACACAGGAAGATGCtttaagaagtgtttattgtctag
gaagtaaaaagggaaggtattcactattaatcacggtaagtttgtttagtaagataaacata
aacggaggtagtaagtttgttcagtttgtcggggaattggtttctcgagactacggtgagac
taccctttaagttgttgtttatattattgttgagattgtctcgttatacgtcgcctattagt
taaagggtttgaacgttcttagacactcctcgttagttgtttattagtgtcgttgattttag
tggttagtgtttactgtgttattaaaaattgcacctttttggaagagttacactcttcattt
gttggtgccctgTCTTGCATAGCATCCAACCACTTCTTCTTTTCATCACTCTCCATGGCTTC
ATCA
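With headers like these, cutting at the first _ would turn the four MSTRG.944.3 records (ORF.1, ORF.2, ORF.8, ORF.9) into four records all named >MSTRG.944.3, and the two MSTRG.1194.1 records into two records named >MSTRG.1194.1. Since many downstream tools expect FASTA IDs to be unique, one alternative is to cut each header at the first whitespace instead: this keeps the full ORF name (e.g. MSTRG.944.3_j_ORF.2) and drops the coordinates and the type:/length:/frame:/start:/stop: fields. A minimal sketch, assuming POSIX sed, sort and uniq; both function names are hypothetical, and the sample below uses three of the headers above with shortened sequences:

```shell
#!/bin/sh
# Hypothetical helper: cut every header line at its first whitespace character.
# Sequence lines (no leading '>') pass through unchanged.
trim_to_first_word() {
    sed '/^>/ s/[[:space:]].*//' "$@"
}

# Hypothetical helper: list headers that occur more than once after cutting
# each header from the first '_' onward (the substitution from the answer).
dup_ids_after_underscore_trim() {
    sed -n '/^>/{ s/_.*//; p; }' "$@" | sort | uniq -d
}

# Three of the headers shown above (sequences shortened):
sample='>MSTRG.944.3_j_ORF.1 [162-489](+) type:complete length:327 frame:1 start:ATG stop:TGA
ATGCCGACG
>MSTRG.944.3_j_ORF.2 [3141-3549](+) type:complete length:408 frame:1 start:TTG stop:TAG
TTGAAAATT
>MSTRG.1194.1_u_ORF.1 [2308-2830](+) type:complete length:522 frame:2 start:CTG stop:TAG
CTGATGGAG'

printf '%s\n' "$sample" | dup_ids_after_underscore_trim
# prints: >MSTRG.944.3

printf '%s\n' "$sample" | trim_to_first_word
# prints:
# >MSTRG.944.3_j_ORF.1
# ATGCCGACG
# >MSTRG.944.3_j_ORF.2
# TTGAAAATT
# >MSTRG.1194.1_u_ORF.1
# CTGATGGAG
```

On the real file this would be, for example, trim_to_first_word orfs.fa > orfs.trimmed.fa (output name hypothetical). If the shorter transcript-level IDs are really what is needed, dup_ids_after_underscore_trim orfs.fa shows up front which IDs would collide.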