I have two DNA sequence files, one of which was generated by a company based on a data file I sent them. Some sequence headers in the company's file differ from those in my original data file, and every header needs to conform to my original format for certain programs to use the file. Is there a way to search for part or all of a sequence header in the company's file and then replace the entire line with the corresponding header from my original data file? Additional notes: 1) The company reverse complemented some of the sequences, and this was necessary. Thus, I do not want to alter the sequences in the company's file, just make the headers look like those in the original one. 2) The sequences that were reverse complemented have _rc appended to the end of their headers.
For example, a header and sequence from the company's file:
>uce-265_p7 |design:hemiptera-v1,designer:faircloth,probes-locus:uce-265,probes-probe:7,probes-source:halhal1,probes-global-chromo:Scaffold391,probes-global-start:41871,probes-global-end:41991,probes-local-start:0,probes-local-end:120
TCGAGCAACTTTTCATAAATGACCTGCACAACTTCGAATGACTTAAGTGATGTAATAGAATAAACAAGAACATAGCCATGTATGTCCATTGAATATTGCGCAGGGAATATTGAATACTCA
The company's headers should look like the original ones. My initial thought is an if/then loop with grep, but I'm having trouble imagining how this would work in this case.
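To make the idea concrete, here is a rough sketch of the kind of lookup I have in mind, written in Python rather than grep. It rests on assumptions that may not hold for the real files: both files are FASTA, the first whitespace-delimited token after ">" (for example uce-265_p7) identifies a sequence in both files, and the only difference in that token is the optional _rc suffix. The function names are made up for illustration. Headers with no match in the original file are left unchanged, and sequence lines are never modified.

```python
import re
import sys


def header_key(header_line):
    """Return the ID used for matching: the first token after '>',
    with a trailing '_rc' removed (assumed naming convention)."""
    tokens = header_line[1:].split()
    token = tokens[0] if tokens else ""
    return re.sub(r"_rc$", "", token)


def load_original_headers(lines):
    """Map each sequence ID to its full header line from the original file."""
    headers = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            headers[header_key(line)] = line
    return headers


def restore_headers(company_lines, original_headers):
    """Yield the company file's lines, swapping each header for the
    matching original header; sequence lines pass through untouched."""
    for line in company_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Fall back to the company's header when no match is found.
            yield original_headers.get(header_key(line), line)
        else:
            yield line


# Hypothetical usage: python restore_headers.py original.fasta company.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    original_path, company_path = sys.argv[1], sys.argv[2]
    with open(original_path) as fh:
        originals = load_original_headers(fh)
    with open(company_path) as fh:
        for out_line in restore_headers(fh, originals):
            print(out_line)
```

The result goes to standard output, so it can be redirected to a new file while the company's file stays intact. If the _rc suffix needs to be kept on the restored headers, the replacement step would have to re-append it, which this sketch does not do.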