Question: Replace matching sections of two fastas
6 weeks ago, jamie.pike wrote:

I have a master fasta file (File_1.fasta) and a second fasta file (File_2.fasta). Wherever a header in File_2.fasta matches a header in File_1.fasta (apart from the "/rc" suffix), I would like the header and its sequence in File_1.fasta to be replaced with the header and sequence from File_2.fasta.

E.g.

File_1.fasta

>header1 
ATGCCTTCCTCAAAGGGATACG
>header2 
ATTGGAATTTGCATCCGAGGGC

File_2.fasta

>header2/rc
GCCCTCGGATGCAAATTCCAAT

Output file

>header1
ATGCCTTCCTCAAAGGGATACG
>header2/rc
GCCCTCGGATGCAAATTCCAAT

Are there any tools which will do this? I imagine it can be done with awk but I am not competent enough with awk to do it.

Thank you

6 weeks ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
  • linearize both fasta files
  • remove the '/rc' suffix with sed
  • use sort to sort both linearized files on the sequence name
  • use join to select the sequences present in linearized1 but not in linearized2
  • use join to select the sequences present in both linearized1 and linearized2, then use cut to keep only the 2nd sequence
  • convert back to fasta using tr
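
The steps above could be sketched roughly as follows. This is only one possible reading of the recipe, not the answerer's own script. Assumptions: the `linearize` helper, the intermediate file names and the temporary directory are made up for illustration; only the first word of each header is used as the sequence name; and, instead of stripping '/rc' in place with sed, the original File_2 header is carried along in an extra column so that it can be written back to the output as in the question's example.

```shell
# Work in a scratch directory and recreate the example inputs from the question.
cd "$(mktemp -d)"
printf '>header1\nATGCCTTCCTCAAAGGGATACG\n>header2\nATTGGAATTTGCATCCGAGGGC\n' > File_1.fasta
printf '>header2/rc\nGCCCTCGGATGCAAATTCCAAT\n' > File_2.fasta

TAB="$(printf '\t')"
export LC_ALL=C   # sort and join must use the same collation order

# Hypothetical helper: print each record on one line as name<TAB>sequence.
linearize() {
  awk '/^>/ { if (name != "") print name "\t" seq
              name = substr($1, 2); seq = ""; next }
       { seq = seq $0 }
       END { if (name != "") print name "\t" seq }' "$1"
}

# file1.tsv: name, sequence (sorted on the name)
linearize File_1.fasta | sort -t "$TAB" -k1,1 > file1.tsv

# file2.tsv: name without "/rc", original name, sequence (sorted on column 1)
linearize File_2.fasta |
  awk -F '\t' -v OFS='\t' '{ key = $1; sub(/\/rc$/, "", key); print key, $1, $2 }' |
  sort -t "$TAB" -k1,1 > file2.tsv

{
  # Records found only in file 1 are kept unchanged.
  join -t "$TAB" -v 1 file1.tsv file2.tsv
  # Records found in both: the joined line is key, seq1, name2, seq2;
  # keep the File_2 header and sequence.
  join -t "$TAB" file1.tsv file2.tsv | cut -f 3,4
} | sed 's/^/>/' | tr '\t' '\n' > output.fasta

cat output.fasta
```

Note that the output lists the untouched File_1 records first and the replaced ones after them, each group sorted by name, so the original record order is not preserved. Records that appear only in File_2.fasta are dropped.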

Great, thank you. Could you please elaborate on the join and cut steps? How do I use join to select the sequences present in linearized1 but not in linearized2, join to select the sequences present in both linearized1 and linearized2, and then cut to keep only the 2nd sequence? I have had a look at the manual and I don't fully understand.

Reply written 6 weeks ago by jamie.pike
join -t $'\t' -v 1 -1 1 -2 1 file1.tsv  file2.tsv > only_in_1.tsv
Reply written 6 weeks ago by Pierre Lindenbaum
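
For the other half of the question (sequences present in both files, then cut), a small sketch on toy data might help. It assumes both files are already linearized as name<TAB>sequence, sorted on the name, and that '/rc' has already been stripped from file2.tsv; the file names `replaced.tsv` and the scratch directory are illustrative only.

```shell
cd "$(mktemp -d)"
TAB="$(printf '\t')"
export LC_ALL=C   # join expects input sorted in the current collation order

# Toy linearized inputs, already sorted on column 1.
printf 'header1\tATGCCTTCCTCAAAGGGATACG\nheader2\tATTGGAATTTGCATCCGAGGGC\n' > file1.tsv
printf 'header2\tGCCCTCGGATGCAAATTCCAAT\n' > file2.tsv

# -v 1 prints the lines of file 1 whose key has no partner in file 2.
join -t "$TAB" -v 1 -1 1 -2 1 file1.tsv file2.tsv > only_in_1.tsv

# Without -v, join prints one line per matching pair:
#   name, sequence from file 1, sequence from file 2
# cut -f 1,3 keeps the name and the 2nd sequence.
join -t "$TAB" -1 1 -2 1 file1.tsv file2.tsv | cut -f 1,3 > replaced.tsv

cat only_in_1.tsv replaced.tsv
```

Here `-1 1 -2 1` says that the join key is column 1 in both files, which is also join's default. Because the suffix was stripped before joining, the names in `replaced.tsv` no longer carry '/rc'; it would have to be appended again (or kept in a separate column) if it is wanted in the final headers.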