How to replace/fill "Ns" in a FASTA file using a reference file with the same coordinates
Dear community,

I hope you are doing well. As asked in the title, is there a way to fill or replace the N's in a FASTA file using a reference file?

For example

# INPUT

Fasta with Ns

>fasta1
ACTGGCATCATGNNNNACTTTTGACC


Reference Fasta

>reference
ACTGGCATCATGTCAGACTTTTGACC


# OUTPUT

>fasta1
ACTGGCATCATG**TCAG**ACTTTTGACC


I would really appreciate any help in this regard.

You have to explain how your problem is different from cp ref.fa user.fa

It is different in that cp copies the complete ref.fa into user.fa. What I want is to change the "N" regions only. For example, my reference file is ACTGGCATCATGTTTTACTTTTGACC and my user file is ACTGGCATCATGNNNNACTTTTGACC. I want only the Ns to be replaced by TTTT (as specified in the reference), without changing any other characters. I hope this answers your query.

Is each entry in the "Fasta with Ns" always going to be exactly the same length as the equivalent entry in the "reference fasta"?

Yes, it is the same, coordinate-wise. However, the length of the N runs may differ across the complete FASTA.
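If the two sequences really do share coordinates, one simple option is a position-by-position fill: keep every base of the query and take the reference base wherever the query has an N. Below is a minimal Python sketch of that idea, assuming both sequences are already loaded as plain strings of equal length; `fill_ns` is a hypothetical helper name, not part of any library.

```python
def fill_ns(query: str, reference: str) -> str:
    """Replace each N in `query` with the reference base at the same position."""
    if len(query) != len(reference):
        raise ValueError("query and reference must have the same length")
    # Walk both sequences in parallel; only N/n positions are changed.
    return "".join(r if q in "Nn" else q for q, r in zip(query, reference))


query = "ACTGGCATCATGNNNNACTTTTGACC"
reference = "ACTGGCATCATGTCAGACTTTTGACC"
print(fill_ns(query, reference))  # ACTGGCATCATGTCAGACTTTTGACC
```

Because each position is handled independently, N runs of different lengths need no special treatment.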

I assume the query file has only one short sequence, rather than contigs/scaffolds or a whole assembly.

Here's a semi-automatic way:

1. Searching in the reference

 seqkit locate --degenerate --pattern-file test.fasta ref.fasta
seqID   patternName     pattern strand  start   end     matched
reference       fasta1  ACTGGCATCATGNNNNACTTTTGACC      +       7       32      ACTGGCATCATGTCAGACTTTTGACC

2. Replacing queries with matched sequences

 seqkit replace --by-seq -p ACTGGCATCATGNNNNACTTTTGACC -r ACTGGCATCATGTCAGACTTTTGACC test.fasta
>fasta1
ACTGGCATCATGTCAGACTTTTGACC


However the length of Ns might be different across the complete FASTA

If there are lots of records, a script is needed. For every record:

1. Extract the subsequences around each run of Ns (N+).
2. Locate those subsequences on the reference.
3. Return the matching subsequences from the reference and use them to replace the Ns.
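The per-record steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: sequences are plain strings, each N must match exactly one reference base, and the first hit on the reference is taken (repeated flanks could give an ambiguous hit). The names `to_pattern`, `fill_record` and the `flank` parameter are hypothetical.

```python
import re


def to_pattern(seq: str) -> str:
    """Turn a sequence into a regex in which every N matches any single base."""
    return "".join("." if base == "N" else re.escape(base) for base in seq)


def fill_record(query: str, reference: str, flank: int = 12) -> str:
    """Fill each run of Ns in `query` with the matching bases from `reference`."""
    query, reference = query.upper(), reference.upper()
    pieces = []
    pos = 0
    for run in re.finditer(r"N+", query):
        # Step 1: extract the N run together with its flanking bases.
        start = max(0, run.start() - flank)
        end = min(len(query), run.end() + flank)
        window = query[start:end]
        # Step 2: locate that window on the reference (first hit only).
        hit = re.search(to_pattern(window), reference)
        pieces.append(query[pos:run.start()])
        if hit is None:
            pieces.append(run.group())  # no match: leave the Ns untouched
        else:
            # Step 3: take the reference bases that sit where the Ns were.
            offset = run.start() - start
            pieces.append(hit.group()[offset:offset + len(run.group())])
        pos = run.end()
    pieces.append(query[pos:])
    return "".join(pieces)


print(fill_record("ACTGGCATCATGNNNNACTTTTGACC",
                  "GGGTTTACTGGCATCATGTCAGACTTTTGACCAAA"))
```

A larger `flank` makes a spurious hit less likely, at the cost of requiring more of the query to agree with the reference around each N run.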