Removing unwanted "N" characters from an assembled and annotated whole-genome sequence file
4.5 years ago
tasmina.fm • 0

We are assembling and annotating a bacterial whole-genome sequence on the Linux command line (SOAPdenovo2 and some related software). After assembly and annotation, the FASTA file contains unwanted N characters, and for this reason we are not able to analyze it. Could you please tell us how to remove them from the whole-genome FASTA file?

assembly genome next-gen • 677 views

For this reason, we are not able to analyze it => Why not?

N denotes an ambiguous base; runs of N typically arise from repetitive or difficult-to-sequence regions. Most genome assemblies contain them to some extent. The human reference (GRCh38) has more than 150 million of them. Is this a short-read assembly?
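
Before deciding what to do about the Ns, it can help to see how many there are and where they sit. The following is a minimal sketch in plain Python, not part of SOAPdenovo2 or any other tool; the function names (`read_fasta`, `summarize_n`) are made up for illustration, and it assumes an ordinary FASTA file in which both `N` and `n` mark ambiguous bases.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            # Use the first whitespace-delimited token as the record name.
            name = header.split()[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_n(lines: Iterable[str]) -> Dict[str, dict]:
    """Report sequence length, total N count and N-run coordinates per record."""
    report = {}
    for name, seq in read_fasta(lines):
        # Each run is a 0-based, half-open (start, end) interval.
        runs = [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]
        report[name] = {
            "length": len(seq),
            "n_count": sum(end - start for start, end in runs),
            "runs": runs,
        }
    return report


if __name__ == "__main__":
    demo = [">scaffold1", "ACGTNNNNAC", "GTNACGT", ">scaffold2", "ACGT"]
    for name, stats in summarize_n(demo).items():
        print(name, stats)
```

To run it on a real file, pass an open file handle instead of the demo list, for example `summarize_n(open("assembly.fasta"))`, where `assembly.fasta` is a placeholder for your own file name.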


we got unwanted character N within the fasta file

While you may not want the Ns, they likely signify that your bacterial genome assembly is not complete. It is not unusual to see this, especially if you are only using short-read data. You may need to investigate and redo the assembly (repeat regions and over-sequencing can cause issues) or add long-read coverage (e.g. PacBio/Nanopore) to truly complete the assembly.
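
If a downstream tool really cannot accept N, simply deleting the character is risky, because it joins bases that are not adjacent in the genome. A more cautious option is to split each scaffold at its N runs, giving gap-free contigs. The sketch below shows that idea under stated assumptions: the helper names (`split_at_gaps`, `to_fasta`) and the `_contigN` naming scheme are hypothetical, it treats every run of `N`/`n` as a gap, and it does not fix the underlying incomplete assembly.

```python
import re
from typing import List, Tuple


def split_at_gaps(name: str, seq: str, min_len: int = 1) -> List[Tuple[str, str]]:
    """Split one scaffold at runs of N and return named gap-free pieces.

    Pieces shorter than min_len are dropped; empty pieces are always dropped.
    """
    threshold = max(min_len, 1)
    pieces = [p for p in re.split(r"[Nn]+", seq) if len(p) >= threshold]
    return [(f"{name}_contig{i}", piece) for i, piece in enumerate(pieces, start=1)]


def to_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text wrapped at `width` columns."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    contigs = split_at_gaps("scaffold1", "ACGTNNNNACGTNACGT")
    print(to_fasta(contigs), end="")
```

Note that splitting changes sequence names and coordinates, so an annotation made on the original scaffolds would no longer line up and would need to be redone on the split sequences.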
