How to delete 16S ribosomal sequences from a GenBank or FASTA file?
7.9 years ago
benzhang ▴ 20

Hello, all! I want to delete the 16S sequence from a bacterial genome in either gbk or fasta format. I don't have access to any commercial software. Are there any open source programs or command line tools that can achieve this? Thanks!

Best, Ben

genome sequence • 2.8k views

You could open your fasta file in any text editor on *nix/OS X, locate the start of the 16S sequence (with find), and then delete the range you need. If you are working on Windows, try ApE or SnapGene Viewer.
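The manual find-and-delete approach can also be scripted, which avoids one pitfall of text editors: FASTA sequences are usually wrapped across lines, so a plain search can miss a match that spans a line break. Below is a minimal Python sketch, assuming you already know the 1-based start and end of the 16S region. The function names `read_fasta` and `delete_range` and the toy coordinates are hypothetical, chosen only for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Join wrapped sequence lines into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def delete_range(seq, start, end):
    """Remove 1-based, inclusive positions start..end from seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[:start - 1] + seq[end:]


# Toy example: a 16 bp "genome" wrapped over two lines.
fasta_text = ">toy_genome\nAAAACCCC\nGGGGTTTT\n"
header, seq = read_fasta(fasta_text)[0]
print(delete_range(seq, 5, 8))  # AAAAGGGGTTTT
```

Keep in mind that bacterial genomes often carry several 16S copies, some on the reverse strand, so each copy needs its own coordinates, and deleting one region shifts the coordinates of everything downstream of it.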

7.9 years ago

Using the (free) BBMap package:

If you know the 16S sequences (or those of a close relative), you can align them to the genome to produce a sam file. For example:

mapPacBio.sh ref=genome.fasta in=16S.fasta out=mapped.sam ambig=all maxindel=20

Then you can run BBMask:

bbmask.sh in=genome.fasta out=masked.fasta sam=mapped.sam masklowentropy=false maskrepeats=false

This will mask the sequences covered by the mapped 16S sequences. You can alternatively do it using BBDuk's kmer-based masking mode:

bbduk.sh in=genome.fasta out=masked.fasta ref=16S.fasta k=200 hdist=1 kmask=N

...but I'd suggest the alignment method unless you encounter problems with it.


And then delete the masked region?

This answer has left me wondering whether @benzhang wanted to delete the sequence (as in remove it) or mask it so that it is no longer considered. Guess we will find out.


I'm trying to map bacterial genomes to human microbiome data without considering the 16S sequences. Would removing and masking achieve the same result? Thanks!


In a way, yes. But masking preserves the overall coordinates, so it may be the better option.
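To illustrate the coordinate point, here is a small Python sketch on a toy sequence. It assumes a known 1-based region to hide; `mask_range` and the example coordinates are hypothetical names and values, not part of any tool mentioned in this thread.

```python
def mask_range(seq, start, end, mask_char="N"):
    """Replace 1-based, inclusive positions start..end with mask_char."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[:start - 1] + mask_char * (end - start + 1) + seq[end:]


genome = "AAAACCCCGGGGTTTT"
masked = mask_range(genome, 5, 8)    # AAAANNNNGGGGTTTT
deleted = genome[:4] + genome[8:]    # AAAAGGGGTTTT

# 0-based position of a downstream marker in each version.
print(genome.find("GGGG"), masked.find("GGGG"), deleted.find("GGGG"))
# 8 8 4
```

The downstream marker stays at the same position after masking but moves after deletion, so annotations and mapping positions remain comparable with the original genome only in the masked version.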


I see, thank you so much!


Thanks, Brian! Would it work if I extract the 16S sequences from the gbk file, align them to the fasta file, and then mask the aligned regions?


Yes, that would work. But I'm not familiar with the gbk format. The sequence would need to be converted to fasta first, which is fairly straightforward.
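As a rough idea of what that conversion involves, here is a minimal Python sketch using only the standard library. It assumes a standard GenBank flat file in which each record starts with a `LOCUS` line, the sequence follows an `ORIGIN` line, and the record ends with `//`. The function name `genbank_to_fasta` is hypothetical, and the parser ignores features entirely.

```python
def genbank_to_fasta(gbk_text, width=60):
    """Convert GenBank flat-file text to FASTA text (sequence only)."""
    records = []
    name, in_origin, chunks = None, False, []
    for line in gbk_text.splitlines():
        if line.startswith("LOCUS"):
            # The second token on the LOCUS line is the record name.
            parts = line.split()
            name = parts[1] if len(parts) > 1 else "unnamed"
            in_origin, chunks = False, []
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: store what was collected.
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, in_origin, chunks = None, False, []
        elif in_origin:
            # Sequence lines hold a position number plus blocks of bases;
            # keep only the letters.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    out = []
    for rec_name, seq in records:
        out.append(">" + rec_name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


example = (
    "LOCUS       TEST01   12 bp    DNA\n"
    "FEATURES             Location/Qualifiers\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)
print(genbank_to_fasta(example), end="")
```

If Biopython is installed, `Bio.SeqIO.convert("genome.gbk", "genbank", "genome.fasta", "fasta")` does the same conversion with a full parser (the file names here are placeholders). The 16S coordinates themselves are normally listed under the rRNA features of the gbk file, which this sketch does not read.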


Got it. Thanks, Brian!
