Question

How To Remove Repeated Nucleotide Sequences

11.1 years ago

rezwan.02 ▴ 60

I need to remove repeated nucleotide sequences from the sequences I am working with, which are in FASTA format. Can I do this using BLASTn? If so, how? Thank you.

repeats • 4.3k views

updated 11.1 years ago by SES 8.6k • written 11.1 years ago by rezwan.02 ▴ 60

Can you be a little clearer about what exactly you are asking? What is your research question? I don't understand why you would use blastn.

11.1 years ago by Josh Herr 5.8k

If you look in the comments to Giovanni's answer, it appears the goal is to remove repeat sequences (i.e., create a non-redundant set). There is a dearth of information about the data and question, so all we can really offer are some general guidelines/approaches. I don't think blastn has anything to do with this question, other than it is probably a familiar tool.

11.1 years ago by SES 8.6k

The question is unclear because of the term "repeated nucleotide sequence". You need to clarify whether you want to remove (1) entire sequences that are duplicates of one another or (2) repeat regions within individual sequences.

11.1 years ago by Neilfws 49k

score 2 · Answer 1 · 2013-03-27

It sounds like you are trying to create a set of non-redundant sequences. There are many methods to do this, but the important thing to consider is how many sequences you have, because not all methods scale equally well. Uclust may work just fine for a modest data set, but in my experience it does not work out well when you have a lot of sequences (e.g., one or more lanes of HiSeq). I recommend you try Vmatch if you have access to it; there is a section in the manual on how to accomplish this task.
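For the simplest case, where "non-redundant" means dropping records whose sequences are exactly identical, a few lines of Python may already be enough. The following is only a minimal sketch under that assumption; it is not what Uclust or Vmatch do, and the function names `read_fasta` and `dedupe_exact` and the example records are made up for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical input: records "a" and "b" carry the same sequence.
example = StringIO(">a\nACGU\nACGU\n>b\nacguacgu\n>c\nGGCC\n")
unique_records = dedupe_exact(read_fasta(example))
for header, seq in unique_records:
    print(f">{header}\n{seq}")
```

To run it on a real file, replace the `StringIO` example with `open("your_file.fasta")`. Every distinct sequence is held in memory, so this suits small to medium data sets; near-identical sequences and subsequences are not detected.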

score 1 · Answer 2 · 2013-03-27

11.1 years ago

Giovanni M Dall'Olio 28k

The proper name for what you are asking is "repeat masking", so you should look for tools that "mask" low-entropy sequences.

There are many tools to mask sequences:

RepeatMasker is available as a web tool that lets you mask sequences without much configuration
If your data come from humans, you can use the mask maps used in the 1000 Genomes Project to remove all the regions that are not accessible to NGS methods
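To give a rough idea of what "masking low-entropy sequence" means, here is a small sketch that replaces bases with `N` wherever a sliding window has low Shannon entropy. This is a toy illustration and not RepeatMasker's method; the function names, the window size of 12 and the threshold of 1.0 bit are arbitrary assumptions you would need to tune.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits per base) of the base composition of a string."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_entropy(seq, window=12, threshold=1.0, mask_char="N"):
    """Replace every base covered by a low-entropy window with mask_char."""
    seq = seq.upper()
    masked = list(seq)
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            # Mask the whole window, not just its first base.
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


# Hypothetical sequence: a poly-A run flanked by more complex sequence.
example_seq = "ACGTACGTACGT" + "A" * 12 + "ACGTACGTACGT"
print(mask_low_entropy(example_seq))
```

Sequences shorter than the window are returned unmasked. A composition-based measure like this flags homopolymer and similar low-complexity runs, but it will not recognise interspersed repeats such as transposable elements, which need a repeat library.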

Sir, I have RNA sequences, and I need to run a BLASTn search of the sequences against themselves to remove the repeated sequences.

11.1 years ago by rezwan.02 ▴ 60

Sorry, in that case I misunderstood your original question.

11.1 years ago by Giovanni M Dall'Olio 28k

score 1 · Answer 3 · 2013-03-27

11.1 years ago

Daniel ★ 4.0k

I think you might be looking for something along the lines of uclust, which clusters sequences so that you can reduce the number of identical (or near-identical) sequences. But the question is not entirely clear.
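To illustrate the general idea of collapsing identical or close sequences, here is a toy greedy clustering sketch. It is not uclust's algorithm: similarity is approximated with `difflib.SequenceMatcher` from the Python standard library rather than a real alignment, and the function names, the 0.9 threshold and the example records are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Approximate similarity in [0, 1]; a stand-in for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(records, threshold=0.9):
    """Assign each sequence to the first representative it resembles.

    Returns a list of (rep_header, rep_seq, member_headers) tuples.
    Longer sequences are considered first, so they become representatives.
    """
    clusters = []
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        for rep_header, rep_seq, members in clusters:
            if similarity(rep_seq, seq) >= threshold:
                members.append(header)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters.append((header, seq, [header]))
    return clusters


# Hypothetical records: s2 differs from s1 by a single base.
records = [
    ("s1", "ACGUACGUACGUACGUACGU"),
    ("s2", "ACGUACGUACGUACGUACGA"),
    ("s3", "GGGGCCCCGGGGCCCCGGGG"),
]
clusters = greedy_cluster(records)
for rep_header, _, members in clusters:
    print(rep_header, members)
```

Keeping only the representative of each cluster gives a reduced set. The comparison cost grows with the number of sequences times the number of clusters, so this is only practical for small inputs; for real data a dedicated clustering tool is the better choice.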