How To Remove Repeated Nucleotide Sequences
11.1 years ago
rezwan.02 ▴ 60

I need to remove repeated nucleotide sequences from the sequences I am working with. The sequences are in FASTA format. Can I do this using BLASTn? If so, how? Thank you.

repeats • 4.3k views

Can you be a little clearer about exactly what you are asking? What is your research question? I don't understand why you would use blastn.


If you look in the comments to Giovanni's answer, it appears the goal is to remove repeat sequences (i.e., create a non-redundant set). There is a dearth of information about the data and question, so all we can really offer are some general guidelines/approaches. I don't think blastn has anything to do with this question, other than it is probably a familiar tool.


The question is unclear because of the term "repeated nucleotide sequence". You need to clarify whether you want to remove (1) entire sequences that are duplicates or (2) repeat regions within individual sequences.

11.1 years ago
SES 8.6k

It sounds like you are trying to create a set of non-redundant sequences. There are many methods to do this, but the important thing to consider is how many sequences you have, because not all methods scale equally well. Uclust may work just fine, though if you have a lot of sequences (e.g., one or more lanes of HiSeq), in my experience I don't think it will work out well. I recommend you try Vmatch if you have access to it (there is a section in the manual on how to accomplish this task).
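For the simplest reading of "non-redundant" (dropping sequences that are exact duplicates of one already seen), a rough sketch in plain Python could look like the following. This is only an illustration: the function names `read_fasta` and `dedupe_exact` are made up here, the whole set is assumed to fit in memory, and it does not group near-identical sequences the way a clustering tool such as Uclust or Vmatch would.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical toy input: records "a" and "b" carry the same sequence.
example = StringIO(">a\nACGU\nACGU\n>b\nacguacgu\n>c\nGGCC\n")
for header, seq in dedupe_exact(read_fasta(example)):
    print(f">{header}\n{seq}")
```

For a real file you would pass `open("your_sequences.fasta")` (a placeholder name) instead of the `StringIO` toy input. For HiSeq-scale data, a dedicated tool is likely the better choice.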

11.1 years ago

The proper name for what you are asking about is "repeat masking". So, you should look for tools that "mask" low-entropy sequences.

There are many tools to mask sequences:

  • RepeatMasker is available as a web tool that lets you mask sequences without much configuration
  • If your data comes from humans, you can use the mask maps used in the 1000 Genomes Project to remove all the regions that are not accessible to NGS methods
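
To give a feel for what "masking low-entropy sequence" means, here is a toy sketch in Python. It is not how RepeatMasker works (that tool searches against repeat libraries). It just computes Shannon entropy in a sliding window and replaces low-scoring stretches with `N`. The function names, the window size of 12, and the threshold of 1.0 bit are arbitrary choices made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the symbol composition of a string."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_entropy(seq, window=12, threshold=1.0):
    """Replace every base covered by a low-entropy window with 'N'.

    Sequences shorter than the window are returned unmasked (upper-cased).
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "N"
    return "".join(masked)


# Hypothetical example: a poly-A run flanked by more complex sequence.
print(mask_low_entropy("ACGTTGCAGTCAAAAAAAAAAAAAAAACGTAGCTAGCT"))
```

A window made of a single base has entropy 0, while a window with all four bases in equal proportion has entropy 2, so the threshold decides how "simple" a region must be before it gets masked.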

Sir, I have RNA sequences, and I need to run a BLASTn search of them against themselves to remove repeated sequences.


Sorry, in that case I didn't understand your original question.

11.1 years ago
Daniel ★ 4.0k

I think you might be referring to something along the lines of uclust, so you can reduce the number of identical (or close) sequences. But it's not entirely clear.
