Question: How To Remove Repeated Nucleotide Sequences
0
6.3 years ago by
rezwan.0220
rezwan.0220 wrote:

I need to remove repeated nucleotide sequences from the sequences I am working with. The sequences are in FASTA format. Can I do this using BLASTn? If so, how? Thank you.

repeats • 2.7k views
modified 6.3 years ago by SES8.2k • written 6.3 years ago by rezwan.0220
1

Can you be a little clearer about what exactly you are asking? What is your research question? I don't understand why you would use blastn.

written 6.3 years ago by Josh Herr5.6k
1

If you look in the comments to Giovanni's answer, it appears the goal is to remove repeat sequences (i.e., create a non-redundant set). There is a dearth of information about the data and question, so all we can really offer are some general guidelines/approaches. I don't think blastn has anything to do with this question, other than it is probably a familiar tool.

modified 6.3 years ago • written 6.3 years ago by SES8.2k
1

The question is unclear because of the term "repeated nucleotide sequence". You need to clarify whether you want to remove (1) entire sequences that are duplicates of one another or (2) repeat regions within individual sequences.

written 6.3 years ago by Neilfws48k
2
6.3 years ago by
SES8.2k
Vancouver, BC
SES8.2k wrote:

It sounds like you are trying to create a set of non-redundant sequences. There are many ways to do this, but the important thing to consider is how many sequences you have, because not all methods scale equally well. Uclust may work just fine, though in my experience it does not work out well if you have a lot of sequences (e.g., one or more lanes of HiSeq). I recommend you try Vmatch if you have access to it; there is a section in the manual on how to accomplish this task.
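For the simplest reading of the question (dropping records whose sequence is an exact duplicate of an earlier one), a minimal Python sketch might look like the following. It is not Uclust or Vmatch and does no clustering of near-identical sequences; the function names `parse_fasta` and `remove_exact_duplicates` are hypothetical, and it assumes the whole FASTA text fits in memory.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def remove_exact_duplicates(
    records: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


if __name__ == "__main__":
    example = ">a\nACGU\nAC\n>b\nacguac\n>c\nUUUU\n"
    for header, seq in remove_exact_duplicates(parse_fasta(example)):
        print(f">{header}\n{seq}")
```

Only identical sequences are collapsed here; sequences that differ by a single base, or that are contained within a longer sequence, are kept, which is where a dedicated tool becomes necessary.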

written 6.3 years ago by SES8.2k
1
6.3 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

What you are asking for is usually called "repeat masking", so you should look for tools that "mask" low-entropy sequences.

There are many tools to mask sequences:

  • RepeatMasker is available as a web tool that masks sequences without much configuration
  • If your data come from humans, you can use the mask maps used in 1000 Genomes to remove all the regions that are not accessible to NGS methods
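
To illustrate what masking low-entropy sequence means, here is a rough Python sketch that replaces low-complexity windows with `N`. It is not how RepeatMasker works (RepeatMasker relies on repeat libraries), and the window size, the threshold, and the names `shannon_entropy` and `mask_low_entropy` are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the character composition of a string."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_entropy(seq: str, window: int = 12, threshold: float = 1.0) -> str:
    """Replace every base covered by a low-entropy window with 'N'.

    A window is masked when its entropy is below `threshold` bits.
    Sequences shorter than `window` are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window].upper()) < threshold:
            for i in range(start, start + window):
                masked[i] = "N"
    return "".join(masked)


if __name__ == "__main__":
    print(mask_low_entropy("AAAAAAAAAAAA"))
    print(mask_low_entropy("ACGUACGUACGU"))
```

A homopolymer run has an entropy of 0 bits and is masked, while a window with all four bases in equal proportion has 2 bits and is left alone. Real data would need the threshold tuned, and interspersed repeats need a proper repeat library rather than an entropy cutoff.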
modified 6.3 years ago • written 6.3 years ago by Giovanni M Dall'Olio26k

Sir, I have RNA sequences. Do I need to run a BLASTn search of the sequences against themselves to remove the repeated sequences?

written 6.3 years ago by rezwan.0220

Sorry, in that case I didn't understand your original question.

written 6.3 years ago by Giovanni M Dall'Olio26k
1
6.3 years ago by
Daniel3.7k
Cardiff University
Daniel3.7k wrote:

I think you might be referring to something along the lines of uclust, which would let you reduce the number of identical (or nearly identical) sequences. But it's not entirely clear.

written 6.3 years ago by Daniel3.7k