Question: How to find identical or similar sequences in a FASTA file
4.4 years ago by
Kurban170
China/Urumqi/Xinjiang Academy of Animal Sciences
Kurban170 wrote:

Hello, I am trying to find the best way to locate sequences identical or similar to a given sequence within a transcriptome FASTA file assembled from RNA-seq data. I know there are many tools, but I don't know which one is designed for this purpose. Could anyone give me some tips?
Thanks!

similar sequence • 1.4k views
modified 4.4 years ago • written 4.4 years ago by Kurban170

I'm not clear on what exactly you are looking to do -- compare sequences from different samples or within the same sample? There are many different strategies for both, from clustering (usearch/UPARSE/CD-HIT, etc.) to alignment (BLAST, etc.). Can you please clarify your original post with your research question?

written 4.4 years ago by Josh Herr5.6k

Sorry @Josh Herr, I was not clear.
Okay, I have a FASTA file containing around 144,000 transcripts/sequences (the transcriptome of an insect). My boss gave me several nucleotide sequences and asked whether there are any identical or similar sequences in the FASTA file. If so, which ones, and how similar are they?
I want to align those sequences one by one against the transcriptome (FASTA file).

I am new to this kind of analysis.

modified 4.4 years ago • written 4.4 years ago by Kurban170
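For the "same sequences" part of the question, a plain exact-match search already answers it without any alignment. The sketch below is a minimal, standard-library-only illustration (the function names are hypothetical, not from any tool): it reads FASTA records, then looks for each query and its reverse complement inside every transcript. It only finds identical matches; "similar" sequences still need an aligner such as BLAST.

```python
"""Report exact (identical) occurrences of query sequences in a transcriptome FASTA."""
from typing import Dict, Iterable, Iterator, List, Tuple

# DNA complement table; input is upper-cased before use, N maps to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, SEQUENCE) pairs; the ID is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact_matches(
    queries: Dict[str, str], transcripts: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, str, int]]:
    """Return (query_id, transcript_id, strand, 0-based start) for every exact hit.

    Strand "-" means the reverse complement of the query occurs in the transcript;
    the start is then the position of that reverse complement on the transcript.
    """
    hits = []
    for tid, tseq in transcripts:
        for qid, qseq in queries.items():
            for strand, pattern in (("+", qseq.upper()), ("-", reverse_complement(qseq))):
                start = tseq.find(pattern)
                while start != -1:  # also report overlapping occurrences
                    hits.append((qid, tid, strand, start))
                    start = tseq.find(pattern, start + 1)
    return hits


if __name__ == "__main__":
    # Toy records; for real data use read_fasta(open("transcripts.fasta")),
    # where the file name is a placeholder.
    toy = [">t1 some description", "ATGAAACCCTTT", ">t2", "GATTACAGATTACA"]
    for hit in find_exact_matches({"q1": "AAACCC", "q2": "TGTAATC"}, read_fasta(toy)):
        print(hit)
```

On the toy records this prints `('q1', 't1', '+', 3)`, `('q2', 't2', '-', 0)` and `('q2', 't2', '-', 7)`. A query that is its own reverse complement is reported on both strands. Scanning 144,000 transcripts this way is fine for a handful of queries, but it says nothing about near matches.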

Sounds like blast would be a good solution. You can install it locally and use it from the command line.

written 4.4 years ago by Devon Ryan89k
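Running BLAST locally from the command line comes down to two BLAST+ programs: `makeblastdb` to index the transcriptome and `blastn` to search it. The sketch below just assembles those command lines in Python and runs them with `subprocess`; the file names, database name, and the 1e-5 e-value cutoff are placeholder assumptions, not recommendations from this thread.

```python
"""Build (and optionally run) the BLAST+ commands for a local transcriptome search."""
import shutil
import subprocess
from typing import List


def blast_commands(
    transcripts_fasta: str,
    queries_fasta: str,
    db_name: str = "transcriptome_db",
    out_tsv: str = "hits.tsv",
    evalue: float = 1e-5,
    task: str = "blastn",
) -> List[List[str]]:
    """Return the makeblastdb and blastn argument lists.

    blastn's default task is megablast, which is tuned for near-identical hits;
    task="blastn" is slower but more sensitive to more divergent matches.
    """
    make_db = ["makeblastdb", "-in", transcripts_fasta, "-dbtype", "nucl", "-out", db_name]
    search = [
        "blastn", "-task", task,
        "-query", queries_fasta,
        "-db", db_name,
        "-outfmt", "6",  # tab-separated, one line per alignment (HSP)
        "-evalue", str(evalue),
        "-out", out_tsv,
    ]
    return [make_db, search]


def run_blast(transcripts_fasta: str, queries_fasta: str, **kwargs) -> None:
    """Run both commands; requires BLAST+ to be installed and on PATH."""
    for tool in ("makeblastdb", "blastn"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; install BLAST+ first")
    for cmd in blast_commands(transcripts_fasta, queries_fasta, **kwargs):
        subprocess.run(cmd, check=True)
```

Something like `run_blast("transcripts.fasta", "queries.fasta")` would build the database once and write one tab-separated line per hit to `hits.tsv`. Printing `" ".join(cmd)` for each command gives the equivalent shell commands if you prefer to run them by hand.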
4.4 years ago by
Siva1.6k
United States
Siva1.6k wrote:

You can create a BLAST database from those 144,000 transcripts and run a BLASTN search using your nucleotide sequences as queries.

written 4.4 years ago by Siva1.6k
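To answer "which one and how similar", the BLASTN tabular output (`-outfmt 6`) can be reduced to the best hit per query. This is a minimal sketch that assumes the 12 default columns of `-outfmt 6`; the `Hit` type, `best_hits` name, and the identity filter are illustrative choices, not part of BLAST.

```python
"""Summarise BLAST tabular output (-outfmt 6): best hit per query."""
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity over the aligned region
    length: int      # alignment length
    sstart: int
    send: int
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], min_pident: float = 0.0) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query among hits with pident >= min_pident.

    Assumes the 12 default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hit = Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[8]), int(f[9]),
                  float(f[10]), float(f[11]))
        if hit.pident < min_pident:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best
```

Usage would be along the lines of `with open("hits.tsv") as fh: summary = best_hits(fh, min_pident=90.0)`, where `hits.tsv` is a placeholder name. Percent identity alone can mislead for short alignments, so it is worth looking at `length` (relative to the query length) and `evalue` together. A hit with `sstart > send` lies on the reverse strand of the transcript.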
4.4 years ago by
geek_y9.4k
Barcelona/CRG/London/Imperial
geek_y9.4k wrote:

You can use cd-hit-est or usearch for this purpose. They cluster similar sequences and keep one representative sequence per cluster, based on a user-defined percent identity threshold.

If you need to compare them against another set of sequences, you need to run BLAST or a similar alignment tool.

 

modified 4.4 years ago • written 4.4 years ago by geek_y9.4k
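The clustering route can be sketched the same way: build a `cd-hit-est` command with an identity threshold (`-c`) and a matching word size (`-n`), then read the `.clstr` file it writes next to the output FASTA. The word-size table below follows my reading of the CD-HIT user guide's recommendations for cd-hit-est and should be checked against your CD-HIT version; the helper names are hypothetical. Note that by default the IDs in the `.clstr` file may be truncated (the `-d` option controls this).

```python
"""Build a cd-hit-est command and read its .clstr cluster file."""
import re
from typing import Dict, Iterable, List, Tuple

# Sequence ID in a .clstr member line, e.g. "1\t1205nt, >tr7... at +/97.51%".
_MEMBER_ID = re.compile(r">(.+?)\.\.\.")


def word_size(identity: float) -> int:
    """Word size (-n) for cd-hit-est, assumed from the CD-HIT user guide's table."""
    if not 0.75 <= identity <= 1.0:
        raise ValueError("cd-hit-est identity threshold should be between 0.75 and 1.0")
    for threshold, n in ((0.95, 10), (0.90, 8), (0.88, 7), (0.85, 6), (0.80, 5)):
        if identity >= threshold:
            return n
    return 4


def cdhit_est_command(in_fasta: str, out_fasta: str,
                      identity: float = 0.95, threads: int = 1) -> List[str]:
    """Argument list for cd-hit-est; cluster info is written to out_fasta + '.clstr'."""
    return ["cd-hit-est", "-i", in_fasta, "-o", out_fasta,
            "-c", str(identity), "-n", str(word_size(identity)), "-T", str(threads)]


def parse_clstr(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cluster's representative (line ending in '*') to all member IDs."""
    clusters: List[List[Tuple[str, bool]]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])
            continue
        match = _MEMBER_ID.search(line)
        if match and clusters:
            clusters[-1].append((match.group(1), line.endswith("*")))
    result: Dict[str, List[str]] = {}
    for members in clusters:
        reps = [sid for sid, is_rep in members if is_rep]
        if reps:
            result[reps[0]] = [sid for sid, _ in members]
    return result
```

As the answer says, this groups redundant sequences within one file; it does not by itself tell you which transcripts match your boss's query sequences, which is what the BLAST search is for.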

The free (32-bit) version of usearch will be very slow.

written 4.4 years ago by geek_y9.4k
4.4 years ago by
Kurban170
China/Urumqi/Xinjiang Academy of Animal Sciences
Kurban170 wrote:

Thank you @Josh Herr, @Siva and @geek_y!

modified 4.4 years ago • written 4.4 years ago by Kurban170

Powered by Biostar version 2.3.0