Question: Extract Domain Sequences From Multiple Sequences
8.9 years ago by
Palu270 wrote:

Hi, I have 100 protein sequences with some conserved domains. I want to extract the domain sequences in one go. Is that possible? CDD gives the boundaries of the domains but does not give the domain sequences. I am a Windows user.

domain protein • 4.8k views
written 8.9 years ago by Palu270

What do you have as input? Sequences (FASTA or another format) or a list of accession numbers (UniProt or another database)?

written 8.9 years ago by Lyco2.3k

Also: do you want the consensus sequence of the conserved domain or the one in your sequences?

written 8.9 years ago by Michael Schubert6.9k

Are you and @Moon from "Finding The Sequence Of A Domain" working on the same assignment?

modified 8 months ago by RamRS27k • written 8.9 years ago by Aleksandr Levchuk3.2k

No, we are not working on the same project :).

written 8.9 years ago by Palu270

OK, Thanks! I will trust you on this. By the way, welcome to Biostars.org!

written 8.9 years ago by Aleksandr Levchuk3.2k
8.9 years ago by
Rm8.0k
Danville, PA
Rm8.0k wrote:

If you know the domain boundary coordinates, then it's very simple with a multi-sequence FASTA file as input.

  1. Format your FASTA file with the BLAST "formatdb" tool.
  2. Use fastacmd with -s <sequence name> and -L <start>,<end>:

Example: fastacmd -d refseq_protein -s NP_112245 -L 100,160

To do this in bulk, prepare a "list_file" with three columns, "seq_id", "start", and "end", and run:

   awk '{system("fastacmd -d input_fasta.fa -s " $1 " -L " $2 "," $3)}' list_file
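
For anyone without the BLAST tools at hand (for example on Windows), the same coordinate-based extraction can be sketched in plain Python. This is only an illustration, not part of the original answer: the function names read_fasta and extract_domains are made up here, the demo sequences are invented, and it assumes the coordinates are 1-based and inclusive and that the first word of each FASTA header is the sequence ID.

```python
import io


def read_fasta(handle):
    """Return {sequence_id: sequence} from an open FASTA file handle."""
    records, seq_id = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited header token.
            seq_id = line[1:].split()[0]
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def extract_domains(sequences, list_handle):
    """Yield (header, subsequence) for each 'seq_id start end' line."""
    for line in list_handle:
        fields = line.split()
        # Skip blank lines and an optional header row such as 'seq_id start end'.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        seq_id, start, end = fields[0], int(fields[1]), int(fields[2])
        # Assumption: 1-based inclusive coordinates, hence start - 1.
        yield f"{seq_id}:{start}-{end}", sequences[seq_id][start - 1:end]


# Tiny in-memory demo with invented sequences.
demo_fasta = io.StringIO(">seqA test protein\nMKTAYIAKQR\nQISFVKSHFS\n>seqB\nGGGPPPLLLL\n")
demo_list = io.StringIO("seq_id start end\nseqA 3 12\nseqB 4 6\n")
domains = list(extract_domains(read_fasta(demo_fasta), demo_list))
for header, subseq in domains:
    print(f">{header}\n{subseq}")
```

With real data, the two io.StringIO objects would be replaced by open("input_fasta.fa") and open("list_file").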

for additional information check this

modified 8.9 years ago • written 8.9 years ago by Rm8.0k
8.9 years ago by
Manhattan, NY
Khader Shameer18k wrote:

Have you tried the Batch CDD search option?

written 8.9 years ago by Khader Shameer18k

To expand on that: if you want the exact hit positions, use the rpsblast command-line tool.
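
As a rough sketch of how such hit positions could be turned into a "seq_id start end" list: the snippet below assumes rpsblast (BLAST+) was run with tabular output (-outfmt 6), whose standard columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The function name hits_to_coordinates, the E-value cutoff of 0.01, and the demo rows (including the subject IDs) are invented for illustration.

```python
import csv
import io


def hits_to_coordinates(handle, max_evalue=0.01):
    """Yield (query_id, start, end) from BLAST tabular output (-outfmt 6)."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip comment lines and anything without the 12 standard columns.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query_id = row[0]
        qstart, qend = int(row[6]), int(row[7])
        evalue = float(row[10])
        # Hypothetical cutoff: keep only hits at or below max_evalue.
        if evalue <= max_evalue:
            yield query_id, min(qstart, qend), max(qstart, qend)


# Invented demo rows: one strong hit and one weak hit for the same query.
demo = io.StringIO(
    "seqA\tdomain_1\t35.2\t60\t30\t2\t100\t160\t1\t58\t1e-20\t85.1\n"
    "seqA\tdomain_2\t25.0\t40\t28\t1\t200\t240\t5\t44\t2.5\t30.0\n"
)
coords = list(hits_to_coordinates(demo))
for query_id, start, end in coords:
    print(query_id, start, end)
```

The printed lines have the three-column layout that a coordinate-based extraction step expects.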

written 8.9 years ago by Michael Schubert6.9k
8.9 years ago by
United States
Aleksandr Levchuk3.2k wrote:

My answer in "Finding The Sequence Of A Domain" solves your question.

modified 8 months ago by zx87549.2k • written 8.9 years ago by Aleksandr Levchuk3.2k

Actually, I have a problem with the R script. Do you know of a Perl solution for that?

written 8.9 years ago by Palu270

No, but R is very easy to install (http://cran.cnr.berkeley.edu). You will also like R's IDE (http://rstudio.org). Both are available for Linux, Mac, and Windows.

written 8.9 years ago by Aleksandr Levchuk3.2k