Question

transcript and fasta sequence

6.5 years ago

qudrat ▴ 100

Hello all! I have two files, each containing tens of thousands of transcript FASTA sequences, each with a different ID, and I am interested in finding the sequences common to both files. Somebody please help me as it is hindering my work. Thank you in advance.

sequence Assembly • 2.2k views

ADD COMMENT • link updated 6.5 years ago by glihm ▴ 660 • written 6.5 years ago by qudrat ▴ 100

Somebody please help me as it is hindering my work.

In what way?

How about this tool that merges assemblies? See more options in the thread "High quality de novo transcriptome assembly rely on merging multiple assembly?". Specifically, dedupe.sh from BBMap should be very simple to use.

ADD REPLY • link 6.5 years ago by GenoMax 141k

tens of thousands of transcript FASTA sequences, each with a different ID

Can you post an example? Is this a de novo assembly?

ADD REPLY • link 6.5 years ago by st.ph.n ★ 2.7k

Actually, this is a de novo assembly produced with two different software packages, in order to minimize false positives.

ADD REPLY • link 6.5 years ago by qudrat ▴ 100

score 0 · Answer 1 · 2017-10-24

6.5 years ago

glihm ▴ 660

Hello qudrat,

If you are interested in IDENTICAL sequences, you can simply write a very short script that extracts the sequences present in both files.
If you instead want to apply a "similarity" score, I strongly suggest using alignment tools (BLAST or MUSCLE, for instance) and then parsing the results to get a global overview of the similarity between the sequences in your two files.
EDIT (following @genomax's comment): use an assembly-merging tool.
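
For the first option (identical sequences), here is a minimal sketch of what such a script could look like in plain Python, with no external libraries. The function names `read_fasta` and `common_sequences`, the script name `common_seqs.py` and the output header format are illustrative assumptions, not an established tool. It only reports exact, case-insensitive matches on the same strand, so reverse-complemented or partially overlapping transcripts will not be found.

```python
import sys
from collections import defaultdict


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a FASTA file."""
    record_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks).upper()
                # Keep only the first word of the header as the ID.
                fields = line[1:].split()
                record_id = fields[0] if fields else ""
                chunks = []
            else:
                chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks).upper()


def common_sequences(path_a, path_b):
    """Return {sequence: (ids_in_a, ids_in_b)} for sequences found in both files."""
    ids_by_seq_a = defaultdict(list)
    for record_id, seq in read_fasta(path_a):
        ids_by_seq_a[seq].append(record_id)

    shared = {}
    for record_id, seq in read_fasta(path_b):
        if seq in ids_by_seq_a:
            _, ids_b = shared.setdefault(seq, (ids_by_seq_a[seq], []))
            ids_b.append(record_id)
    return shared


# Runs only when called with exactly two file arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    for seq, (ids_a, ids_b) in common_sequences(sys.argv[1], sys.argv[2]).items():
        # Hypothetical header format: IDs from file A, then IDs from file B.
        print(f">{'|'.join(ids_a)}__{'|'.join(ids_b)}")
        print(seq)
```

Saved as `common_seqs.py`, it could be run as `python common_seqs.py assembly_a.fasta assembly_b.fasta > common.fasta`, where both input file names are placeholders. Only the sequences of the first file are indexed in memory, which should be fine for tens of thousands of transcripts.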

ADD COMMENT • link 6.5 years ago by glihm ▴ 660

Hi glihm, actually I was thinking of sort, but I do not know how to write scripts. This is a de novo assembly made with two different software packages, and I am doing this to minimize false positives.

ADD REPLY • link 6.5 years ago by qudrat ▴ 100

Your request is clearer now. In this case, @genomax's answer, which uses assembly merging, is well suited to your problem.

ADD REPLY • link 6.5 years ago by glihm ▴ 660