t-SNE analysis on multiple FASTA files in R
2.4 years ago
anasjamshed ▴ 120

I have 20 FASTA files and I want to run a t-SNE analysis on them. Is that possible?

This is the workflow that I want to follow in R:

[Image: workflow diagram ("flow")]


Can you explain the experiment in more detail? It's unclear what you want to do.


I want to perform a t-SNE analysis on multiple FASTA files.


"Doing a t-SNE" is not a biological question. A relevant type of question would be "I want to measure parameter X on this data" or "I want to test hypothesis A". For example, if you want to know the evolutionary relatedness of the genes you sequenced, you would usually build a phylogenetic tree from an alignment, and a t-SNE would likely be irrelevant.

2.4 years ago
Mensur Dlakic ★ 27k

You may already know this, but just for the random reader: t-SNE does not work on FASTA files directly, but on numeric matrices. If you create a symmetric pairwise distance matrix for all your sequences, those data points can be embedded into a low-dimensional space. That is to say, your general approach will work as far as producing some kind of result, but I don't have a good feel for how biologically relevant that embedding would be on a large scale.
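For the random reader again, here is a minimal sketch of the "sequences to distance matrix to embedding" idea in R. Everything in it is an assumption rather than the method behind any result in this thread: the two toy FASTA files are randomly generated DNA, the helper names (`read_fasta`, `kmer_profile`, `write_toy_fasta`) are hypothetical, and Euclidean distance between 3-mer frequency profiles is only a simple stand-in for whatever distance you actually trust (alignment-based distances are the obvious alternative). The t-SNE step uses the `Rtsne` package and is skipped if it is not installed.

```r
# Minimal FASTA reader (base R): returns a named character vector of sequences.
# Assumes every header line is followed by at least one sequence line.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)  # record index for every line
  parts <- split(lines[!is_header], record[!is_header])
  seqs <- vapply(parts, paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  toupper(seqs)
}

# Relative frequency of every k-mer in one sequence (assumes nchar(s) >= k).
kmer_profile <- function(s, all_kmers, k) {
  starts <- seq_len(nchar(s) - k + 1)
  found <- substring(s, starts, starts + k - 1)
  counts <- table(factor(found, levels = all_kmers))
  as.numeric(counts) / length(found)
}

# --- Toy input: two made-up FASTA files with different base composition ---
set.seed(1)
bases <- c("A", "C", "G", "T")
write_toy_fasta <- function(path, prefix, prob, n_seq = 15, len = 200) {
  seqs <- vapply(seq_len(n_seq), function(i) {
    paste(sample(bases, len, replace = TRUE, prob = prob), collapse = "")
  }, character(1))
  writeLines(paste0(">", prefix, "_", seq_len(n_seq), "\n", seqs), path)
}
files <- file.path(tempdir(), c("at_rich.fasta", "gc_rich.fasta"))
write_toy_fasta(files[1], "at", c(0.4, 0.1, 0.1, 0.4))
write_toy_fasta(files[2], "gc", c(0.1, 0.4, 0.4, 0.1))

# --- Read all files and build one numeric matrix (rows = sequences) ---
seqs <- unlist(lapply(files, read_fasta))
group <- sub("_.*$", "", names(seqs))  # "at" or "gc", taken from the header

k <- 3
all_kmers <- do.call(paste0, expand.grid(rep(list(bases), k),
                                         stringsAsFactors = FALSE))
profiles <- t(vapply(seqs, kmer_profile, numeric(length(all_kmers)),
                     all_kmers = all_kmers, k = k))

# --- Symmetric pairwise distance matrix ---
D <- as.matrix(dist(profiles))  # Euclidean distance between k-mer profiles

# --- t-SNE on the distance matrix (only if Rtsne is available) ---
if (requireNamespace("Rtsne", quietly = TRUE)) {
  # Rtsne requires 3 * perplexity < nrow(D) - 1, so keep it small for 30 points.
  fit <- Rtsne::Rtsne(D, is_distance = TRUE, dims = 2, perplexity = 5)
  embedding <- fit$Y
  rownames(embedding) <- rownames(D)
}
```

If the t-SNE step ran, something like `plot(embedding, col = as.integer(factor(group)))` colors the points by source file. For real data, replace the toy files with your own paths; the perplexity and the choice of distance both matter a lot, so treat any picture from this as exploratory.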

On a small scale, it appears to work reasonably well. Below is an embedding of 3 groups of proteins (60-70 members each) that are related within a group, but not between each other.

[Image: t-SNE embedding of the three protein groups]
