Convert multiple short sequences into a list of IUPAC motifs
11 months ago
praasu ▴ 40

Hi, I have approximately 200 four-nucleotide sequences. I want to cluster them into repeats/motifs and get the final IUPAC notation for each cluster.

For example :

AAAC
AAAG
CCAA
CTCA
CTCC
TGGT
TTAG
TTCA

Let me know if anyone has any suggestions.

Thank you in advance.

sequence • 502 views

Please review all your previous questions and mark them as answered where appropriate (green tick on the left), e.g. Extracting Intron-Exon Reads from bam files, how to extract intron coordinates (in bed) from bam, etc.


You could do a multiple sequence alignment and then create a sequence logo from it. There is a web-based version: https://weblogo.berkeley.edu/logo.cgi
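Since all the sequences here have the same length (4 nt), a group of them can be treated as already aligned. As a rough sketch of the last step only (not of the clustering, and not of what WebLogo does internally), each column of a group can be collapsed into the IUPAC code for the set of bases seen there. The function name `iupac_consensus` is made up for this example, and how the groups are chosen is left open:

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases observed in a column.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(seqs):
    """Collapse equal-length DNA sequences into one IUPAC string, column by column.

    Hypothetical helper: assumes the input is one pre-defined group of
    sequences containing only A, C, G and T.
    """
    seqs = [s.upper() for s in seqs]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*seqs) yields one tuple of bases per alignment column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*seqs))


print(iupac_consensus(["AAAC", "AAAG"]))          # AAAS
print(iupac_consensus(["CTCA", "CTCC", "TTCA"]))  # YTCM
```

Note that this ignores base frequencies: a base seen once in a column counts the same as one seen in every sequence, so large or loosely defined groups will drift towards `NNNN`.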

11 months ago
Trivas ★ 1.7k

Look into the R package DiffLogo. You can convert your FASTA file into a PWM, which you can then use elsewhere. It looks like universalmotif might also have a bunch of tools relevant to you: http://www.bioconductor.org/packages/devel/bioc/vignettes/universalmotif/inst/doc/MotifManipulation.pdf
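To illustrate what such a matrix contains, here is a minimal Python sketch that builds a position frequency matrix (the per-position base frequencies that a PWM is derived from) for one group of equal-length sequences. This is not the DiffLogo or universalmotif API; `position_frequency_matrix` is a hypothetical name used only for this example:

```python
from collections import Counter


def position_frequency_matrix(seqs, alphabet="ACGT"):
    """Return {base: [frequency at position 0, 1, ...]} for equal-length sequences.

    Hypothetical helper for illustration; real PWM tools usually also add
    pseudocounts and convert the frequencies to log-odds scores.
    """
    seqs = [s.upper() for s in seqs]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    # One Counter per alignment column.
    column_counts = [Counter(column) for column in zip(*seqs)]
    n = len(seqs)
    return {
        base: [counts[base] / n for counts in column_counts]
        for base in alphabet
    }


pfm = position_frequency_matrix(["AAAC", "AAAG"])
for base, freqs in pfm.items():
    print(base, freqs)
```

Unlike a plain IUPAC string, the matrix keeps how often each base occurs at each position, which is what logo tools plot.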
