How to use FIMO to find a motif among sequences?
4.0 years ago
Kai_Qi ▴ 130

Hi:

I just got a FASTA file converted from BED files. It contains the sequences of some mRNAs. I want to search the FASTA file for the sequence CAGGUGAG.

I have read the FIMO manual, but it still looks intimidating to me. So I just typed the sequence above into a text file and used that as the input, but the run failed:

fimo U1sequence.txt IR_E18_more_than_E11.fasta

Can anyone tell me how to do it?

4.0 years ago

I have a post on Bioinformatics SE which demonstrates how to do a FIMO search:

https://bioinformatics.stackexchange.com/a/2491/776

You need a MEME-formatted table that includes a PWM for your pattern of interest.

This is like looking for a "consensus sequence" as opposed to an exact match for a sequence-of-interest.

It's not clear that a MEME table is what you are starting with.
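
For reference, here is a rough sketch of how a single consensus sequence could be turned into a minimal MEME-format motif file. The function name `consensus_to_meme`, the motif name `U1_site`, the uniform background, the `nsites= 20` placeholder and the small per-base pseudo-probability of 0.01 are all assumptions made for illustration, not values taken from a real U1 PWM:

```python
def consensus_to_meme(consensus, name="U1_site", pseudo=0.01):
    """Return minimal MEME-format text for one consensus motif.

    Hypothetical helper: each position gets probability 1 - 3*pseudo on the
    consensus base and `pseudo` on each other base (assumed values).
    """
    alphabet = "ACGT"
    dna = consensus.upper().replace("U", "T")  # RNA -> DNA alphabet
    rows = []
    for base in dna:
        if base not in alphabet:
            raise ValueError(f"unsupported letter: {base}")
        row = [1.0 - 3 * pseudo if letter == base else pseudo
               for letter in alphabet]
        rows.append(" ".join(f"{p:.6f}" for p in row))
    header = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # uniform background (assumption)
        "",
        f"MOTIF {name}",
        # nsites is a placeholder count, not derived from real data
        f"letter-probability matrix: alength= 4 w= {len(dna)} nsites= 20 E= 0",
    ]
    return "\n".join(header + rows) + "\n"


if __name__ == "__main__":
    # Print the motif file; redirect to a file such as U1sequence.meme
    print(consensus_to_meme("CAGGUGAG"), end="")
```

Saving that output to a file (say `U1sequence.meme`, a made-up name) and running `fimo U1sequence.meme IR_E18_more_than_E11.fasta` should then give FIMO a motif it can parse. Since your sequences are mRNA-derived, you may also want FIMO's `--norc` option so that only the given strand is scanned.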

If you're instead looking for the position of exact matches to your sequence against a target FASTA file, here's a different approach for finding or querying short DNA kmers in a target FASTA file:

https://bit.ly/2zQ7Fww

This links to a Biostars post; I'd post the link directly but the link will get reformatted in an undesirable way.

In your case, you start with an RNA kmer, so you would replace U with T:

$ echo -e "CAGGUGAG" | tr 'U' 'T'

You could then search for this DNA-alphabet string and its reverse complement in the target FASTA file, as described in the link.
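
If you would rather not follow the link, the same idea can be sketched in a few lines of Python. This is not the approach from the linked post, just a minimal stand-in; the helper names (`read_fasta`, `reverse_complement`, `find_exact`, `scan`) and the two demo records are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact(seq, kmer):
    """Return 0-based start offsets of all matches, overlapping ones included."""
    hits, start = [], seq.find(kmer)
    while start != -1:
        hits.append(start)
        start = seq.find(kmer, start + 1)
    return hits


def scan(fasta_lines, rna_kmer):
    """Yield (record name, 0-based offset, strand) for each exact match."""
    kmer = rna_kmer.upper().replace("U", "T")  # same effect as tr 'U' 'T'
    rc = reverse_complement(kmer)
    for name, seq in read_fasta(fasta_lines):
        seq = seq.upper()
        for pos in find_exact(seq, kmer):
            yield name, pos, "+"
        if rc != kmer:  # skip the second pass for palindromic kmers
            for pos in find_exact(seq, rc):
                yield name, pos, "-"


if __name__ == "__main__":
    # Made-up demo records; pass an open file handle for a real FASTA file
    demo = [">ex1", "AACAGGTGAGTT", ">ex2", "GGCTCACCTGAA"]
    for hit in scan(demo, "CAGGUGAG"):
        print(*hit, sep="\t")
```

For a real run you would pass an open file handle, e.g. `scan(open("IR_E18_more_than_E11.fasta"), "CAGGUGAG")`. If your FASTA records are already in mRNA orientation, you probably only care about the `+` hits.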


Hi Alex, thank you very much for the information.

My situation is this: I analyzed RNA-seq data for alternative splicing, got the coordinates of the affected sequences, and converted them into FASTA files. From the literature I know that a particular sequence (here, a U1 recognition sequence) might be a feature of those exons. So I want to see the density of predicted strong U1 recognition sites in the exons of the mRNAs I sequenced. The motif comes from a textbook, and I typed it into a text file.


You could run MEME on your RNA-seq-derived sequences to predict motifs, then compare them against your consensus sequence to find the closest match. With that PWM in hand, you could use FIMO to search for the pattern within other target FASTA sequences and calculate your densities.

Alternatively, if your consensus sequence only requires an exact or partial match, it is short enough that you could perhaps use grep to do that kind of search, to find offsets to matches within a target FASTA file.
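
As a rough illustration of the exact-or-partial-match idea and of the density calculation, here is a small Python sketch. It is a stand-in for grep rather than a wrapper around it, it scans only the given strand, and the function names, the one-mismatch cutoff and the demo sequence are assumptions made for the example:

```python
def hamming_hits(seq, kmer, max_mismatches=1):
    """Return (0-based offset, mismatch count) for every window of seq that
    differs from kmer at no more than max_mismatches positions."""
    k = len(kmer)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], kmer))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def density_per_kb(n_hits, seq_length):
    """Number of hits per 1,000 nt of sequence (0.0 for an empty sequence)."""
    return 1000.0 * n_hits / seq_length if seq_length else 0.0


if __name__ == "__main__":
    kmer = "CAGGUGAG".replace("U", "T")
    # Made-up sequence: one exact site and one site with a single mismatch
    seq = "AACAGGTGAGTTCAGGTAAGCC"
    hits = hamming_hits(seq, kmer, max_mismatches=1)
    print(hits)
    print(density_per_kb(len(hits), len(seq)))
```

Setting `max_mismatches=0` reduces this to an exact search. Summing hits and lengths over all exons in a FASTA file before calling `density_per_kb` would give one overall density per exon set, which you could then compare between sets.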

