Question: FIMO output that preserves padj values
0
gravatar for rbronste
7 months ago by
rbronste240
rbronste240 wrote:

Hi I am generating a fasta file from a DESeq2 output in the following way:

bedtools getfasta -fi /son/genomes/mm10.fa -bed treatment_pw_padj.001_sorted.bed -fo treatment_pw_padj.001_sorted.fa

The above bed file has FDR information as a last column, I believe #12

I then run the following to search for instances of a specific motifs within these intervals:

fimo /son/MA0112.1.meme /son/treatment_pw_padj.001_sorted.fa

I get a list of located motif sites with p-values however I would like to preserve the DESeq2 FDR value located in the original bed file as well next to these newly filtered intervals, any ideas on how I could do that? Thank you.

motifs chip-seq deseq2 meme fimo • 295 views
ADD COMMENTlink written 7 months ago by rbronste240
2

Why don't you make a custom fasta header for every entry that contains the column information like:

>name|chr|start|end|padj|(...)
AGTAGTCA(...)
ADD REPLYlink written 7 months ago by ATpoint15k

I think your method is much better.

ADD REPLYlink modified 7 months ago • written 7 months ago by morovatunc400
1

I like to keep things silmpe whenever I can ;-D

ADD REPLYlink written 7 months ago by ATpoint15k

What would be your preferred method of making the custom fasta header and is this something I can output from bedtools getfasta, so it maintains a column from the initial bed file Im making into a fasta? Thanks!

ADD REPLYlink written 7 months ago by rbronste240

I dont have your data at the moment therefore I cannot give explicit solution but I can show you the way

R has a function called merge.data.frame. We are gonna use this to merge your initial data with the latest fimo output. So In the end you will have BED + fimo merged together. (probably, you will have duplicated rows of that BED since fimo might find multiple motifs in the single binding region)

1) So use bedtools getfasta with -name option on it. Then run fimo.

2) In fimo's txt output, there should be sequence name column. Extract your that peak name as they will become identical as write-in your input bed.

3) then use merge.data.frame function(in R base) to merge those two and get a single data frame.

4) make proper filtration to remove unnecessary data. (well you can make this step after reading your files as well)

If you can send me a sample, I can code it as well.

ADD REPLYlink modified 7 months ago • written 7 months ago by morovatunc400
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2203 users visited in the last hour