A Call For Clear Information On How To Use F-Seq For Faire-Seq
11.5 years ago
KCC ★ 4.1k

I have been trying to figure out how to use F-seq for almost a year now. I can do the basics, but the more complicated aspects are lost on me. It's a bit aggravating because, apart from ZINBA, F-seq is the only peak caller specifically designed for FAIRE-seq data. It seems that many people on Biostars know some aspects, and I'd like to get it all in one place.

First, the F-seq site is located here: http://fureylab.web.unc.edu/software/fseq/

Let me start with what the basic command looks like:

fseq -of bed -f 0 -l 600 -t 6.0 chr1.bed chr2.bed chr3.bed

Each chromosome is listed separately.

-of gives the output format
-f gives the fragment extension length
-l gives the feature length
-t gives the threshold, which controls the stringency of the peak calling

  1. One needs a script to split the reads into chromosomes, which is not included.

  2. One needs a way to convert the input into a fixed wiggle file. Then one needs a script to split the wiggle file into chromosomes, which is also not included.

  3. Use iffBuilder, available on the F-seq webpage, like this: iffBuilder chr1.wig. This produces a file called chr1.iff.
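
For step 1, here is a minimal sketch of a per-chromosome splitter. It assumes the reads are in a whitespace-delimited BED file with the chromosome name in column 1. The function name split_bed_by_chrom and the output directory argument are made up for illustration and are not part of F-seq.

```shell
# Hypothetical helper (not part of F-seq): write one BED file per chromosome.
# Usage: split_bed_by_chrom reads.bed out_dir
split_bed_by_chrom() {
    bed="$1"; out_dir="${2:-.}"
    mkdir -p "$out_dir"
    # Skip track/browser/comment lines; route every other line by column 1.
    # Note: some awk implementations limit the number of simultaneously open
    # files, which can matter for assemblies with thousands of scaffolds.
    awk -v out_dir="$out_dir" '
        NF >= 3 && $1 != "track" && $1 != "browser" && $1 !~ /^#/ {
            print > (out_dir "/" $1 ".bed")
        }
    ' "$bed"
}
```

Calling split_bed_by_chrom reads.bed by_chrom would leave by_chrom/chr1.bed, by_chrom/chr2.bed and so on, which can then be listed on the fseq command line.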

It seems that if you put all the iff files in a directory like "Input" and use the new command:

 fseq -of bed -f 0 -l 600 -d Input -t 6.0 chr1.bed chr2.bed chr3.bed

Then the program accesses the input files. It doesn't give much indication that it's using input and I don't know how to check the files are being accessed. I tried making some malformed input files to see if the program would crash and it did. So, I assume the files are being accessed. If someone has a Mac OS X or Unix command line technique for checking which files are being accessed, let me know.

The next portion involves using mappability. I have seen the post "Can you create your own bff files for F-seq?" on Biostars.org, which says there is a second program called bffBuilder that isn't publicly available.

It's also not clear what format the mappability files should be in. It seems that what we need is a fixed wiggle file with mappability scores, split by chromosome. This is pure speculation, but it sounds like the program bffBuilder would work like this: bffBuilder chr1.wig. By analogy with iffBuilder, I assume it makes a file called chr1.bff.

If anybody has any more information or any association with Furey lab and can make bffBuilder available, please help flesh out everything necessary to use F-seq as a tool in FAIRE-seq.


I can help out regarding the open files under UNIX:

First obtain the PID of the process, e.g.

ps -C fseq -o pid=

This will return the process id {PID}

Then check in the /proc directory for the file descriptors used by this PID:

ls -l /proc/{PID}/fd

Which will show you a list of links to all open file handles for the process {PID}
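
Since the question also mentions Mac OS X, where there is no /proc directory, here is a small sketch that falls back to lsof when /proc is missing. The function name list_open_files is invented for illustration. It also assumes you have already identified the right PID: if fseq turns out to be a wrapper script that starts another process, the child process is the one to inspect.

```shell
# Hypothetical helper: list the files a running process has open.
# Usage: list_open_files PID
list_open_files() {
    pid="$1"
    if [ -d "/proc/$pid/fd" ]; then
        # Linux: each entry is a symlink to an open file.
        ls -l "/proc/$pid/fd"
    elif command -v lsof >/dev/null 2>&1; then
        # Mac OS X and other systems without /proc.
        lsof -p "$pid"
    else
        echo "neither /proc nor lsof is available" >&2
        return 1
    fi
}

# Example (commented out; needs a running F-seq job):
# for pid in $(pgrep -f fseq); do list_open_files "$pid"; done
```

If the .iff files show up in the listing while the job runs, they are being read.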

HTH!


Thanks. That's very useful.

11.1 years ago
aboyle ▴ 60

Happy to help - and please feel free to email us questions. I only just randomly found this post.

We've uploaded bffBuilder to Terry's website: http://fureylab.web.unc.edu/software/fseq/

spacemorrissey is correct that you do not need to split your sequence reads up before running F-seq (though with current read counts you might run out of memory without splitting up the chromosomes).

As for input file use, if there are no input or background files but you specified that they should be there, then the program will show an error (or probably crash, to be honest). These features were added after publication and were done rather hastily.

Finally, if you are running FAIRE data, you will not want to assign a fragment length (unless this is paired-end data, in which case you can assign it). A typical FAIRE run looks more like this:

fseq -of bed -t 6.0 -l 800 -v -b bff_dir/ -p iff_dir/ alignments.bed
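
Because a missing background or input directory may only surface as an error or a crash, it can help to check the directories before launching. The sketch below is one way to do that. The function name dir_has_files is made up, and the .bff and .iff extensions are assumed from the file names mentioned in this thread.

```shell
# Hypothetical pre-flight check: succeed only if the directory exists and
# contains at least one file with the given extension.
# Usage: dir_has_files bff_dir bff
dir_has_files() {
    dir="$1"; ext="$2"
    [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
    for f in "$dir"/*."$ext"; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    echo "no .$ext files in $dir" >&2
    return 1
}

# Example (commented out; needs F-seq installed):
# dir_has_files bff_dir bff && dir_has_files iff_dir iff &&
#     fseq -of bed -t 6.0 -l 800 -v -b bff_dir/ -p iff_dir/ alignments.bed
```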
11.4 years ago

"One needs a script to split the reads into chromosomes, which is not included."

- Actually, you don't need to separate the BED files by chromosome; F-seq will accept one file with all the chromosomes concatenated.

"One needs a way to convert the input into a fixed wiggle file. Then one needs a script to split the wiggle file into chromosomes, not included. Use iffBuilder, available on the fseq webpage, like this iffBuilder chr1.wig. This produces a file called chr1.iff"

- Be careful with this. The input that iffBuilder requires is not just a wig file of your input track. You actually need to generate a file where the average signal from the input over a 10kb window is set to 1 most of the time. That means you need to know the average sequencing depth of your input and set the signal at that level equal to 1. The iff tracks are supposed to identify places where your sample may have a different copy number of a region than the reference genome. Once correctly generated, these files will affect your results.
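
As a rough illustration of that normalisation, here is a sketch that counts input reads in windows and divides by the mean count, so that a window at average depth gets the value 1. Several things here are assumptions rather than documented F-seq behaviour: the windows are non-overlapping, the mean is taken over every window up to the last one containing a read, and the output is a fixedStep wiggle with one value per window. This thread does not say whether iffBuilder expects one value per window or one per base, so check that before relying on it. The function name normalize_input_windows is made up.

```shell
# Hypothetical helper: depth-normalised input signal in fixed windows.
# Usage: normalize_input_windows input.bed chr1 [window_size]
normalize_input_windows() {
    bed="$1"; chrom="$2"; win="${3:-10000}"
    awk -v chrom="$chrom" -v win="$win" '
        $1 == chrom {
            b = int($2 / win)        # window index from the read start
            count[b]++
            n++
            if (b > max) max = b
        }
        END {
            if (n == 0) exit 1       # no reads on this chromosome
            mean = n / (max + 1)     # mean reads per window (assumption)
            printf "fixedStep chrom=%s start=1 step=%d span=%d\n", chrom, win, win
            # Windows without reads print as 0.000.
            for (b = 0; b <= max; b++) printf "%.3f\n", count[b] / mean
        }
    ' "$bed"
}
```

Something like normalize_input_windows input.bed chr1 > chr1.wig would then produce the per-chromosome file to hand to iffBuilder, under the assumptions above.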


Hello, I would like to come back to the generation of .iff files. What is not clear is how to generate the .wig files needed. OK, you slide a 10kb window across your input, but is it an overlapping window or not? If it is not overlapping, do all the bases in the 10kb window receive the same score? If it is overlapping, which base receives the score in the .wig file? The first one? Someone told me that the middle base receives the score; could you confirm? Does this mean that the score is calculated within a shrinking window for the first and last 5000 bases of each chromosome? Does it also mean that you do not want to limit input mapping to 4 or fewer mappings for multireads?
