Converting from BED to SAF/GFF
4.8 years ago
ccag ▴ 30

I called peaks using MACS2 and would now like to run featureCounts on my narrowPeak file. As far as I can tell, featureCounts only accepts annotation in SAF or GFF format. Does anyone know an easy way to convert a BED file to either of these formats for dm3?

bed gff saf featureCount • 7.7k views
3.4 years ago
ATpoint 54k
awk 'BEGIN {OFS="\t"} {print $1"."$2"."$3, $1, $2, $3, "."}' in.narrowPeak > out.saf


The first column is the identifier: either a gene name or, for genomic regions, simply the concatenated coordinates separated by dots, e.g. chr1.100.1000000. The remaining columns are chromosome, start, end and strand. For genomic regions I always set the strand to "."; for genes you can set it to the strand the gene is located on.
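The one-liner above can also be wrapped in a small function that writes a header row. This is only a sketch: the function name narrowpeak_to_saf is made up for illustration, and the header (GeneID, Chr, Start, End, Strand) is written on the assumption that featureCounts wants it, since the SAF example in the Subread user's guide starts with that line. The start coordinate is passed through unchanged, exactly as in the answer.

```shell
# Hypothetical helper: convert a narrowPeak/BED file to SAF.
# Usage: narrowpeak_to_saf in.narrowPeak > out.saf
narrowpeak_to_saf() {
    awk 'BEGIN {
             OFS = "\t"
             # Header row as in the Subread guide example (assumed to be expected).
             print "GeneID", "Chr", "Start", "End", "Strand"
         }
         # Skip short lines, comments and UCSC track/browser lines.
         NF >= 3 && $1 != "track" && $1 != "browser" && $1 !~ /^#/ {
             # Identifier = chr.start.end; strand is "." for plain regions.
             print $1 "." $2 "." $3, $1, $2, $3, "."
         }' "$1"
}
```

Only the first three columns are read, so the same function should work for 3-column BED as well as 10-column narrowPeak.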


I wonder whether $2 should be $2+1, because narrowPeak start coordinates are 0-based. I have rarely found a description of the SAF coordinate system, though :).


Section 6.2.2 of https://bioconductor.org/packages/release/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf says that the SAF format is inclusive for both the start and the end coordinate. I interpret this as "leave the narrowPeak as it is". I noticed that if you use e.g. bedtools makewindows to tile the whole genome, convert the windows to SAF and add 1 to the start, only 99.x% of reads are assigned to that SAF rather than 100%, because that one start nucleotide is missing. Without modifying the BED, 100% are assigned. I am not sure this is a correct statement; I am always a bit confused by these different coordinate systems. I also never understood why the featureCounts developers could not simply use BED instead of this SAF format.

SAF uses a 1-based coordinate system, so yes, 1 should be added to the start. I checked this against the example gene listed in the featureCounts manual.
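If SAF is taken to be 1-based and inclusive, as this comment concludes, a 0-based half-open BED interval [start, end) becomes start+1 to end. A minimal sketch of that variant follows. The function name bed_to_saf_1based is hypothetical, the header row is an assumption based on the SAF example in the Subread user's guide, and whether the shift is right for your data is exactly what this thread debates.

```shell
# Hypothetical helper: BED/narrowPeak -> SAF with a 1-based start.
# Usage: bed_to_saf_1based in.narrowPeak > out.saf
bed_to_saf_1based() {
    awk 'BEGIN {
             OFS = "\t"
             print "GeneID", "Chr", "Start", "End", "Strand"   # assumed header
         }
         NF >= 3 && $1 != "track" && $1 != "browser" && $1 !~ /^#/ {
             start = $2 + 1   # 0-based BED start -> 1-based start
             end   = $3       # half-open BED end already equals the inclusive end
             # The identifier is built from the converted coordinates.
             print $1 "." start "." end, $1, start, end, "."
         }' "$1"
}
```

With this convention a BED window 0 to 100 becomes 1 to 100 and still covers 100 bases, and adjacent windows no longer share a position.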

4.8 years ago
Jeffin Rockey ★ 1.2k

Option 1: GenomeTools

Option 2: Go to the Galaxy Oqtans page here; in the left pane there is a "GFF Toolkit" that includes a BED_to_GFF3 converter.

Option 3: A longer route. From here, use bedToGenePred followed by genePredToGtf to get a GTF file first. You can then use any of the many tools that convert GTF to GFF3.

I mention multiple options because, depending on whether your BED file has 6 or 12 columns or so, some may not be applicable.
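For a plain narrowPeak file, where each peak is a single interval, a GTF-style file can also be written directly with awk. This is a rough sketch and not a replacement for the tools above: the function name narrowpeak_to_gtf, the source label "macs2" and the feature type "peak" are arbitrary choices made for illustration. GTF/GFF coordinates are 1-based and inclusive, so the BED start is shifted by one.

```shell
# Hypothetical helper: write one GTF line per peak.
# Usage: narrowpeak_to_gtf in.narrowPeak > peaks.gtf
narrowpeak_to_gtf() {
    awk 'BEGIN { OFS = "\t" }
         NF >= 3 && $1 != "track" && $1 != "browser" && $1 !~ /^#/ {
             id = $1 "." $2 "." $3   # identifier from the original BED coordinates
             # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
             print $1, "macs2", "peak", $2 + 1, $3, ".", ".", ".", "gene_id \"" id "\";"
         }' "$1"
}
```

Since the feature type here is "peak" rather than the default "exon", featureCounts would need to be pointed at it, e.g. featureCounts -F GTF -t peak -g gene_id -a peaks.gtf -o counts.txt sample.bam (sample.bam is a placeholder).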