How to calculate the frequency of each overlapping nucleotide?
7.2 years ago

Hi, I want to calculate the frequency of each overlapping nucleotide. I have coordinates in BED format, e.g.:


chr1 20 40
chr2 6 20
chr1 20 100
chr1 10 70
chr1 20 25
chr2 15 50
chr3 5 12
chr6 5 20
I have tried tools such as bedops, bedtools intersect, bedmap, coverageBed, and bedtools genomecov, but they all report overlaps for complete intervals, with scores. I just want to count how many times each nucleotide position occurs individually. Any suggestions?

Thanks for your consideration

next-gen • 2.1k views
7.2 years ago

You can use getFasta to generate a FASTA file for your reference file and then query it with the second file to get the nucleotide frequency. Use nucBed for that; there is no official documentation yet, but the command-line help gives you the hints:

Tool:    bedtools nuc (aka nucBed)
Version: v2.16.2
Summary: Profiles the nucleotide content of intervals in a fasta file.

Usage:   bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

    -fi Input FASTA file

    -bed    BED/GFF/VCF file of ranges to extract from -fi

    -s  Profile the sequence according to strand.

    -seq    Print the extracted sequence

    -pattern    Report the number of times a user-defined sequence
            is observed (case-sensitive).

    -C  Igore case when matching -pattern. By defaulty, case matters.

Output format: 
    The following information will be reported after each BED entry:
        1) %AT content
        2) %GC content
        3) Number of As observed
        4) Number of Cs observed
        5) Number of Gs observed
        6) Number of Ts observed
        7) Number of Ns observed
        8) Number of other bases observed
        9) The length of the explored sequence/interval.
        10) The seq. extracted from the FASTA file. (opt., if -seq is used)
        11) The number of times a user's pattern was observed.
            (opt., if -pattern is used.)
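
To make the output columns above concrete, here is a minimal Python sketch that computes the same kinds of per-sequence statistics for one extracted sequence. It is not bedtools' implementation: the function name `nucleotide_profile`, the dictionary keys, the case-insensitive counting, and reporting AT/GC content as fractions of the sequence length are all assumptions made for illustration.

```python
from collections import Counter


def nucleotide_profile(seq):
    """Summarise one sequence with the kinds of columns listed in the help text.

    Assumption: bases are counted case-insensitively, and AT/GC content is
    reported as a fraction of the full sequence length.
    """
    counts = Counter(seq.upper())
    a, c, g, t, n = (counts[base] for base in "ACGTN")
    length = len(seq)
    # Anything that is not A, C, G, T or N is counted as "other".
    other = length - (a + c + g + t + n)
    return {
        "pct_at": (a + t) / length if length else 0.0,
        "pct_gc": (g + c) / length if length else 0.0,
        "num_A": a,
        "num_C": c,
        "num_G": g,
        "num_T": t,
        "num_N": n,
        "num_other": other,
        "seq_len": length,
    }


profile = nucleotide_profile("AACG")
print(profile["pct_at"], profile["pct_gc"], profile["seq_len"])
```

Note that this profiles the base composition of a sequence; it does not count how many intervals overlap a given position.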

That was very informative. I really appreciate your time and effort. Thanks a lot.


Actually, I want to compare two files that use the same assembly (hg19). Each file contains coordinates, and I am interested in the overlapping coordinates one by one: how many times does each coordinate occur (overlap) in the reference file?

$ reference file #hyphen indicates genomic locations

chr1 --------- --------------- ------- ----------- ------------- -------------------

$ map file

chr1 ------------- -------
chr1 ------------------ -------------------------
chr1 -----------------
chr1 -------------------
chr1 --------------------------------

$ ExpectedResult.bed

chr coordinates occurrences
chr1 20 5
chr1 21 4
chr1 22 0
chr1 23 2
chr1 24 1
chr1 25 1
chr1 26 1
chr1 27 6
chr1 28 3
chr1 29 7
chr1 30 0

This means that genomic location 20 of chromosome 1 in the map file overlaps the reference file five times. This is the problem. Any suggestions?
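
One possible way to get a per-position count like this is a small Python sketch that sweeps over interval start and end points. It assumes the intervals follow BED's usual 0-based, half-open convention (start included, end excluded); the function name `per_base_coverage` is hypothetical, and the intervals are the example coordinates from the original question, not real hg19 data.

```python
from collections import defaultdict


def per_base_coverage(intervals):
    """Count how many intervals cover each base.

    intervals: iterable of (chrom, start, end), assumed 0-based half-open.
    Returns {chrom: {position: count}} for positions covered at least once.
    """
    # Record +1 where an interval starts and -1 where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in intervals:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    coverage = {}
    for chrom, events in deltas.items():
        counts = {}
        depth = 0
        positions = sorted(events)
        # Between two consecutive event positions the depth is constant.
        for here, nxt in zip(positions, positions[1:]):
            depth += events[here]
            if depth > 0:
                for pos in range(here, nxt):
                    counts[pos] = depth
        coverage[chrom] = counts
    return coverage


intervals = [
    ("chr1", 20, 40), ("chr2", 6, 20), ("chr1", 20, 100), ("chr1", 10, 70),
    ("chr1", 20, 25), ("chr2", 15, 50), ("chr3", 5, 12), ("chr6", 5, 20),
]
coverage = per_base_coverage(intervals)
for pos in range(18, 27):
    # Positions that no interval covers are reported as 0.
    print("chr1", pos, coverage["chr1"].get(pos, 0))
```

To restrict the report to positions inside the reference file, you could build the coverage from the map file and then look up only the positions spanned by each reference interval, using `.get(pos, 0)` for uncovered bases.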

