Question: How to calculate the frequency of each overlapping nucleotide?
3.0 years ago, shahbazmunir11 wrote:

Hi, I want to calculate the frequency of each overlapping nucleotide. I have coordinates in BED format, e.g.:

map.bed

chr1 20 40
chr2 6 20

reference.bed

chr1 20 100
chr1 10 70
chr1 20 25
chr2 15 50
chr3 5 12
chr6 5 20

I have tried tools such as bedops, bedtools intersect, bedmap, coverageBed, and bedtools genomecov, but they all report each overlap as a complete interval with a score. I just want to count the occurrences of each nucleotide individually. Any suggestions?

Thanks for your consideration

3.0 years ago, Sukhdeep Singh (Netherlands) wrote:

You can use getFasta to generate a FASTA file for the intervals in your reference file, and then use the second file to query the nucleotide frequency. Use nucBed for that. There is no official documentation yet, but the command-line help gives some hints:

Tool:    bedtools nuc (aka nucBed)
Version: v2.16.2
Summary: Profiles the nucleotide content of intervals in a fasta file.

Usage:   bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options: 
    -fi Input FASTA file

    -bed    BED/GFF/VCF file of ranges to extract from -fi

    -s  Profile the sequence according to strand.

    -seq    Print the extracted sequence

    -pattern    Report the number of times a user-defined sequence
            is observed (case-sensitive).

    -C  Igore case when matching -pattern. By defaulty, case matters.

Output format: 
    The following information will be reported after each BED entry:
        1) %AT content
        2) %GC content
        3) Number of As observed
        4) Number of Cs observed
        5) Number of Gs observed
        6) Number of Ts observed
        7) Number of Ns observed
        8) Number of other bases observed
        9) The length of the explored sequence/interval.
        10) The seq. extracted from the FASTA file. (opt., if -seq is used)
        11) The number of times a user's pattern was observed.
            (opt., if -pattern is used.)
modified 3.0 years ago • written 3.0 years ago by Sukhdeep Singh
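
To make the output columns in the help text above more concrete, here is a minimal Python sketch that computes the same kind of summary for a single sequence string. It is not the bedtools implementation: the function name `nucleotide_profile`, the dictionary keys, and the choice to fold case and report AT/GC as fractions are all assumptions made for illustration.

```python
from collections import Counter


def nucleotide_profile(seq):
    """Summarise base composition of one sequence (hypothetical helper).

    Assumption: case is folded before counting, and AT/GC content is
    reported as a fraction of the sequence length.
    """
    counts = Counter(seq.upper())
    length = len(seq)
    a, c, g, t, n = (counts[base] for base in "ACGTN")
    other = length - (a + c + g + t + n)  # anything that is not A, C, G, T or N
    return {
        "frac_at": (a + t) / length if length else 0.0,
        "frac_gc": (g + c) / length if length else 0.0,
        "num_A": a,
        "num_C": c,
        "num_G": g,
        "num_T": t,
        "num_N": n,
        "num_other": other,
        "seq_len": length,
    }


# Made-up example sequence, only to show the shape of the result.
profile = nucleotide_profile("ACGTTGCAAN")
for key, value in profile.items():
    print(key, value)
```

In practice the sequence would come from the FASTA file for each BED interval, which is the step bedtools handles for you.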

That was very informative. I really appreciate your time and effort. Thanks a lot.

written 3.0 years ago by shahbazmunir11

Actually, I want to compare two files that are on the same assembly (hg19). Each file contains coordinates, and I am interested in the overlapping coordinates one by one: how many times does each coordinate occur in (overlap with) the reference file?

$ reference file #hyphen indicates genomic locations

chr1 --------- --------------- ------- ----------- ------------- -------------------

$ map file

chr1 ------------- -------

chr1 ------------------ -------------------------

chr1 -----------------

chr1 -------------------

chr1 --------------------------------

$ ExpectedResult.bed

chr coordinates occurrences
chr1 20 5
chr1 21 4
chr1 22 0
chr1 23 2
chr1 24 1
chr1 25 1
chr1 26 1
chr1 27 6
chr1 28 3
chr1 29 7
chr1 30 0

This means that genomic location 20 of chromosome 1 in the map file overlaps five times with the reference file. This is the problem. Any suggestions?

Thanks

modified 3.0 years ago • written 3.0 years ago by shahbazmunir11

intersectBed

written 3.0 years ago by Sukhdeep Singh
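
For small files, the per-position count described in the follow-up comment can also be sketched directly in Python. This is only an illustration under stated assumptions, not something proposed in the thread: the helper names `parse_bed` and `per_base_overlap` are hypothetical, intervals are treated as standard BED (0-based start, end-exclusive), and the input is the map.bed / reference.bed example from the question.

```python
from collections import defaultdict

# Example data copied from the question.
MAP_BED = """\
chr1 20 40
chr2 6 20
"""

REFERENCE_BED = """\
chr1 20 100
chr1 10 70
chr1 20 25
chr2 15 50
chr3 5 12
chr6 5 20
"""


def parse_bed(text):
    """Parse whitespace-separated BED lines into (chrom, start, end) tuples."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def per_base_overlap(map_intervals, ref_intervals):
    """For each base in the map intervals, count the reference intervals covering it.

    Assumption: BED semantics, i.e. start is 0-based and end is exclusive.
    """
    refs_by_chrom = defaultdict(list)
    for chrom, start, end in ref_intervals:
        refs_by_chrom[chrom].append((start, end))

    rows = []
    for chrom, start, end in map_intervals:
        chrom_refs = refs_by_chrom.get(chrom, [])
        for pos in range(start, end):
            hits = sum(1 for s, e in chrom_refs if s <= pos < e)
            rows.append((chrom, pos, hits))
    return rows


rows = per_base_overlap(parse_bed(MAP_BED), parse_bed(REFERENCE_BED))
for chrom, pos, hits in rows:
    print(chrom, pos, hits)
```

With the example data, the first row is `chr1 20 3`, because position 20 lies inside all three chr1 reference intervals. The nested loop checks every reference interval for every base, so it is only practical for small inputs; genome-scale files would need a sorted sweep or a dedicated tool.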