Question

Heatmap from chromosome co-ordinates

8.9 years ago

caspase8mach ▴ 30

Hi,

1) How can I generate a heatmap from chromosome coordinates from several samples?

2) I would also like to run some statistics on the samples to find the probability of an event happening at a particular chromosome location amongst various samples.

The data is like this:

Sample  Chromosome  Start  End    Value
A       1           10000  10500  -0.5
A       1           20000  20500  -1.5
A       1           30000  30500   2.5
A       1           40000  40500  -0.5
A       1           50000  50500   0.5
B       1           10000  10500  -0.5
B       1           20000  20500  -1.5
B       1           30000  30500   2.5
B       1           40000  40500  -0.5
B       1           50000  50500   0.5

Thanks a lot.

Heatmap Chromosome Coordinate Ideogram R • 3.2k views

updated 8.9 years ago by mforde84 ★ 1.4k • written 8.9 years ago by caspase8mach ▴ 30

Get your data into the following format to plot a heatmap in R.

Chromosome:Start-End    Sample_A    Sample_B
1:10000-10500    -0.5    0.5
1:20000-20500    -1.5    -0.5
1:30000-30500    2.5    2.5
1:40000-40500    -0.5    -0.5
1:50000-50500    0.5    0.5
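
If your data is still in the long layout from the question (one row per sample and interval), a short awk script is one way to pivot it into this matrix. This is only a sketch under a few assumptions: the input name long.txt is made up, the file is assumed to be tab-separated with a header line, and the result is written to data.txt, the file name read by the R snippet in this answer.

```shell
# Hypothetical example input: part of the long table from the question, tab-separated.
printf 'Sample\tChromosome\tStart\tEnd\tValue\n'  > long.txt
printf 'A\t1\t10000\t10500\t-0.5\n'              >> long.txt
printf 'A\t1\t20000\t20500\t-1.5\n'              >> long.txt
printf 'B\t1\t10000\t10500\t-0.5\n'              >> long.txt
printf 'B\t1\t20000\t20500\t-1.5\n'              >> long.txt

# Pivot to one row per interval and one column per sample.
awk -F'\t' -v OFS='\t' '
NR == 1 { next }                                   # skip the header line
{
    key = $2 ":" $3 "-" $4                         # e.g. 1:10000-10500
    if (!(key in seen_key))   { seen_key[key] = 1;   keys[++nk] = key }
    if (!($1 in seen_sample)) { seen_sample[$1] = 1; samples[++ns] = $1 }
    value[key, $1] = $5
}
END {
    printf "Chromosome:Start-End"
    for (j = 1; j <= ns; j++) printf "%sSample_%s", OFS, samples[j]
    printf "\n"
    for (i = 1; i <= nk; i++) {
        printf "%s", keys[i]
        for (j = 1; j <= ns; j++)                  # NA marks an interval missing in a sample
            printf "%s%s", OFS, (((keys[i], samples[j]) in value) ? value[keys[i], samples[j]] : "NA")
        printf "\n"
    }
}' long.txt > data.txt
```

Intervals and samples keep the order in which they first appear in the input.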

In R,

# Read the matrix; the first column (Chromosome:Start-End) becomes the row names
x <- read.table("data.txt", header=TRUE, row.names=1, check.names=FALSE)
heatmap(as.matrix(x))

Explore the various options of heatmap() (see ?heatmap), such as scale, Rowv and Colv.

8.9 years ago by GouthamAtla 12k

score 1 · Answer 1 · 2016-08-26

8.9 years ago

mforde84 ★ 1.4k

Another option I use occasionally is the ChIPseeker package in R.

score 0 · Answer 2 · 2016-08-26

The way I would do it is by mapping alignments present in a sample for a given interval.

Split your current table by sample name into BED files, so that you have one file for sample A, one for sample B, and so on:

$ cat sampleA.bed
chr1 1000 1005
chr1 1010 1015
...

$ cat sampleB.bed
chr1 2000 2005
chr1 2010 2015
...

Remember that the BED files must not contain a header line.
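
One way to do the split, sketched with awk. It assumes data.txt is the tab-separated table from the question with a header line (the same file the tail command in this answer reads); the output names sampleA.bed, sampleB.bed and so on are built from the Sample column. Skipping line 1 keeps the header out of the BED files.

```shell
# Hypothetical example input: part of the table from the question, tab-separated.
printf 'Sample\tChromosome\tStart\tEnd\tValue\n'  > data.txt
printf 'A\t1\t10000\t10500\t-0.5\n'              >> data.txt
printf 'A\t1\t20000\t20500\t-1.5\n'              >> data.txt
printf 'B\t1\t10000\t10500\t-0.5\n'              >> data.txt
printf 'B\t1\t20000\t20500\t-1.5\n'              >> data.txt

# Write chromosome, start and end of every data row to sample<Sample>.bed.
awk -F'\t' -v OFS='\t' 'NR > 1 { print $2, $3, $4 > ("sample" $1 ".bed") }' data.txt
```

The chromosome names have to match the reference names in the BAM file (for example 1 versus chr1), otherwise the intervals will not select any alignments.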

If the intervals are the same for every sample, just generate one BED file containing all of the unique intervals. In your case, do something like:

$ tail -n+2 data.txt | cut -f 2-4 | sort -u > intervals.bed

Then use samtools to pull out the alignments corresponding to the bed intervals:

samtools view -h -L {sampleA.bed or intervals.bed} sampleA.bam > sampleA.subsampled.sam

Then use seqMINER to plot the SAM alignments against all unique intervals. In seqMINER:

Load reference coordinates (i.e., peaks) <- intervals.bed

Load aligned reads <- your subsampled sam files.

Then cluster the peaks however you want. Typically I'll use enrichment linear at 10 clusters.

score 0 · Answer 3 · 2016-08-26

8.9 years ago

mforde84 ★ 1.4k

Example

score 0 · Answer 4 · 2016-08-26

For the statistical analyses, you'll have to do either differential expression or differential binding analysis. It's not as straightforward as using raw counts for a region. For example, if you are looking at differently sized regions, you need to normalize raw counts by feature size and sequencing depth. Typically the best way to do this is to use transcripts per million (TPM) and then log2-transform the TPM values. Alternatively, you can use statistical approaches based on negative binomial GLMs. A couple of good packages for this are DESeq2 and edgeR; for ChIP samples I'd suggest MACS2 or HOMER.
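
As a rough illustration of the TPM step, here is a minimal awk sketch. The file counts.txt (chromosome, start, end, raw read count, tab-separated, no header), its numbers and the output name tpm.txt are all invented for the example. Each count is divided by the interval length, the results are rescaled so they sum to one million within the sample, and log2(TPM + 1) is reported, where the pseudocount of 1 is an assumption to avoid taking the log of zero.

```shell
# Hypothetical per-interval read counts for one sample: chrom, start, end, count.
printf '1\t10000\t10500\t50\n'   > counts.txt
printf '1\t20000\t21000\t300\n' >> counts.txt
printf '1\t30000\t30250\t25\n'  >> counts.txt

# The file is read twice: pass 1 sums the length-normalized counts,
# pass 2 rescales each interval to a total of one million.
awk -F'\t' '
NR == FNR { total += $4 / ($3 - $2); next }          # pass 1: reads per base, summed
{
    tpm = ($4 / ($3 - $2)) / total * 1e6             # pass 2: TPM for this interval
    printf "%s\t%s\t%s\t%.1f\t%.3f\n", $1, $2, $3, tpm, log(tpm + 1) / log(2)
}' counts.txt counts.txt > tpm.txt
```

In this toy input the second interval has three times the read density of the other two, so it receives three times their TPM. For actual testing between samples, the negative binomial packages named above work from raw counts rather than from TPM values.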