Question

How to map SNPs of C. Albicans using custom data?

0

Entering edit mode

6.8 years ago

nattzy94 ▴ 50

I have SNP data of the yeast C. Albicans and would like to make a map indicating the frequency of each SNP. Similar to a Manhattan Plot but instead of p-value on the y-axis, I would have frequency. I have tried using the mapsnp program on R but realised that it references the human genome on the UCSC browser.

Is there some way I can manipulate the program to reference a Candida genome browser instead. Or is there a more suitable program for the task I am doing?

R snp • 2.9k views

ADD COMMENT • link 6.7 years ago by nattzy94 ▴ 50

score 1 · Answer 1 · 2017-07-17

1

Entering edit mode

6.8 years ago

bernatgel ★ 3.4k

You can use karyoploteR to create a plot similar to a manhattan plot and using any genome.

To create such plot you should first create an empty plot following these instructions and giving it a GRanges or a BED file with the chromosome sizes of C. Albicans (you can get them from this table.

After that, you just need to plot the points representing your snps using the kpPoints function. You can follow the examples at the tutorial or at the rainfall plot example, although the second one is a bit more complex.

Don't forget to use plot.type=4 as in the rainfall plot example to give it a more traditional manhattan plot look and feel.

EDIT: Following the comments, I'm here is some code to create a basic plot.

karyoploteR needs chromosome information to work, so I'll assume the data is in a file with the following format:

Chr Pos Frequency
chr1A   15305   4
chr1A   168836  7
chr1A   515835  1
chr1A   515850  3
chr1A   837522  4
chr1A   842901  7

with the columns separated by tabs.

library(karyoploteR)

#Read the data from a file in the same directory called "SNPS.txt"
snps <- read.table(file = "SNPs.txt", sep = "\t", header=TRUE, stringsAsFactors = FALSE)

#Create a GRanges object with the chromosomes and length found at http://www.candidagenome.org/cache/C_albicans_SC5314_genomeSnapshot.html
calbicans.genome <- toGRanges(data.frame(chr=c("chr1A", "chr1B", "chr2A", "chr2B", "chr3A", "chr3B", "chr4A", "chr4B", "chr5A", "chr5B", "chr6A", "chr6B", "chr7A", "chr7B", "chrRA", "chrRB"),
                    start=rep(1, 16),
                    end=c(3188341, 3188396, 2231883, 2231750, 1799298, 1799271, 1603259, 1603311, 1190869, 1190991, 1033292, 1033212, 949580, 949611, 2286237, 2285697)))

#Create the plot
kp <- plotKaryotype(genome=calbicans.genome, plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)

max.freq <- max(snps$Frequency)

kpAddLabels(kp, "SNP Frequency", srt=90, pos=3)
kpAxis(kp, ymin = 0, ymax=max.freq)
kpPoints(kp, chr=snps$Chr, x=snps$Pos, y=snps$Frequency, ymin=0, ymax=max.freq)

You can ajdust multiple additional parameters then the size of the points and their colors. the margins... You can find more information on how to do it in the documentation.

With more data points (in this case, random data) it would look like this

enter image description here

ADD COMMENT • link 5.1 years ago by bernatgel ★ 3.4k

0

Entering edit mode

Thanks for the really detailed reply! This really helped me. However, I don't quite get the instructions for the tutorial to create a simple plot. I don't quite understand the code and how to manipulate the code to manually add my data. How does the program acquire the data for each of the 23 data points?

Thanks so much! You've already been of great help!

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

0

Entering edit mode

Just to clarify my question, I have no clue how to plot an ideogram by loading my own custom data of SNP frequencies.

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

1

Entering edit mode

Hi nattzy94,

The data in the example is randomly created with x <- 1:23*10e6 and y <- rnorm(23, mean=0.5, sd=0.25). In your case you would probably read it from a file with the data about your snps (position and frequency). If you can paste here the first lines of your data file I can try to help you with that.

ADD REPLY • link 6.8 years ago by bernatgel ★ 3.4k

0

Entering edit mode

Thanks so much. I'm new to R so really appreciate the help!

Pos...........Frequency

151305.........4

168836.........7

515835.........1

515850.........3

837522.........4

842901.........7

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

0

Entering edit mode

Realized I did not provide the chromosome information. Just assume it is all chr 1. Also, I do not need to plot chromosome features. Thanks for your time!

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

0

Entering edit mode

Hi Bernatgel,

Have you had a chance to look at the data? I can modify the data if it is not suitable.

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

0

Entering edit mode

I edited the original answer to include some code

ADD REPLY • link 6.8 years ago by bernatgel ★ 3.4k

0

Entering edit mode

ok. Thanks so much bernatgel!

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

0

Entering edit mode

ok. Thanks so much bernatgel!

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

score 0 · Answer 2 · 2017-07-26

0

Entering edit mode

6.8 years ago

nattzy94 ▴ 50

Hi Bernatgel,

Thanks for all the help so far. I was following your tutorial on how to plot P. Vivax genes and was attempting to do the same for C. Albicans. I followed the instructions but was unable to complete the final step of plotting the density. I am wondering if that is because of the way that the gff file that I have is formatted differently from the plasmodium one. The file is from "http://www.candidagenome.org/download/gff/C_albicans_SC5314/C_albicans_SC5314_A22_current_features.gff".

Would the different formatting of the seqnames change anything?

Thanks!

ADD COMMENT • link 6.8 years ago by nattzy94 ▴ 50

1

Entering edit mode

Managed to figure it out! Thanks :)

ADD REPLY • link 6.8 years ago by nattzy94 ▴ 50

score 0 · Answer 3 · 2017-07-27

0

Entering edit mode

6.8 years ago

nattzy94 ▴ 50

Hi, is there any way I can plot more than 2 data panels? I want to combine my rainfall plots with plots of gene regions.

And also, is there a way to overlap data? e.g. if I wanted to plot gene regions and mRNA regions on the same data panel but with different colours.

ADD COMMENT • link 6.8 years ago by nattzy94 ▴ 50

0

Entering edit mode

Hi @nattzy94,

Right now only two data panels are implemented: one above and one below the chromosome.

However all of them can be partitioned to plot more than one data "track" with the parameters r0 and r1 (you can find more information on the Data Postioning tutorial. With that you can position the data however you want, including overlapping data with different colors, etc...)

ADD REPLY • link 6.7 years ago by bernatgel ★ 3.4k

score 0 · Answer 4 · 2017-08-06

0

Entering edit mode

6.7 years ago

nattzy94 ▴ 50

Hi Bernatgel,

I tried using the getVariantsColors function but the console kept returning the error message: Error in getVariantsColors(snps$Chr) : could not find function "getVariantsColors".

It seems the function is not in the package?

ADD COMMENT • link 6.7 years ago by nattzy94 ▴ 50

0

Entering edit mode

Hi,

Are you using the development version (it needs the development version of Bioconductor). The rainfall plot is only available in the development version right now and will be in release with the next Bioconductor release ( ~ October 2017). To use it you should install Bioconductor devel (you have some instructions here https://www.bioconductor.org/developers/how-to/useDevel/ ). If you go to the karyoploteR tutorial you will see that the rainfall plot example has a badge warning that it¡s only available in Bioconductor devel.

Hope this helps.

ADD REPLY • link 6.7 years ago by bernatgel ★ 3.4k