Question: How to map SNPs of C. Albicans using custom data?
0
gravatar for nattzy94
10 weeks ago by
nattzy9410
nattzy9410 wrote:

I have SNP data of the yeast C. Albicans and would like to make a map indicating the frequency of each SNP. Similar to a Manhattan Plot but instead of p-value on the y-axis, I would have frequency. I have tried using the mapsnp program on R but realised that it references the human genome on the UCSC browser.

Is there some way I can manipulate the program to reference a Candida genome browser instead. Or is there a more suitable program for the task I am doing?

snp R • 339 views
ADD COMMENTlink modified 7 weeks ago • written 10 weeks ago by nattzy9410
1
gravatar for bernatgel
9 weeks ago by
bernatgel770
Barcelona, Spain
bernatgel770 wrote:

You can use karyoploteR to create a plot similar to a manhattan plot and using any genome.

To create such plot you should first create an empty plot following these instructions and giving it a GRanges or a BED file with the chromosome sizes of C. Albicans (you can get them from this table.

After that, you just need to plot the points representing your snps using the kpPoints function. You can follow the examples at the tutorial or at the rainfall plot example, although the second one is a bit more complex.

Don't forget to use plot.type=4 as in the rainfall plot example to give it a more traditional manhattan plot look and feel.

EDIT: Following the comments, I'm here is some code to create a basic plot.

karyoploteR needs chromosome information to work, so I'll assume the data is in a file with the following format:

Chr Pos Frequency
chr1A   15305   4
chr1A   168836  7
chr1A   515835  1
chr1A   515850  3
chr1A   837522  4
chr1A   842901  7

with the columns separated by tabs.

library(karyoploteR)

#Read the data from a file in the same directory called "SNPS.txt"
snps <- read.table(file = "SNPs.txt", sep = "\t", header=TRUE, stringsAsFactors = FALSE)

#Create a GRanges object with the chromosomes and length found at http://www.candidagenome.org/cache/C_albicans_SC5314_genomeSnapshot.html
calbicans.genome <- toGRanges(data.frame(chr=c("chr1A", "chr1B", "chr2A", "chr2B", "chr3A", "chr3B", "chr4A", "chr4B", "chr5A", "chr5B", "chr6A", "chr6B", "chr7A", "chr7B", "chrRA", "chrRB"),
                    start=rep(1, 16),
                    end=c(3188341, 3188396, 2231883, 2231750, 1799298, 1799271, 1603259, 1603311, 1190869, 1190991, 1033292, 1033212, 949580, 949611, 2286237, 2285697)))

#Create the plot
kp <- plotKaryotype(genome=calbicans.genome, plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)

max.freq <- max(snps$Frequency)

kpAddLabels(kp, "SNP Frequency", srt=90, pos=3)
kpAxis(kp, ymin = 0, ymax=max.freq)
kpPoints(kp, chr=snps$Chr, x=snps$Pos, y=snps$Frequency, ymin=0, ymax=max.freq)

You can ajdust multiple additional parameters then the size of the points and their colors. the margins... You can find more information on how to do it in the documentation.

With more data points (in this case, random data) it would look like this

enter image description here

ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by bernatgel770

Thanks for the really detailed reply! This really helped me. However, I don't quite get the instructions for the tutorial to create a simple plot. I don't quite understand the code and how to manipulate the code to manually add my data. How does the program acquire the data for each of the 23 data points?

Thanks so much! You've already been of great help!

ADD REPLYlink written 9 weeks ago by nattzy9410

Just to clarify my question, I have no clue how to plot an ideogram by loading my own custom data of SNP frequencies.

ADD REPLYlink written 9 weeks ago by nattzy9410
1

Hi nattzy94,

The data in the example is randomly created with x <- 1:23*10e6 and y <- rnorm(23, mean=0.5, sd=0.25). In your case you would probably read it from a file with the data about your snps (position and frequency). If you can paste here the first lines of your data file I can try to help you with that.

ADD REPLYlink written 9 weeks ago by bernatgel770

Thanks so much. I'm new to R so really appreciate the help!

Pos...........Frequency

151305.........4

168836.........7

515835.........1

515850.........3

837522.........4

842901.........7

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by nattzy9410

Realized I did not provide the chromosome information. Just assume it is all chr 1. Also, I do not need to plot chromosome features. Thanks for your time!

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by nattzy9410

Hi Bernatgel,

Have you had a chance to look at the data? I can modify the data if it is not suitable.

ADD REPLYlink written 9 weeks ago by nattzy9410

I edited the original answer to include some code

ADD REPLYlink written 9 weeks ago by bernatgel770

ok. Thanks so much bernatgel!

ADD REPLYlink written 9 weeks ago by nattzy9410

ok. Thanks so much bernatgel!

ADD REPLYlink written 9 weeks ago by nattzy9410
0
gravatar for nattzy94
8 weeks ago by
nattzy9410
nattzy9410 wrote:

Hi Bernatgel,

Thanks for all the help so far. I was following your tutorial on how to plot P. Vivax genes and was attempting to do the same for C. Albicans. I followed the instructions but was unable to complete the final step of plotting the density. I am wondering if that is because of the way that the gff file that I have is formatted differently from the plasmodium one. The file is from "http://www.candidagenome.org/download/gff/C_albicans_SC5314/C_albicans_SC5314_A22_current_features.gff".

Would the different formatting of the seqnames change anything?

Thanks!

ADD COMMENTlink written 8 weeks ago by nattzy9410
1

Managed to figure it out! Thanks :)

ADD REPLYlink written 8 weeks ago by nattzy9410
0
gravatar for nattzy94
8 weeks ago by
nattzy9410
nattzy9410 wrote:

Hi, is there any way I can plot more than 2 data panels? I want to combine my rainfall plots with plots of gene regions.

And also, is there a way to overlap data? e.g. if I wanted to plot gene regions and mRNA regions on the same data panel but with different colours.

ADD COMMENTlink written 8 weeks ago by nattzy9410

Hi @nattzy94,

Right now only two data panels are implemented: one above and one below the chromosome.

However all of them can be partitioned to plot more than one data "track" with the parameters r0 and r1 (you can find more information on the Data Postioning tutorial. With that you can position the data however you want, including overlapping data with different colors, etc...)

ADD REPLYlink written 7 weeks ago by bernatgel770
0
gravatar for nattzy94
7 weeks ago by
nattzy9410
nattzy9410 wrote:

Hi Bernatgel,

I tried using the getVariantsColors function but the console kept returning the error message: Error in getVariantsColors(snps$Chr) : could not find function "getVariantsColors".

It seems the function is not in the package?

ADD COMMENTlink written 7 weeks ago by nattzy9410

Hi,

Are you using the development version (it needs the development version of Bioconductor). The rainfall plot is only available in the development version right now and will be in release with the next Bioconductor release ( ~ October 2017). To use it you should install Bioconductor devel (you have some instructions here https://www.bioconductor.org/developers/how-to/useDevel/ ). If you go to the karyoploteR tutorial you will see that the rainfall plot example has a badge warning that it¡s only available in Bioconductor devel.

Hope this helps.

ADD REPLYlink written 6 weeks ago by bernatgel770
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1195 users visited in the last hour