Question: How to map SNPs of C. albicans using custom data?
11 days ago by
nattzy9410
nattzy9410 wrote:

I have SNP data for the yeast C. albicans and would like to make a map showing the frequency of each SNP. It would be similar to a Manhattan plot, but with frequency instead of p-value on the y-axis. I tried the mapsnp package in R but realised that it references the human genome on the UCSC browser.

Is there some way I can make the program reference a Candida genome browser instead? Or is there a more suitable program for this task?

snp R • 202 views
ADD COMMENTlink modified 15 hours ago • written 11 days ago by nattzy9410
11 days ago by
bernatgel710
Barcelona, Spain
bernatgel710 wrote:

You can use karyoploteR to create a plot similar to a Manhattan plot with any genome.

To create such a plot, first create an empty plot following these instructions, giving it a GRanges or a BED file with the chromosome sizes of C. albicans (you can get them from this table).

After that, you just need to plot the points representing your SNPs using the kpPoints function. You can follow the examples in the tutorial or the rainfall plot example, although the second one is a bit more complex.

Don't forget to use plot.type=4, as in the rainfall plot example, to give it a more traditional Manhattan plot look and feel.

EDIT: Following the comments, here is some code to create a basic plot.

karyoploteR needs chromosome information to work, so I'll assume the data is in a file with the following format:

Chr Pos Frequency
chr1A   15305   4
chr1A   168836  7
chr1A   515835  1
chr1A   515850  3
chr1A   837522  4
chr1A   842901  7

with the columns separated by tabs.

library(karyoploteR)

#Read the data from a file in the same directory called "SNPs.txt"
snps <- read.table(file = "SNPs.txt", sep = "\t", header=TRUE, stringsAsFactors = FALSE)

#Create a GRanges object with the chromosomes and length found at http://www.candidagenome.org/cache/C_albicans_SC5314_genomeSnapshot.html
calbicans.genome <- toGRanges(data.frame(chr=c("chr1A", "chr1B", "chr2A", "chr2B", "chr3A", "chr3B", "chr4A", "chr4B", "chr5A", "chr5B", "chr6A", "chr6B", "chr7A", "chr7B", "chrRA", "chrRB"),
                    start=rep(1, 16),
                    end=c(3188341, 3188396, 2231883, 2231750, 1799298, 1799271, 1603259, 1603311, 1190869, 1190991, 1033292, 1033212, 949580, 949611, 2286237, 2285697)))

#Create the plot
kp <- plotKaryotype(genome=calbicans.genome, plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)

max.freq <- max(snps$Frequency)

kpAddLabels(kp, "SNP Frequency", srt=90, pos=3)
kpAxis(kp, ymin = 0, ymax=max.freq)
kpPoints(kp, chr=snps$Chr, x=snps$Pos, y=snps$Frequency, ymin=0, ymax=max.freq)

You can adjust many additional parameters, such as the size and color of the points, the margins, and so on. You can find more information on how to do it in the documentation.
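
As a rough sketch of that kind of customization: the snippet below widens the left margin through the plot parameters and passes the standard graphical parameters col, cex and pch to kpPoints. The two-chromosome genome, the three SNPs, the threshold of 5 and the margin value are made-up illustration values, not part of the original data.

```r
library(karyoploteR)

# Toy genome and SNP table (assumed values, for illustration only)
toy.genome <- toGRanges(data.frame(chr = c("chr1A", "chr1B"),
                                   start = c(1, 1),
                                   end = c(3188341, 3188396)))
toy.snps <- data.frame(Chr = c("chr1A", "chr1A", "chr1B"),
                       Pos = c(151305, 515835, 842901),
                       Frequency = c(4, 1, 7),
                       stringsAsFactors = FALSE)

# Start from the default parameters for plot.type=4 and widen the left margin
pp <- getDefaultPlotParams(plot.type = 4)
pp$leftmargin <- 0.12

kp <- plotKaryotype(genome = toy.genome, plot.type = 4,
                    ideogram.plotter = NULL, labels.plotter = NULL,
                    plot.params = pp)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt = 45)

max.freq <- max(toy.snps$Frequency)
kpAxis(kp, ymin = 0, ymax = max.freq)

# Color by an arbitrary threshold: red for frequency >= 5, grey otherwise
point.col <- ifelse(toy.snps$Frequency >= 5, "red", "grey40")
kpPoints(kp, chr = toy.snps$Chr, x = toy.snps$Pos, y = toy.snps$Frequency,
         ymin = 0, ymax = max.freq,
         col = point.col, cex = 0.8, pch = 16)
```

Because col takes one value per point, any per-SNP attribute can be mapped to color in the same way.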

With more data points (in this case, random data), it would look like this:

[Image: example plot drawn with random data]

ADD COMMENTlink modified 8 days ago • written 11 days ago by bernatgel710

Thanks for the really detailed reply! This really helped me. However, I don't quite follow the tutorial's instructions for creating a simple plot. I don't understand the code well enough to modify it and add my own data. How does the program get the data for each of the 23 data points?

Thanks so much! You've already been of great help!

ADD REPLYlink written 10 days ago by nattzy9410

Just to clarify my question, I have no clue how to plot an ideogram by loading my own custom data of SNP frequencies.

ADD REPLYlink written 10 days ago by nattzy9410

Hi nattzy94,

The data in the example is randomly generated with x <- 1:23*10e6 and y <- rnorm(23, mean=0.5, sd=0.25). In your case you would probably read it from a file containing the data for your SNPs (position and frequency). If you paste the first lines of your data file here, I can try to help you with that.

ADD REPLYlink written 10 days ago by bernatgel710

Thanks so much. I'm new to R so really appreciate the help!

Pos       Frequency
151305    4
168836    7
515835    1
515850    3
837522    4
842901    7

ADD REPLYlink modified 9 days ago • written 9 days ago by nattzy9410

Realized I did not provide the chromosome information. Just assume it is all chr 1. Also, I do not need to plot chromosome features. Thanks for your time!

ADD REPLYlink modified 9 days ago • written 9 days ago by nattzy9410
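
A minimal base R sketch of that situation, assuming the table has only the Pos and Frequency columns and that every SNP is on the chromosome the answer's genome calls "chr1A" (that name is an assumption; use whichever name your genome object uses). It adds the missing Chr column and writes a tab-separated file in the three-column layout the answer expects. The temporary output path is only for illustration.

```r
# The six rows posted above, without a chromosome column
raw <- "Pos Frequency
151305 4
168836 7
515835 1
515850 3
837522 4
842901 7"

snps <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Assumption: all SNPs are on chromosome 1, named "chr1A" to match the genome
snps <- data.frame(Chr = "chr1A", snps, stringsAsFactors = FALSE)

# Write a tab-separated file with the columns Chr, Pos, Frequency
out.file <- file.path(tempdir(), "SNPs.txt")
write.table(snps, file = out.file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting file can then be read with read.table(..., sep = "\t", header = TRUE), as in the answer's code.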

Hi Bernatgel,

Have you had a chance to look at the data? I can modify the data if it is not suitable.

ADD REPLYlink written 8 days ago by nattzy9410

I edited the original answer to include some code

ADD REPLYlink written 8 days ago by bernatgel710

ok. Thanks so much bernatgel!

ADD REPLYlink written 8 days ago by nattzy9410

1 day ago by
nattzy9410
nattzy9410 wrote:

Hi Bernatgel,

Thanks for all the help so far. I was following your tutorial on how to plot P. vivax genes and was attempting to do the same for C. albicans. I followed the instructions but was unable to complete the final step of plotting the density. I wonder whether that is because my gff file is formatted differently from the Plasmodium one. The file is from "http://www.candidagenome.org/download/gff/C_albicans_SC5314/C_albicans_SC5314_A22_current_features.gff".

Would the different formatting of the seqnames change anything?

Thanks!

ADD COMMENTlink written 1 day ago by nattzy9410

Managed to figure it out! Thanks :)

ADD REPLYlink written 1 day ago by nattzy9410
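
The thread does not say what the fix was. One common cause of this kind of problem is that the sequence names in the annotation file do not match the chromosome names of the genome used for the plot, so a quick check like the sketch below may help. The GFF-style names and the renaming pattern are assumptions for illustration; inspect the actual names in your file before relying on them.

```r
# Chromosome names used when building the custom genome
genome.chrs <- c("chr1A", "chr1B")

# Hypothetical seqnames as they might appear in the GFF file (assumed)
gff.chrs <- c("Ca22chr1A_C_albicans_SC5314", "Ca22chr1B_C_albicans_SC5314")

# Names present in the GFF but unknown to the genome: these features
# cannot be placed on the plot
unmatched <- setdiff(gff.chrs, genome.chrs)

# Rename by pattern so both sides use the same names (assumed naming scheme)
renamed <- sub("^Ca22(chr[0-9A-Z]+)_C_albicans_SC5314$", "\\1", gff.chrs)
all.matched <- all(renamed %in% genome.chrs)
```

If all.matched is FALSE after renaming, the pattern does not fit the file and needs adjusting.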
15 hours ago by
nattzy9410
nattzy9410 wrote:

Hi, is there any way I can plot more than 2 data panels? I want to combine my rainfall plots with plots of gene regions.

And also, is there a way to overlap data? e.g. if I wanted to plot gene regions and mRNA regions on the same data panel but with different colours.

ADD COMMENTlink written 15 hours ago by nattzy9410
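
One possible approach, sketched under assumptions: karyoploteR plot types offer at most two data panels, but the r0 and r1 arguments of the plotting functions restrict drawing to a vertical slice of a panel, so one panel can hold several stacked tracks. Plotting two sets of regions in the same slice with different col values overlays them. The gene, mRNA and SNP values below are made up, and the slice boundaries are arbitrary.

```r
library(karyoploteR)

# Toy data (assumed values, for illustration only)
toy.genome <- toGRanges(data.frame(chr = "chr1A", start = 1, end = 3188341))
genes <- toGRanges(data.frame(chr = "chr1A",
                              start = c(100000, 900000, 2000000),
                              end = c(300000, 1200000, 2400000)))
mrnas <- toGRanges(data.frame(chr = "chr1A",
                              start = c(120000, 950000, 2050000),
                              end = c(280000, 1150000, 2350000)))
toy.snps <- data.frame(Chr = "chr1A",
                       Pos = c(151305, 515835, 842901),
                       Frequency = c(4, 1, 7),
                       stringsAsFactors = FALSE)
max.freq <- max(toy.snps$Frequency)

kp <- plotKaryotype(genome = toy.genome, plot.type = 4)

# Lower track: genes and mRNAs share the same slice, in different colors;
# the second call is drawn on top of the first
kpPlotRegions(kp, data = genes, r0 = 0, r1 = 0.3, col = "steelblue")
kpPlotRegions(kp, data = mrnas, r0 = 0.05, r1 = 0.25, col = "orange")

# Upper track: SNP frequencies with their own axis
kpAxis(kp, ymin = 0, ymax = max.freq, r0 = 0.45, r1 = 1)
kpPoints(kp, chr = toy.snps$Chr, x = toy.snps$Pos, y = toy.snps$Frequency,
         ymin = 0, ymax = max.freq, r0 = 0.45, r1 = 1)
```

More tracks can be added by choosing further non-overlapping r0/r1 ranges within the same panel.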