R programming scripting linux bioinformatics
15 months ago

I have a dataset that looks like this:

Date    Location    count
18-01-2023  A   45.457
18-01-2023  B   undetermined
24-01-2023  C   undetermined
24-01-2023  B   undetermined
24-01-2023  C   20.33
27-01-2023  A   undetermined
27-01-2023  D   undetermined
27-01-2023  B   30.27
01-02-2023  C   undetermined
01-02-2023  A   undetermined
01-02-2023  D   29.22
01-02-2023  B   undetermined
03-02-2023  A   undetermined
03-02-2023  C   27.822

These are bacterial counts from samples collected at different locations.

I want to create a plot like this (image not shown).

Which R script will work if I have multiple entries for one date?

I have tried this code:

set.seed(1023)                          

sample_data <- round(exampledata)

# Load packages reshape2 and ggplot2
library("reshape2") 
library("ggplot2")  

# Convert sample_data from wide form to long form
data_final <- melt(all_data, id.vars = "Date")
head(data_final)

# Plot the final data
ggplot(data_final,                           
       aes(x = date(),
           y = value,
           col = variable)) + geom_line()
15 months ago

It would be something like this.

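library("lubridate")
library("dplyr")
library("ggplot2")

df |>
  dplyr::mutate(
    Date = dmy(Date),              # parse day-month-year strings into Date objects
    count = as.numeric(count)) |>  # "undetermined" becomes NA (with a coercion warning)
  ggplot(aes(x = Date, y = count, color = Location, group = Location)) +
    geom_line() +
    geom_point()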

If your R version is older than 4.1, replace |> with the magrittr pipe %>%.
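For anyone who wants to try this without the original file, here is a minimal sketch that rebuilds the question's table and prepares it with base R only. It assumes the data frame is called df, as in the answer above, and that "undetermined" is the only non-numeric entry in count. Dropping the NA rows before plotting is my own addition, not part of the answer: as far as I know, geom_line() breaks a line at missing values, so keeping them would leave mostly isolated points. The plot_df name and the requireNamespace() guard are illustrative choices.

```r
# Sample data copied from the question.
df <- read.table(text = "
Date Location count
18-01-2023 A 45.457
18-01-2023 B undetermined
24-01-2023 C undetermined
24-01-2023 B undetermined
24-01-2023 C 20.33
27-01-2023 A undetermined
27-01-2023 D undetermined
27-01-2023 B 30.27
01-02-2023 C undetermined
01-02-2023 A undetermined
01-02-2023 D 29.22
01-02-2023 B undetermined
03-02-2023 A undetermined
03-02-2023 C 27.822
", header = TRUE, stringsAsFactors = FALSE)

# Parse day-month-year strings into Date objects (base R counterpart of dmy()).
df$Date <- as.Date(df$Date, format = "%d-%m-%Y")

# Mark "undetermined" as missing explicitly, then convert the column to numeric.
df$count[df$count == "undetermined"] <- NA
df$count <- as.numeric(df$count)

# Keep only rows with a measured count so each location's points can be joined.
plot_df <- df[!is.na(df$count), ]

# Build the plot only if ggplot2 is installed; call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(x = Date, y = count, color = Location, group = Location)) +
    geom_line() +
    geom_point()
}
```

If the data are read from a file, passing na.strings = "undetermined" to read.table() should give a numeric count column directly.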

15 months ago

There are unfortunately multiple issues with your code, which indicate that you are missing some important fundamentals. Therefore, I recommend that you familiarize yourself with these two Tidyverse tutorials: Basic Data Management and then ggplot2.

Some hints regarding your code:

  • set.seed() is not needed here.
  • round(exampledata) only works on purely numeric data (such as a matrix or an all-numeric data.frame), not on mixed data.frames with categorical columns.
  • the relationship between data_final/all_data and sample_data/exampledata is unclear.
  • In the example you show above, your columns are named Date, Location and count. However, you are trying to plot date, variable and value.
  • Mind that date() is a function and not a variable name.
  • The presence of the word undetermined in the count column indicates that it is probably not numeric, but of character class and thus can't be plotted on continuous scales.
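Some of these hints can be checked quickly in the console. The sketch below uses a three-row stand-in for the question's data; the object names example_df, count_class_before and round_failed are hypothetical and only serve the illustration. Note also that the posted table already has one row per Date and Location, so it appears to be in long form and should not need melt().

```r
# Small stand-in for the question's data (hypothetical object name).
example_df <- data.frame(
  Date     = c("18-01-2023", "18-01-2023", "24-01-2023"),
  Location = c("A", "B", "C"),
  count    = c("45.457", "undetermined", "20.33"),
  stringsAsFactors = FALSE
)

# The count column is character because of the word "undetermined".
count_class_before <- class(example_df$count)

# round() on the whole data frame fails because of the non-numeric columns.
round_failed <- inherits(try(round(example_df), silent = TRUE), "try-error")

# Convert the column first, then round only that column.
example_df$count[example_df$count == "undetermined"] <- NA
example_df$count <- round(as.numeric(example_df$count), 1)

# date is a base R function, so aes(x = date()) does not refer to a column.
date_is_function <- is.function(date)
```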