Question

Plot of 2 replicates

5.5 years ago

dianalves17 • 0

I have the following dataset with 2 replicates per group (e.g. group 1: E13_5_midline; replicates of group 1: E13_5_midline_1, E13_5_midline_2).

ensembl_gene_id E13_5_meninges_1 E13_5_meninges_2 E13_5_midline_1 E13_5_midline_2 E14_5_meninges_1 E14_5_midline_1 E14_5_midline_2 E15_5_meninges_1 E15_5_meninges_2 E15_5_midline_1
   <chr>                      <dbl>            <dbl>           <dbl>           <dbl>            <dbl>           <dbl>           <dbl>            <dbl>            <dbl>           <dbl>
 1 ENSMUSG0000000…            6342.            6238.           7440.           6905.            6076.           7237.           7085.            5846.            5789.           6509.
 2 ENSMUSG0000000…             771.             768.            406.            450.             665.            450.            418.             607.             602.            443.
 3 ENSMUSG0000000…           40981.           43835.          96853.          89887.           39372.         150312.         157692.           64253.           53234.         259484.
 4 ENSMUSG0000000…             311.             265.            389.            367.             279.            585.            536.             277.             278.            408.
 5 ENSMUSG0000000…            1364.            1378.           2128.           1648.            1199.           1652.           1793.            1332.            1140.           1688.
 6 ENSMUSG0000000…            1035.            1106.            321.            617.            1125.            428.            426.            1310.            1635.            553.
 7 ENSMUSG0000000…            4285.            3985.           5693.           5084.            3205.           4024.           3700.            3500.            3556.           3806.
 8 ENSMUSG0000000…             870.             866.            798.            864.             779.            815.            767.             911.             846.            876.
 9 ENSMUSG0000000…             918.             994.            660.            693.             921.            444.            614.             784.             745.            693.
10 ENSMUSG0000000…            1266.            1304.            176.            618.            1279.            159.            162.            1311.            1402.            269.

I am interested in plotting the mean of the replicates in each group for some genes of interest. To do that, I rearranged the dataset with the function 'gather', creating one new column that holds the replicate names and another that holds the corresponding normalised counts.

 ensembl_gene_id    external_gene_name description                                                                         chromosome_name start_position end_position condition      counts
   <chr>              <chr>              <chr>                                                                               <chr>                    <int>        <int> <chr>           <dbl>
 1 ENSMUSG00000000001 Gnai3              guanine nucleotide binding protein (G protein), alpha inhibiting 3 [Source:MGI Sym… 3                    108107280    108146146 E13_5_meninge…  6342.
 2 ENSMUSG00000000028 Cdc45              cell division cycle 45 [Source:MGI Symbol;Acc:MGI:1338073]                          16                    18780447     18811987 E13_5_meninge…   771.
 3 ENSMUSG00000000031 H19                H19, imprinted maternally expressed transcript [Source:MGI Symbol;Acc:MGI:95891]    7                    142575529    142578143 E13_5_meninge… 40981.
 4 ENSMUSG00000000037 Scml2              sex comb on midleg-like 2 (Drosophila) [Source:MGI Symbol;Acc:MGI:1340042]          X                    161117193    161258213 E13_5_meninge…   311.
 5 ENSMUSG00000000056 Narf               nuclear prelamin A recognition factor [Source:MGI Symbol;Acc:MGI:1914858]           11                   121237253    121255856 E13_5_meninge…  1364.
 6 ENSMUSG00000000058 Cav2               caveolin 2 [Source:MGI Symbol;Acc:MGI:107571]                                       6                     17281185     17289115 E13_5_meninge…  1035.
 7 ENSMUSG00000000078 Klf6               Kruppel-like factor 6 [Source:MGI Symbol;Acc:MGI:1346318]                           13                     5861482      5870394 E13_5_meninge…  4285.
 8 ENSMUSG00000000085 Scmh1              sex comb on midleg homolog 1 [Source:MGI Symbol;Acc:MGI:1352762]                    4                    120405281    120530186 E13_5_meninge…   870.
 9 ENSMUSG00000000088 Cox5a              cytochrome c oxidase subunit Va [Source:MGI Symbol;Acc:MGI:88474]                   9                     57521274     57532426 E13_5_meninge…   918.
10 ENSMUSG00000000093 Tbx2               T-box 2 [Source:MGI Symbol;Acc:MGI:98494]                                           11                    85832551     85841948 E13_5_meninge…  1266.

The problem is that when I plot the replicates, they appear as individual samples and not as replicates of the same group. Can anyone help with how I should group the data to achieve what I want?

RNA-Seq R • 1.9k views

ADD COMMENT • link updated 5.5 years ago by Russ ▴ 500 • written 5.5 years ago by dianalves17 • 0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

ADD REPLY • link 5.5 years ago by Ram 43k

Thank you. I have used the formatting bar but for some reason it didn't work. I'm sorry for the trouble.

ADD REPLY • link 5.5 years ago by dianalves17 • 0

It's OK - to format a block of text, select all of its lines before using the formatting bar. If you hit it without selecting text, it applies inline code formatting, which does not display tabular data the way you want.

ADD REPLY • link 5.5 years ago by Ram 43k

Thanks a lot for clarifying! :)

ADD REPLY • link 5.5 years ago by dianalves17 • 0

I'm not 100% clear on what you're looking for, what you've tried, and what packages you're using. If I assume that you want to create a bar plot using ggplot2, here's what I would do:

Separate your "condition" column into two columns (using separate from the tidyr package), named something like c("condition", "replicate"). Something like this should then work:

ggplot(df, aes(x = condition, y = counts, fill = replicate)) + 
   geom_col(position = "dodge") + 
   facet_wrap(~ensembl_gene_id)

ADD REPLY • link 5.5 years ago by Russ ▴ 500

I probably expressed myself poorly. What I want is to create a new column that contains each group in my analysis and a second column that contains the replicate within each group.

ADD REPLY • link 5.5 years ago by dianalves17 • 0

Oh - sorry, I misunderstood. Check out the separate command in tidyr, it's specifically designed to do what you're looking for (split one column into 2). You want to separate the data based on the third underscore in your condition name, correct? My regex isn't good enough to identify the third occurrence of a character in a string; my hack-y workaround is to simply replace the first two underscores with a different separator using the sub function.

e.g.

df$condition <- sub("_", "-", df$condition)
df$condition <- sub("_", "-", df$condition)

newdf <- separate(df, condition, c("group", "replicate"), "_")
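If a regular expression is acceptable, the split at the last underscore can also be done without first replacing the earlier underscores. The following is only a minimal base R sketch on made-up toy data; the column names group and replicate follow the suggestion above, and the counts are invented for illustration.

```r
# Toy data: the column name `condition` follows the thread, the values are made up.
df <- data.frame(
  condition = c("E13_5_midline_1", "E13_5_midline_2", "E14_5_meninges_1"),
  counts    = c(7440, 6905, 6076),
  stringsAsFactors = FALSE
)

# Drop the last underscore and everything after it: what remains is the group.
df$group <- sub("_[^_]+$", "", df$condition)

# Drop everything up to and including the last underscore: what remains is the replicate.
df$replicate <- sub("^.*_", "", df$condition)

print(df)
```

This relies on the replicate number always being the final underscore-separated field, which holds for the sample names shown in the question.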

ADD REPLY • link 5.5 years ago by Russ ▴ 500

Can you share the code (including the gather command) and plot that you generated and that you want to optimize?

ADD REPLY • link 5.5 years ago by Friederike 8.9k

library(readr)    # read_tsv
library(dplyr)    # left_join, filter
library(tidyr)    # gather
library(ggplot2)

size_normalized_counts = read_tsv("sizefac_normalized_counts_by_replicate.txt")
geneinfo = read_tsv("geneInfo.txt")

# Add gene annotation to the normalised counts
size_normalized_counts_ext_gene_name = left_join(size_normalized_counts, geneinfo)

# Wide to long: one row per gene and sample
size_normalized_Counts_by_condition = gather(size_normalized_counts_ext_gene_name, 'E13_5_meninges_1', 'E13_5_meninges_2', 'E13_5_midline_1', 'E13_5_midline_2', 'E14_5_meninges_1', 'E14_5_midline_1', 'E14_5_midline_2', 'E15_5_meninges_1', 'E15_5_meninges_2', 'E15_5_midline_1', 'E15_5_midline_2', key = "condition", value = "counts")

# Keep the midline samples only
size_normalized_counts_midline = filter(size_normalized_Counts_by_condition, condition == "E13_5_midline_1" | condition == "E13_5_midline_2"| condition == "E14_5_midline_1" | condition == "E14_5_midline_2" | condition == "E15_5_midline_1" | condition == "E15_5_midline_2")

ggplot(subset(size_normalized_counts_midline, external_gene_name %in% c("Cdh11")), aes(x = condition, y = counts, color = condition , group = condition)) +
  geom_point() +
  stat_summary(aes(y = counts, group=1), fun.y=mean, colour="red", geom="line",group=1)

The graph that I want to optimize is below: https://ibb.co/hTTkvf

And the type of graph I would like to have is something like this one: https://ibb.co/fCrVvf
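As a side note, the long chain of condition == "..." comparisons used to keep the midline samples could probably be shortened with a pattern match. The following is a minimal base R sketch on made-up toy data, assuming every midline sample name contains the literal text "_midline_"; the object names long and midline are hypothetical.

```r
# Toy stand-in for the gathered (long) table; the counts are made up.
long <- data.frame(
  condition = c("E13_5_meninges_1", "E13_5_midline_1", "E13_5_midline_2",
                "E14_5_meninges_1", "E14_5_midline_1"),
  counts    = c(6342, 7440, 6905, 6076, 7237),
  stringsAsFactors = FALSE
)

# Keep only the rows whose condition name contains the literal text "_midline_".
midline <- long[grepl("_midline_", long$condition, fixed = TRUE), ]

print(midline)
```

The same test should also work inside dplyr's filter, e.g. filter(long, grepl("_midline_", condition, fixed = TRUE)).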

ADD REPLY • link 5.5 years ago by dianalves17 • 0

score 0 · Answer 1 · 2018-10-16

So this works for me:

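# Work on a copy so the original gathered table stays untouched
size_normalized_Counts_by_condition1 <- size_normalized_Counts_by_condition
# Replace the first two underscores so that only the one before the replicate number remains
size_normalized_Counts_by_condition1$condition <- sub("_", "-", size_normalized_Counts_by_condition1$condition)
size_normalized_Counts_by_condition1$condition <- sub("_", "-", size_normalized_Counts_by_condition1$condition)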

size_normalized_counts_by_condition_sep <- separate(size_normalized_Counts_by_condition1, condition, c("condition", "replicate"), "_")

ggplot(subset(size_normalized_counts_by_condition_sep, external_gene_name == "Gnai3"), aes(x = condition, y = counts, color = replicate)) + 
   geom_point() + 
   stat_summary(aes(group = 1), fun.y = mean, geom = "point", shape = 4)

Result (I was a bit sloppy importing the data, which I think explains the missing data for E14 meninges and E15 midline):

enter image description here
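If a line connecting the group means is wanted, as in the asker's target figure, one option is to compute the means first and draw them as a separate layer. The following is only a sketch on made-up toy data, assuming the table has already been split into group and replicate columns; the object names long and group_means are hypothetical, and the plotting part runs only if ggplot2 is installed.

```r
# Toy long-format data with separate group and replicate columns; the counts are made up.
long <- data.frame(
  group     = rep(c("E13_5_midline", "E14_5_midline", "E15_5_midline"), each = 2),
  replicate = rep(c("1", "2"), times = 3),
  counts    = c(100, 120, 200, 220, 400, 380),
  stringsAsFactors = FALSE
)

# One row per group, holding the mean over that group's replicates.
group_means <- aggregate(counts ~ group, data = long, FUN = mean)
print(group_means)

# Draw the replicates as points and connect the group means with a red line.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = group, y = counts)) +
    geom_point(aes(color = replicate)) +
    geom_line(data = group_means, aes(group = 1), color = "red")
  print(p)
}
```

Precomputing the means keeps the summary visible as a table and avoids depending on the argument name of stat_summary, which was renamed from fun.y to fun in ggplot2 3.3.0.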