Question

Log fold changes and p-values

Asked 21 months ago by KABILAN

I want to make a volcano plot from a differential expression analysis of normalized proteomics data. A small portion of the normalized data is:

structure(list(`Fasta headers` = c(">sp|P00128|QCR7_YEAST Cytochrome b-c1 complex subunit 7 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=QCR7 PE=1 SV=2", 
">sp|P41277|GPP1_YEAST (DL)-glycerol-3-phosphatase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=RHR2 PE=1 SV=3", 
">sp|P32599|FIMB_YEAST Fimbrin OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=SAC6 PE=1 SV=1", 
">sp|P10080|SSBP1_YEAST Single-stranded nucleic acid-binding protein OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=SBP1 PE=1 SV=2", 
">sp|P08417|FUMH_YEAST Fumarate hydratase, mitochondrial OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=FUM1 PE=1 SV=2", 
">sp|A6ZNQ1|DBP5_YEAS7 ATP-dependent RNA helicase DBP5 OS=Saccharomyces cerevisiae (strain YJM789) GN=DBP5 PE=3 SV=1;>sp|P20449|DBP5_YEAST ATP-dependent RNA helicase DBP5 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=DBP5 PE=1 SV=2", 
">sp|Q08972|NEW1_YEAST [NU+] prion formation protein 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=NEW1 PE=1 SV=1", 
">sp|P34221|PP2C3_YEAST Protein phosphatase 2C homolog 3 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=PTC3 PE=1 SV=4", 
">sp|P28834|IDH1_YEAST Isocitrate dehydrogenase [NAD] subunit 1, mitochondrial OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=IDH1 PE=1 SV=2", 
">sp|P32775|GLGB_YEAST 1,4-alpha-glucan-branching enzyme OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=GLC3 PE=1 SV=2", 
">sp|P20459|IF2A_YEAST Eukaryotic translation initiation factor 2 subunit alpha OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=SUI2 PE=1 SV=1", 
">sp|P03962|PYRF_YEAST Orotidine 5-phosphate decarboxylase OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=URA3 PE=1 SV=2", 
">sp|Q00055|GPD1_YEAST Glycerol-3-phosphate dehydrogenase [NAD(+)] 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=GPD1 PE=1 SV=4", 
">sp|P13298|PYRE_YEAST Orotate phosphoribosyltransferase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=URA5 PE=1 SV=2", 
">sp|Q12305|RDL1_YEAST Thiosulfate sulfurtransferase RDL1, mitochondrial OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=RDL1 PE=1 SV=1", 
">sp|A6ZT71|SOL3_YEAS7 6-phosphogluconolactonase 3 OS=Saccharomyces cerevisiae (strain YJM789) GN=SOL3 PE=3 SV=1;>sp|B3LSS7|SOL3_YEAS1 6-phosphogluconolactonase 3 OS=Saccharomyces cerevisiae (strain RM11-1a) GN=SOL3 PE=3 SV=1;>sp|B5VK90|SOL3_YEAS6 6-phospho", 
">sp|P41057|RS29A_YEAST 40S ribosomal protein S29-A OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=RPS29A PE=1 SV=3", 
">sp|Q12449|AHA1_YEAST Hsp90 co-chaperone AHA1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=AHA1 PE=1 SV=1", 
">sp|P09232|PRTB_YEAST Cerevisin OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=PRB1 PE=1 SV=1", 
">sp|Q01662|AMPM1_YEAST Methionine aminopeptidase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=MAP1 PE=1 SV=2", 
">sp|Q02326|RL6A_YEAST 60S ribosomal protein L6-A OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=RPL6A PE=1 SV=2"
), A1 = c(24.4939908442259, 26.9901297110662, 26.9790532704269, 
26.2491230121829, 27.3237065976618, 21.5139112530241, 25.3458782142867, 
22.4334157259663, 28.2672074962839, 25.2416761448686, 25.641805067553, 
25.8574331830565, 25.8783633345764, 25.7731479621288, 27.0963885249819, 
25.081222602993, 25.4810191152259, 24.4878510184539, 26.2521153881943, 
22.6071586556094, 25.9715183712714), A2 = c(24.4521598558137, 
26.7777616451398, 26.8371083141266, 26.111654324567, 27.2240448206528, 
21.5575439087209, 24.8306618353191, 21.9755352070482, 28.1357370537282, 
25.1055508801603, 25.9025259029142, 25.9088750386658, 25.2035333149293, 
25.6561437473553, 26.9275656716652, 25.1003405099892, 24.2550341425505, 
24.3965663280385, 26.5686595693686, 22.7316347263509, 25.8215119457205
), A3 = c(24.4979558046294, 26.9462911960019, 26.8657892646, 
26.1355091951941, 27.2115681760896, 21.6118702362972, 24.4434592319952, 
22.0899590657904, 28.0340741938239, 25.4804397363636, 25.8497490151083, 
25.8401842382532, 25.2268511868877, 25.0589267482732, 26.8893323521966, 
25.1264181523755, 25.701716406897, 24.6099310597781, 26.5535918473391, 
22.6045675015735, 26.1545761611315), B1 = c(23.3243157291161, 
26.5952310528008, 26.1710779038877, 26.2626936848188, 26.3541554083491, 
22.986697526877, 24.4341241151255, 22.2559758435741, 28.1202029508932, 
25.4956090775129, 25.7933542489052, 26.0207642931343, 25.1198054151382, 
26.0971717069501, 26.5209924266217, 24.9194447938477, 22.9159359049572, 
24.6430383161188, 26.3140387008042, 22.3042811996551, 25.0083728001668
), B2 = c(24.0758612969067, 26.723083250181, 26.6566482800995, 
26.0373352889057, 26.9610137813375, 23.0065285908923, 24.4369742006201, 
21.2595176382374, 27.9565572499436, 25.5684532937553, 25.4945613798432, 
26.1612308871266, 25.1330707343108, 25.641172860751, 26.7592953935148, 
24.7482973055308, 24.3843010916183, 24.3197070353469, 26.2142288492881, 
22.4413868699827, 25.6435344756847), B3 = c(24.150535648865, 
26.7127877457518, 26.5922296160791, 25.8956131729837, 26.9543913546535, 
23.1584714264262, 24.3061224625014, 21.4107658939925, 28.0124353526521, 
25.5607873069432, 25.5701988807342, 26.1155712785834, 24.8222221231816, 
26.1858902029082, 26.7496166701725, 25.0198249168467, 24.3521999362085, 
24.2666847680311, 26.1605024565954, 22.5524750720661, 25.6306814977948
)), row.names = c("1000", "1001", "1002", "1003", "1004", "1005", 
"1006", "1007", "1008", "1009", "1010", "1011", "1012", "1013", 
"1014", "1015", "1016", "1017", "1018", "1019", "1020"), class = "data.frame")

The full dataset contains more than 2,500 rows (proteins) and two groups, A and B, with three replicate columns each, plus the "Fasta headers" column.
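
A minimal sketch of getting this layout into the shape most R tools for differential analysis expect: a numeric matrix with one row per protein and one column per sample, plus a factor giving each sample's group. It uses only the first three proteins from the posted data. It also makes two assumptions. First, the sample columns are named as in the dput: a group letter followed by a replicate number. Second, the headers follow the UniProt ">sp|ACCESSION|ENTRY_NAME ..." pattern. The object names df, expr and group are illustrative choices, not from the post.

```r
# Excerpt of the posted data: first three proteins only (values copied from the dput)
df <- data.frame(
  `Fasta headers` = c(
    ">sp|P00128|QCR7_YEAST Cytochrome b-c1 complex subunit 7 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=QCR7 PE=1 SV=2",
    ">sp|P41277|GPP1_YEAST (DL)-glycerol-3-phosphatase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=RHR2 PE=1 SV=3",
    ">sp|P32599|FIMB_YEAST Fimbrin OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=SAC6 PE=1 SV=1"
  ),
  A1 = c(24.4939908442259, 26.9901297110662, 26.9790532704269),
  A2 = c(24.4521598558137, 26.7777616451398, 26.8371083141266),
  A3 = c(24.4979558046294, 26.9462911960019, 26.8657892646),
  B1 = c(23.3243157291161, 26.5952310528008, 26.1710779038877),
  B2 = c(24.0758612969067, 26.723083250181, 26.6566482800995),
  B3 = c(24.150535648865, 26.7127877457518, 26.5922296160791),
  check.names = FALSE  # keep the space in "Fasta headers"
)

# Sample columns: everything except the header column
sample_cols <- setdiff(colnames(df), "Fasta headers")

# Numeric matrix: one row per protein, one column per sample
expr <- as.matrix(df[, sample_cols])

# Row names = first UniProt accession in each header (assumes ">sp|ACC|NAME ..." format)
rownames(expr) <- sub("^>?[a-z]{2}\\|([^|]+)\\|.*$", "\\1", df[["Fasta headers"]])

# Group label = column name without the trailing replicate number: "A1" -> "A"
group <- factor(sub("[0-9]+$", "", colnames(expr)))
```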

I am new to this field, so could someone share some code for computing log2 fold changes and p-values from this kind of data?
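
A minimal sketch of what a log2 fold change means for a single protein. It assumes, although the post does not say so, that the normalized values are already log2-transformed intensities. Their range of roughly 21 to 28 suggests this. On a log2 scale, the fold change between groups is simply the difference of the group means, which equals the log2 ratio of the geometric means. The direction (B vs A) is an arbitrary choice made here. If the values were on a linear scale, you would log2-transform them first.

```r
# Log2 intensities of the first protein (P00128, QCR7), copied from the dput
a <- c(24.4939908442259, 24.4521598558137, 24.4979558046294)  # A1, A2, A3
b <- c(23.3243157291161, 24.0758612969067, 24.150535648865)   # B1, B2, B3

# On a log2 scale the fold change is a difference of means:
# log2FC(B vs A) = mean(B) - mean(A)
log2fc <- mean(b) - mean(a)
log2fc  # about -0.63, i.e. lower in group B
```

This only gives the effect size. A p-value additionally needs a statistical test across the replicates, which is where a tool such as limma comes in.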

Tags: R, proteomics, differential_expression

Updated 21 months ago by ATpoint; written 21 months ago by KABILAN.

score 5 · Answer 1 · 2022-07-28

May I kindly suggest not asking questions like "please give me the code"? Biostars is not an on-demand code-writing service. The idea is rather to point out principles and concepts so that people can dig into them and develop strategies to solve their problems. Most analyses and projects are too complex for one or a few lines of code anyway, and in the end the user must understand the underlying principles to confidently stand behind the results.

Generally, for high-throughput experiments like this, one uses dedicated software written by experts for differential analysis. A common choice for proteomics, or generally for any set of normalized values, is limma from Bioconductor. If you search for "limma proteomics" you will find plenty of forum posts and tutorials. There are also several Bioconductor packages that wrap limma for proteomics, such as DEP. Please go through these resources and develop your analysis based on them.
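
As a rough illustration of the limma route, here is a minimal sketch on the first eight proteins of the posted data. It assumes the values are log2 intensities and that group B is compared against group A. It uses lmFit, eBayes and topTable from limma. For each protein, the resulting table holds a log2 fold change (logFC), a moderated-t p-value (P.Value) and a Benjamini-Hochberg adjusted p-value (adj.P.Val). These are the quantities a volcano plot needs.

Eight proteins are far too few for eBayes to estimate its variance prior reliably, so in practice run it on the full matrix. The excerpt has no missing values. Real proteomics data often does, and how to handle that is a separate decision not covered here. The dashed cutoffs in the plot (|log2FC| = 1, p = 0.05) are arbitrary illustrative choices.

```r
# limma is a Bioconductor package:
# install.packages("BiocManager"); BiocManager::install("limma")
library(limma)

# Excerpt of the posted data: first eight proteins (log2 intensities from the dput),
# with the first UniProt accession of each header as the row name
expr <- rbind(
  P00128 = c(24.4939908442259, 24.4521598558137, 24.4979558046294,
             23.3243157291161, 24.0758612969067, 24.150535648865),
  P41277 = c(26.9901297110662, 26.7777616451398, 26.9462911960019,
             26.5952310528008, 26.723083250181, 26.7127877457518),
  P32599 = c(26.9790532704269, 26.8371083141266, 26.8657892646,
             26.1710779038877, 26.6566482800995, 26.5922296160791),
  P10080 = c(26.2491230121829, 26.111654324567, 26.1355091951941,
             26.2626936848188, 26.0373352889057, 25.8956131729837),
  P08417 = c(27.3237065976618, 27.2240448206528, 27.2115681760896,
             26.3541554083491, 26.9610137813375, 26.9543913546535),
  A6ZNQ1 = c(21.5139112530241, 21.5575439087209, 21.6118702362972,
             22.986697526877, 23.0065285908923, 23.1584714264262),
  Q08972 = c(25.3458782142867, 24.8306618353191, 24.4434592319952,
             24.4341241151255, 24.4369742006201, 24.3061224625014),
  P34221 = c(22.4334157259663, 21.9755352070482, 22.0899590657904,
             22.2559758435741, 21.2595176382374, 21.4107658939925)
)
colnames(expr) <- c("A1", "A2", "A3", "B1", "B2", "B3")

# Groups from column names; A is the reference level
group <- factor(sub("[0-9]+$", "", colnames(expr)))
design <- model.matrix(~ group)  # columns "(Intercept)" and "groupB" (= B - A)

# Per-protein linear model, then empirical Bayes moderated t-statistics
fit <- lmFit(expr, design)
fit <- eBayes(fit)

# One row per protein, in the original order: logFC, AveExpr, t, P.Value, adj.P.Val, B
res <- topTable(fit, coef = "groupB", number = Inf, sort.by = "none")

# Simple base-R volcano plot with illustrative cutoffs
plot(res$logFC, -log10(res$P.Value), pch = 19,
     xlab = "log2 fold change (B vs A)", ylab = "-log10(p-value)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)
```

For a design with one factor and no missing values, logFC here is exactly the difference of group means. The moderated p-values differ from plain per-protein t-tests because eBayes borrows variance information across all proteins.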