Question: get calls in R
HG30 wrote:

Hi all, I have two data frames. The first has three columns: ID, Sire.ID, and Dam.ID.

The second data frame has the same three columns plus genotyping calls for 17 markers in 17 columns.

Here is what I need to do in R: if a Sire.ID in the first data frame exists in the ID column of the second data frame, return that Sire.ID and its calls from the second data frame next to the relevant offspring in the first data frame, and do the same for Dam.ID. Put simply, for each ID in the first data frame I want the calls for its sire and its dam next to the ID, in two rows. Thanks in advance for your help.

modified 14 months ago • written 14 months ago by HG30

I have changed your post to "question"; a tutorial is when you show how to do something.

Your question seems to be about R rather than bioinformatics. It is likely that someone will close your post for this reason.

written 14 months ago by Benn7.9k

Hi @HG, the community is not here to provide tailored programming solutions or tutorials. Please amend your post to include some reproducible data for your question, along with code showing what you've tried.

modified 14 months ago • written 14 months ago by andrew.j.skelton735.9k

Good description of the data, but please post some example data and the expected output, HG.

written 14 months ago by cpad011212k

Here is part of the first dataframe:

ID  Sire.ID  Dam.ID
A   D        G
B   E        H
C   F        I

Here is part of the second data frame:

ID  Sire.ID  Dam.ID  marker1  marker2  marker3  marker4  marker5  marker6  marker7  marker8  marker9
D   j        K       12       21       11       11       12       22       11       22       11
G   l        m       12       21       11       11       12       22       11       22       11
H   n        o       12       21       11       11       12       22       11       22       11

I need this:

ID  Sire.ID  Dam.ID
A   D        G       D  j  K  12  21  11  11  12  22  11  22  11
A   D        G       G  l  m  12  21  11  11  12  22  11  22  11

Split each ID (each row) of the first data frame into two rows, and return the sire calls from the second data frame in the first of those rows and the dam calls in the second. If the information for the sire or dam doesn't exist in the second data frame, write NA in the corresponding row.

Thanks

modified 14 months ago by WouterDeCoster43k • written 14 months ago by HG30

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

written 14 months ago by WouterDeCoster43k
cpad011212k wrote:

Your expected output seems to be incorrect: both G and H in the Dam.ID column of the first data frame match entries in the ID column of the second data frame. Code:

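> library(sqldf)
> # Read the two tab-separated input tables
> a <- read.csv("testa.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE)
> b <- read.csv("testb.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE)
> # Return one row for every sire or dam of a that has a record in b
> sqldf('SELECT * FROM a INNER JOIN b ON a."Sire.ID" = b.ID OR a."Dam.ID" = b.ID')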

output:

  ID Sire.ID Dam.ID ID..4 Sire.ID..5 Dam.ID..6 marker1 marker2 marker3 marker4 marker5 marker6 marker7
1  A       D      G     D          j         K      12      21      11      11      12      22      11
2  A       D      G     G          l         m      12      21      11      11      12      22      11
3  B       E      H     H          n         o      12      21      11      11      12      22      11
  marker8 marker9
1      22      11
2      22      11
3      22      11
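
For readers who would rather not depend on sqldf, the same matching can probably be done in base R. The following is only a minimal sketch, not the code used in this answer: the data frames are re-typed from the thread with just the first three markers, and the names `long`, `Role`, `Parent`, and the `.parent` suffix are illustrative assumptions.

```r
# Example data re-typed from the thread (markers 1-3 only, for brevity)
a <- data.frame(ID = c("A", "B", "C"),
                Sire.ID = c("D", "E", "F"),
                Dam.ID = c("G", "H", "I"),
                stringsAsFactors = FALSE)
b <- data.frame(ID = c("D", "G", "H"),
                Sire.ID = c("j", "l", "n"),
                Dam.ID = c("K", "m", "o"),
                marker1 = rep(12, 3),
                marker2 = rep(21, 3),
                marker3 = rep(11, 3),
                stringsAsFactors = FALSE)

# Stack the offspring table twice: once keyed by sire, once keyed by dam.
# "Role" and "Parent" are hypothetical helper columns.
long <- rbind(
  data.frame(a, Role = "sire", Parent = a$Sire.ID, stringsAsFactors = FALSE),
  data.frame(a, Role = "dam",  Parent = a$Dam.ID,  stringsAsFactors = FALSE)
)

# Inner join: keep only parents that have a record in b.
# Clashing column names coming from b receive the ".parent" suffix.
res <- merge(long, b, by.x = "Parent", by.y = "ID",
             suffixes = c("", ".parent"))
print(res)
```

Stacking the table first turns the `OR` condition of the SQL query into a single equality join, which `merge()` can express directly.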

input:

> a
  ID Sire.ID Dam.ID
1  A       D      G
2  B       E      H
3  C       F      I
> b
  ID Sire.ID Dam.ID marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
1  D       j      K      12      21      11      11      12      22      11      22      11
2  G       l      m      12      21      11      11      12      22      11      22      11
3  H       n      o      12      21      11      11      12      22      11      22      11
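
Note that an inner join keeps only parents that have a record in the second data frame, whereas the question also asks for NA rows when a sire or dam is missing. One possible way to get that layout in base R is sketched below. It is an assumption-laden illustration rather than part of the original answer: the data are re-typed from the thread with only the first three markers, and `Parent.ID`, `Parent.Sire.ID`, and `Parent.Dam.ID` are hypothetical column names.

```r
# Example data re-typed from the thread (markers 1-3 only, for brevity)
a <- data.frame(ID = c("A", "B", "C"),
                Sire.ID = c("D", "E", "F"),
                Dam.ID = c("G", "H", "I"),
                stringsAsFactors = FALSE)
b <- data.frame(ID = c("D", "G", "H"),
                Sire.ID = c("j", "l", "n"),
                Dam.ID = c("K", "m", "o"),
                marker1 = rep(12, 3),
                marker2 = rep(21, 3),
                marker3 = rep(11, 3),
                stringsAsFactors = FALSE)

# Repeat every offspring row twice: first for its sire, then for its dam
offspring <- a[rep(seq_len(nrow(a)), each = 2), ]

# Interleave the parent IDs in the same order: sire1, dam1, sire2, dam2, ...
parent_ids <- c(rbind(a$Sire.ID, a$Dam.ID))

# Look up each parent in b; match() gives NA when the parent has no record,
# and indexing with NA yields a row filled with NA
idx <- match(parent_ids, b$ID)
parent_calls <- b[idx, -1]
names(parent_calls)[1:2] <- c("Parent.Sire.ID", "Parent.Dam.ID")

out <- data.frame(offspring, Parent.ID = parent_ids, parent_calls,
                  stringsAsFactors = FALSE)
rownames(out) <- NULL
print(out)
```

With the example data this gives six rows, two per offspring, and the marker columns are NA for parents E, F, and I, which have no record in `b`.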
modified 14 months ago • written 14 months ago by cpad011212k

Thank you very much for your reply and the correction for the data. Your code worked perfectly.

modified 13 months ago • written 14 months ago by HG30