Question: get calls in R
0
gravatar for HG
21 months ago by
HG30
HG30 wrote:

Hi all, I have two dataframes: First data frame has three columns: ID, sire.ID, and Dam.ID

The second dataframe has the same three columns plus genotyping calls for 17 markers in 17 columns.

Here is what I need to be done in R: If sire.ID in the first dataframe exists in the ID column in the second dataframe, then return the sire.ID and its calls from the second dataframe in first dataframe for the relevant offspring, and same for Dam.ID. Simply, for each ID in the first dataframe I want to get the calls for its sire and dam in front of the ID and in two rows. Thanks in advance for your help

R • 548 views
ADD COMMENTlink modified 21 months ago • written 21 months ago by HG30
1

I have changed your post to "question", a tutorial is when you show how to do stuff.

Your question seems rather R instead of bioinformatics. It is likely someone will close your post for this reason...

ADD REPLYlink written 21 months ago by Benn8.0k
1

Hi @HG, The community is not here to provide tailored programming solutions/ tutorials. Please amend your post to show some reproducible data to your question, along with code showing what you've tried.

ADD REPLYlink modified 21 months ago • written 21 months ago by andrew.j.skelton736.0k
1

Good description of data. But please post some example data and expected output. HG

ADD REPLYlink written 21 months ago by cpad011214k

Here is part of the first dataframe:

ID  Sire.ID Dam.ID
A       D              G
B        E          H
C        F              I

part of the Second dataframe:

ID  Sire.ID Dam.ID  marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
D   j   K   12  21  11  11  12  22  11  22  11
G   l   m   12  21  11  11  12  22  11  22  11
H   n   o   12  21  11  11  12  22  11  22  11

I need this:

ID  Sire.ID Dam.ID                                              
A   D   G   D   j   K   12  21  11  11  12  22  11  22  11
A   D   G   G   l   m   12  21  11  11  12  22  11  22  11

Split each ID (each row) in the first dataframe into two rows and return the sire and dam calls from the second data frame in first and second splitted rows for each ID. If the information for the sire or dam doesn't exists in the second dataframe then write NA in the splitted rows.

Thanks

ADD REPLYlink modified 21 months ago by WouterDeCoster44k • written 21 months ago by HG30
1

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLYlink written 21 months ago by WouterDeCoster44k
2
gravatar for cpad0112
21 months ago by
cpad011214k
India
cpad011214k wrote:

Your expected output seems to be incorrect. G and H from Dam.ID of data frame 1 matches with that from G and H from ID of Second frame. code:

> library(sqldf)
> a=read.csv("testa.txt", sep="\t", strip.white = T, stringsAsFactors = F)
> b=read.csv("testb.txt", sep="\t", strip.white = T, stringsAsFactors = F)
> sqldf ('SELECT * FROM a INNER JOIN b ON a."Sire.ID" = b.ID OR a."Dam.ID" = b.ID')

output:

ID Sire.ID Dam.ID ID..4 Sire.ID..5 Dam.ID..6 marker1 marker2 marker3 marker4 marker5 marker6 marker7
1  A       D      G     D          j         K      12      21      11      11      12      22      11
2  A       D      G     G          l         m      12      21      11      11      12      22      11
3  B       E      H     H          n         o      12      21      11      11      12      22      11
  marker8 marker9
1      22      11
2      22      11
3      22      11

input:

> a
  ID Sire.ID Dam.ID
1  A       D      G
2  B       E      H
3  C       F      I
> b
  ID Sire.ID Dam.ID marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
1  D       j      K      12      21      11      11      12      22      11      22      11
2  G       l      m      12      21      11      11      12      22      11      22      11
3  H       n      o      12      21      11      11      12      22      11      22      11
ADD COMMENTlink modified 21 months ago • written 21 months ago by cpad011214k

Thank you very much for your reply and the correction for the data. Your code worked perfectly.

ADD REPLYlink modified 21 months ago • written 21 months ago by HG30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1249 users visited in the last hour