Question

Creating for loops in R for NGS data

0

8.4 years ago

Ana ▴ 200

I have a question about writing a for loop in R and would be very grateful for your ideas. I'm working with NGS data and have calculated r2 values to estimate linkage disequilibrium, but I want to calculate LD decay for every single SNP in each contig.

These are the first 3 rows of my data:

scaffold94_798049_802097   999  NA  tscaffold94_798049_802097   999   NA  1
tscaffold94_798049_802097  999  NA  tscaffold94_798049_802097   1029  NA  1
tscaffold94_798049_50222   2011 NA tscaffold94_798049_802097    1029  NA  1

The first and fourth columns are contig names. How can I write a loop that keeps only the rows where the names in the first and fourth columns are identical (meaning that the two SNPs are located on the same contig)?

R loops • 2.1k views

ADD COMMENT • link updated 8.4 years ago by TriS ★ 4.8k • written 8.4 years ago by Ana ▴ 200

score 2 · Answer 1 · 2017-01-31

2

8.4 years ago

TriS ★ 4.8k

R solution:

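myData <- theResultsYouHaveAlready
# Keep rows whose contig names in columns 1 and 4 match.
# as.character() avoids an error if the columns are factors with different levels.
myDataFiltered <- myData[which(as.character(myData[, 1]) == as.character(myData[, 4])), ]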

awk solution:

awk -v FS='\t' -v OFS='\t' '{if($1 == $4) print}' myFileWithData.txt > myFileWithFilteredData.txt
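As a minimal sketch, here is the R approach applied to the three sample rows from the question. The column names V1 to V7 are the defaults that read.table assigns, and treating the data as header-less and whitespace-delimited is an assumption, not something the question states.

```r
# Sample rows copied from the question as pasted
# (assumed header-less and whitespace-delimited)
txt <- "scaffold94_798049_802097 999 NA tscaffold94_798049_802097 999 NA 1
tscaffold94_798049_802097 999 NA tscaffold94_798049_802097 1029 NA 1
tscaffold94_798049_50222 2011 NA tscaffold94_798049_802097 1029 NA 1"

myData <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)

# Logical vector: TRUE where the contig names in columns 1 and 4 match
sameContig <- myData$V1 == myData$V4

# which() drops NA comparisons, so rows with a missing contig name are excluded
myDataFiltered <- myData[which(sameContig), ]
print(myDataFiltered)
```

With the rows exactly as pasted, only the second row is kept: the third row names two different contigs, and the first row's two names differ by the leading "t".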

ADD COMMENT • link 8.4 years ago by TriS ★ 4.8k

0

Actually, someone gave me this solution, and it works perfectly fine:

data$keep_dontKeep <- "dontKeep"

for (i in seq_len(nrow(data))) {
  if (as.character(data$V1[i]) == as.character(data$V4[i])) {
    # If values in V1 and V4 are equal, categorize as 'keep'
    data$keep_dontKeep[i] <- "keep"
  }
}

data <- data[data$keep_dontKeep == "keep",]

ADD REPLY • link 8.4 years ago by Ana ▴ 200

0

TriS's R solution is much simpler and more efficient (faster). It is also the recommended way: you do not need a for loop to subset in R.
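To illustrate the point, here is a rough comparison on synthetic data. The data frame `d`, the contig names, and the row count are all made up for this sketch, and the loop is a simplified variant rather than the exact code posted above. Both approaches should select the same rows; timings vary by machine, so none are quoted here.

```r
set.seed(1)
n <- 20000
contigs <- paste0("contig", 1:50)

# Hypothetical data: two columns of randomly assigned contig names
d <- data.frame(V1 = sample(contigs, n, replace = TRUE),
                V4 = sample(contigs, n, replace = TRUE),
                stringsAsFactors = FALSE)

# Loop version: test each row one at a time
loopTime <- system.time({
  keep <- logical(n)
  for (i in seq_len(n)) {
    keep[i] <- d$V1[i] == d$V4[i]
  }
  viaLoop <- d[keep, ]
})

# Vectorized version: compare the whole columns in a single call
vecTime <- system.time({
  viaVector <- d[d$V1 == d$V4, ]
})

# Check that both approaches select the same rows
print(identical(viaLoop, viaVector))
cat("loop:", loopTime[["elapsed"]], "s; vectorized:", vecTime[["elapsed"]], "s\n")
```

The vectorized comparison hands the whole column to R's internal code at once, which is why it tends to scale better than indexing the data frame row by row.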

ADD REPLY • link 8.4 years ago by ddiez ★ 2.0k