Creating for loops in R for NGS data
4.4 years ago
Ana ▴ 180

I have a question about writing a for loop in R and would be very grateful for your ideas. I'm working with NGS data and have calculated r2 values to estimate linkage disequilibrium (LD), but I want to calculate LD decay for every single SNP in each contig.

These are the first 3 rows of my data:

scaffold94_798049_802097   999  NA  tscaffold94_798049_802097   999   NA  1
tscaffold94_798049_802097  999  NA  tscaffold94_798049_802097   1029  NA  1
tscaffold94_798049_50222   2011 NA tscaffold94_798049_802097    1029  NA  1

The first and fourth columns are contig names. How can I write a loop that keeps only those rows where the names in the first and fourth columns are identical (meaning that the two SNPs are located on the same contig)?

R loops • 1.1k views
4.4 years ago
TriS ★ 4.4k

R solution:

myDataFiltered <- myData[which(myData[,1] == myData[,4]),]
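As a minimal sketch of how this one-liner behaves, the toy data frame below mimics the layout in the question. The contig names, the `V1`..`V7` column names, and the values are illustrative assumptions, not the poster's real data. One caveat worth hedging against: if the contig columns were read in as factors (the `read.table()` default before R 4.0), comparing them directly can fail when their level sets differ, so the sketch compares them as character vectors.

```r
# Toy data mirroring the question's layout (all names and values are assumed).
myData <- data.frame(
  V1 = c("contigA", "contigA", "contigB"),  # contig of first SNP
  V2 = c(999, 999, 2011),                   # position of first SNP
  V3 = NA,
  V4 = c("contigA", "contigA", "contigA"),  # contig of second SNP
  V5 = c(999, 1029, 1029),                  # position of second SNP
  V6 = NA,
  V7 = c(1, 1, 1),                          # r2 value
  stringsAsFactors = TRUE                   # mimic the pre-R-4.0 read.table() default
)

# Compare as character so factor columns with different levels do not raise an error.
sameContig <- as.character(myData[, 1]) == as.character(myData[, 4])

# which() turns the logical vector into row indices and silently drops any NA comparisons.
myDataFiltered <- myData[which(sameContig), ]
```

Here the third row is dropped because its two SNPs sit on different contigs, while the first two rows are kept.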

awk solution:

awk -v FS='\t' -v OFS='\t' '{if ($1 == $4) print}' myFileWithData.txt > myFileWithFilteredData.txt

Actually, someone gave me a solution that works perfectly fine:

data$keep_dontKeep <- "dontKeep"
for (i in seq_len(nrow(data))) {
  # If the values in V1 and V4 are equal, categorize the row as 'keep'
  if (as.character(data$V1[i]) == as.character(data$V4[i])) {
    data$keep_dontKeep[i] <- "keep"
  }
}

data <- data[data$keep_dontKeep == "keep", ]


TriS's R solution is much simpler and more efficient (faster). It is also the recommended way: you do not need a for loop in R for subsetting, because comparison operators work on whole columns at once.
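To illustrate the point, here is a small sketch that runs a row-by-row loop in the spirit of the one posted above next to a single vectorised comparison, on toy data. The contig names are made up and the helper variable names (`loopKeep`, `vecKeep`) are hypothetical. Both approaches should select the same rows.

```r
# Toy data; contig names are illustrative only.
data <- data.frame(
  V1 = c("contigA", "contigA", "contigB", "contigB"),
  V4 = c("contigA", "contigB", "contigB", "contigA"),
  stringsAsFactors = FALSE
)

# Loop version: test one row at a time.
loopKeep <- logical(nrow(data))
for (i in seq_len(nrow(data))) {
  loopKeep[i] <- as.character(data$V1[i]) == as.character(data$V4[i])
}

# Vectorised version: a single comparison over the whole columns.
vecKeep <- data$V1 == data$V4

# Both logical vectors mark the same rows, so either can be used for subsetting.
sameResult <- identical(loopKeep, vecKeep)
filtered <- data[vecKeep, ]
```

On a table with millions of SNP pairs, the vectorised form avoids one R-level iteration per row, which is where the speed difference comes from.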