6.4 years ago
pushu1bawa
I want to compare every pair of rows in a file and count the elements they have in common.
Input file:
V1 V2 V3 V4 V5
sample_1 AA TT AT TC CC
sample_2 TT AG CT GG
sample_3 AA AT TT
sample_4 GG CC AA TT AT
Expected output:
sample_1 sample_2 sample_3 sample_4
sample_1 4 1 3 4
sample_2 1 4 1 2
sample_3 2 1 3 3
sample_4 4 1 3 5
Please make the post clearer. You can use the code button for editing.
Please edit your post and add what you've tried so far. As such, this is purely an R question and could be closed for that reason.
Hint:
reshape2::melt() should be really useful here. That or tidyr::gather(). You'll need to use melt() and colsplit() / gather() and separate() to get from wide-form data -> long-form data -> analysis -> wide-form results.
Yes, it is mostly a coding issue. This is what I have tried so far: I have binned my BAM file (10 kb bins) and found the barcodes (10 bp sequences) from the BAM file in each bin. So my input file has the bin coordinates as row names and the barcode sequences in the columns. I want to compare each bin (row) with every other bin to find the number of barcodes the two rows have in common. The desired output is a matrix whose row names and column names are the row names of the input, and in which each element is the number of overlapping barcodes.
I'm sorry, I cannot invest the time it takes to investigate your custom code and why it doesn't work on your dataset. Like I said, going to long form, aggregating to get your results and transforming those results to wide form will be the reproducible way to go.
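For what it's worth, the wide -> long -> aggregate -> wide route can be sketched in base R alone. This is only a sketch under assumptions: it uses stack(), table() and tcrossprod() in place of melt()/gather(), and the `bins` list is typed in by hand from the example in the question rather than read from a file.

```r
# Assumption: each sample (bin) is a character vector of its barcodes,
# copied by hand from the example input in the question.
bins <- list(
  sample_1 = c("AA", "TT", "AT", "TC", "CC"),
  sample_2 = c("TT", "AG", "CT", "GG"),
  sample_3 = c("AA", "AT", "TT"),
  sample_4 = c("GG", "CC", "AA", "TT", "AT")
)

# Wide -> long: one row per (barcode, sample) pair.
long <- stack(bins)   # columns: values (barcode), ind (sample)
long <- unique(long)  # count a barcode at most once per sample

# Long -> 0/1 incidence matrix: samples in rows, barcodes in columns.
incidence <- unclass(table(long$ind, long$values))

# Aggregate and go back to wide form: entry [i, j] is the number of
# barcodes that sample i and sample j share.
shared <- tcrossprod(incidence)
print(shared)
```

The diagonal is simply each sample's own barcode count, so with the example input sample_1 gets 5 there rather than the 4 shown in the expected output.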
The first thing I see when I look at the function findMatch is the undeclared dependency on the object data. The function only takes arguments i and n but operates on i, n and data. This means that it depends on the environment to have a specific type of dataset named data, which breaks reproducibility. Plus, your lapply call passes in a constant value for n, so that parameter is useless in what seems to be a function built specially for this use case.
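To illustrate the point about passing the data in explicitly, here is a minimal sketch. The body of findMatch is not reproduced in this thread, so count_shared(), its arguments and the `bins` list are all made-up names; the only assumption is that each row can be held as a character vector of barcodes.

```r
# Hypothetical replacement: the data is an explicit argument instead of a
# global object, and there is no constant `n` parameter.
count_shared <- function(bins, i, j) {
  # Number of distinct barcodes present in both bin i and bin j.
  length(intersect(bins[[i]], bins[[j]]))
}

# Assumption: example rows from the question, typed in by hand.
bins <- list(
  sample_1 = c("AA", "TT", "AT", "TC", "CC"),
  sample_2 = c("TT", "AG", "CT", "GG"),
  sample_3 = c("AA", "AT", "TT"),
  sample_4 = c("GG", "CC", "AA", "TT", "AT")
)

ids <- names(bins)

# Build the full pairwise matrix; everything the function needs is passed in.
shared <- sapply(ids, function(j) {
  sapply(ids, function(i) count_shared(bins, i, j))
})
print(shared)
```

Because the function no longer reaches into the calling environment, it can be run and tested on any list of barcode vectors.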