Deleted: How can I find matches between two CSV files according to multiple columns using awk/bash?
2.7 years ago
Gargantu8 • 0

Hello! I am working on identifying genomic regions that code for notable genes that have been knocked out completely in my samples. File 1 is a list of locations in the genome where notable genes occur. File 2 is a list of locations in my samples' genomes that have complete knockouts (2 means diploid/expected, 0 means no copies of that region). Columns 6-23 correspond to my 18 samples.

I want to compare File 1 and 2 in two ways:

  • First, match "subject id" in File 1 with "seqnames" in File 2
  • Then, see if File 1's s.start-s.end range overlaps or falls within File 2's start and end range.

If such matches occur, I would like to print out columns 2-23 from File 2, and then every column from File 1 to the right of that.

The closest solution I have found so far is an awk solution on StackExchange. In that post, "Y" is the equivalent of "seqnames"/"subject id", Ymin/Ymax correspond to "start"/"end" from File 2, and Y1 is similar to "s.start"/"s.end" from File 1. I just haven't had success adapting it to my case.

I am thinking an awk/bash solution similar to the StackExchange post would make the most sense, but any and all suggestions would be greatly appreciated! Cheers.
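For reference, the two matching rules and the requested column order could be sketched in awk roughly as follows. This is only a minimal sketch on made-up toy data: the file names `file1.csv` and `file2.csv`, the column positions (File 1: subject id in column 2, s.start/s.end in columns 3-4; File 2: seqnames, start, end in columns 1-3) and the two sample columns are all assumptions, not the real layouts, so the field numbers would need adjusting. It also assumes plain comma-separated fields with no quoted commas, and it swaps s.start and s.end when they are reversed.

```shell
# Toy inputs in a scratch directory; layouts are ASSUMED, not the real files.
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1.csv" <<'EOF'
query id,subject id,s.start,s.end
geneA,chr1,150,250
geneB,chr1,900,950
geneC,chr2,500,400
EOF

cat > "$tmpdir/file2.csv" <<'EOF'
seqnames,start,end,width,strand,sample1,sample2
chr1,100,200,101,*,0,2
chr2,300,450,151,*,2,0
EOF

# Pass 1 (file2.csv): remember every knockout region, grouped by seqnames.
# Pass 2 (file1.csv): for each gene, print every stored region on the same
# sequence whose start-end range overlaps the gene's s.start-s.end range.
result=$(awk -F',' -v OFS=',' '
    FNR == 1 { next }                        # skip the header row of each file
    NR == FNR {                              # true only while reading file2.csv
        n = ++count[$1]
        reg_start[$1, n] = $2 + 0
        reg_end[$1, n]   = $3 + 0
        rest = $2                            # keep columns 2..NF of File 2
        for (i = 3; i <= NF; i++) rest = rest OFS $i
        row[$1, n] = rest
        next
    }
    {
        lo = $3 + 0; hi = $4 + 0
        if (lo > hi) { t = lo; lo = hi; hi = t }   # s.start may exceed s.end
        for (j = 1; j <= count[$2]; j++)
            if (lo <= reg_end[$2, j] && hi >= reg_start[$2, j])
                print row[$2, j], $0         # File 2 columns, then all of File 1
    }
' "$tmpdir/file2.csv" "$tmpdir/file1.csv")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

File 2 is listed first on the command line so that its regions are already in memory when File 1 is scanned. For interval overlaps on real genomic data, a dedicated tool such as bedtools may be a more robust choice than hand-rolled awk.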

An example output would be something like this:

Output Example

*I made up this example. "seqnames" from File 2 matches "subject id" from File 1. "s.start" and/or "s.end" from File 1 overlaps/falls within "start" and "end" from File 2. Because of this match,

File 1: AnthocyaninGeneLocationsExample

File 2: cn.MOPSLocationsExample

awk genomics bash • 485 views