Question: Making disjoint chromosomal sites contiguous with awk
selplat21 wrote:

I'm trying to write a loop in awk using the following info from two files:

  • a file with chromosome in the first column and site in the second column
  • a second file with chromosome in the first column and chromosome size in the second column

The sites in the second column run from the first to the last site of each chromosome, so the numbering starts again from 1 on the next chromosome. I need to make the sites in the first file contiguous across chromosomes, which means adding the size of the preceding chromosomes to every site on each chromosome after the first.

Any help is appreciated!

modified 4 weeks ago • written 4 weeks ago by selplat21

Please provide representative input and desired output.

written 4 weeks ago by ATpoint

For example, a section of file 1 looks like this (chromosome, site):

Chr2 884860
Chr2 884875
Chr2 884892

The second file looks like this (chromosome, chromosome size):

Chr1    196345723
Chr2    149451176
Chr3    114294425

For every chromosome after Chr1 in the first file, I need to add the chromosome size of the preceding chromosome to make the sites continuous. So, the section of file 1 should look like this:

Chr2 197230583
Chr2 197230598
Chr2 197230615

modified 4 weeks ago • written 4 weeks ago by selplat21
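
The arithmetic behind the example above can be checked with a short awk sketch that prints, for each chromosome, the offset to add to its sites: the running total of the sizes listed before it. The file name chrom_sizes.txt is a placeholder, and the sketch assumes the size file is whitespace-delimited, has no header, and lists chromosomes in the order they should be joined.

```shell
# chrom_sizes.txt is a hypothetical name; the rows are copied from the example above.
printf 'Chr1\t196345723\nChr2\t149451176\nChr3\t114294425\n' > chrom_sizes.txt

# For each row, print the chromosome and the sum of all sizes seen so far
# (before adding its own size), then add its size to the running total.
# %.0f prints the total as a whole number even when it grows large.
offsets=$(awk '{ printf "%s\t%.0f\n", $1, total; total += $2 }' chrom_sizes.txt)
echo "$offsets"
```

With these sizes the offset for Chr2 is 196345723, so site 884860 on Chr2 becomes 197230583, which agrees with the desired output shown above.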

Your question seems unclear about the exact operation being performed, and your output looks suspect (duplicate rows in the output, but the first file contains different "sites").

Can you please simplify the question and double-check what the input and output should look like?

written 4 weeks ago by Alex Reynolds

I apologize, one of the sites was accidentally duplicated there. I edited it.

File 1, Column 1 = Chromosome

File 1, Column 2 = Site

Example:

Chr1    1
Chr1    3
Chr1    5
...
Chr2    3
Chr2    6
Chr2    7
...
Chr3    4
Chr3    6
Chr3    8
...

File 2, Column 1 = Chromosome

File 2, Column 2 = Chromosome Size

Example:

Chr1    196345723
Chr2    149451176
Chr3    114294425
...

Desired Output:

File 1 lists sites for each chromosome, with positions ranging from 1 to that chromosome's size. Some sites are missing because they were filtered out, but on every chromosome the smallest possible site is 1 and the largest is the chromosome size. The desired output makes the sites in this file contiguous across chromosomes, so that Chr2 starts where Chr1 ends. To do this, I would add the size of Chr1 to every site on Chr2, the combined sizes of Chr1 and Chr2 to every site on Chr3, and so on for each subsequent chromosome.

modified 4 weeks ago • written 4 weeks ago by selplat21

I am simply trying to add a value from file 2, column 2 to each site in file 1, column 2, where the value added comes from the preceding chromosome rather than the site's own chromosome. Chr1 would therefore be left unchanged. Each later chromosome would have the sizes of all preceding chromosomes added to its site values.

modified 4 weeks ago • written 4 weeks ago by selplat21
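
One possible way to do what is described in this thread is a two-file awk pass: read the size file first to build a per-chromosome offset, then add that offset to every site in the site file. This is only a sketch under stated assumptions, not a tested solution for the real data. The file names sites.txt and chrom_sizes.txt are placeholders, both files are assumed to be whitespace-delimited with no header, and the size file is assumed to list chromosomes in the order they should be joined.

```shell
# Hypothetical file names; the rows are taken from the examples in the thread.
printf 'Chr1\t1\nChr1\t3\nChr2\t3\nChr2\t6\nChr3\t4\n' > sites.txt
printf 'Chr1\t196345723\nChr2\t149451176\nChr3\t114294425\n' > chrom_sizes.txt

contiguous=$(awk '
    # First file (sizes): NR == FNR holds only while reading it.
    # Store the sum of all earlier chromosome sizes as the offset,
    # then add the current size to the running total.
    NR == FNR { offset[$1] = total + 0; total += $2; next }

    # Second file (sites): skip chromosomes that have no size entry.
    !($1 in offset) { print "no size listed for " $1 > "/dev/stderr"; next }

    # Shift the site by the offset; Chr1 has offset 0 and is unchanged.
    { printf "%s\t%.0f\n", $1, $2 + offset[$1] }
' chrom_sizes.txt sites.txt)
echo "$contiguous"
```

The size file is given first on the command line so that every offset is known before any site is read. Output is tab-separated, and `%.0f` is used so that large cumulative positions are printed as whole numbers rather than in scientific notation.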