Hi,
I need some advice. I have an output file (a count file) generated from a VCF file. It looks like this:
Chr 10
protein_coding 447164
pseudogene 87457
Chr 11
protein_coding 368825
pseudogene 78131
Chr 12
protein_coding 357596
pseudogene 68176
and there are more chromosomes. I have two other files with different column names (and the fields can differ by one or more between chromosomes). How can I convert that file to CSV or another file format? I mean, I want to create a file like this:
Chr,protein_coding,pseudogene
10,447164,87457
11,368825,78131
12,357596,68176
Assuming that if some chromosome does not have, for example, pseudogene, then the script will put an empty field, e.g. for chromosome 15:
15,132598,
Thank you in advance
Thank you for explaining the problem at hand so well.
What have you tried by yourself to solve this problem? How far did you get and what specific challenges are you facing?
I have no idea how I can do that...
You can use awk with RS=Chr. That will use Chr to create records, so each record would contain all data between consecutive Chrs. You can then replace each newline with a space and use $1, $2, etc. to get to your result.
Also, please see the following oddities:
protein_coding becomes protein_codin pseudogene. Why? Chr=16 Chr (the key) from the chromosome number (the value). Is this true?
These will make a difference in the final script you develop.
sorry, my bad. I have corrected everything.
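For anyone who prefers Python over awk, the same record-per-Chr idea can be sketched as below. This is only a minimal sketch, not the awk approach suggested above. It assumes every line is a whitespace-separated "name value" pair and that a line starting with Chr opens a new record. The function name counts_to_csv and the record_key parameter are made up for illustration. Column names are collected in the order they first appear, so files with other column names should work as well, and a field missing for a chromosome is left empty.

```python
import csv
import io


def counts_to_csv(text, record_key="Chr"):
    """Turn 'name value' lines grouped under Chr lines into CSV text."""
    rows = []      # one dict per chromosome
    columns = []   # column names, in first-seen order
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: skip blank or malformed lines
        name, value = parts
        if name == record_key:
            # A Chr line starts a new record
            current = {record_key: value}
            rows.append(current)
        elif current is not None:
            current[name] = value
            if name not in columns:
                columns.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[record_key] + columns,
        restval="",            # empty field when a chromosome lacks a column
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = """Chr 10
protein_coding 447164
pseudogene 87457
Chr 15
protein_coding 132598
"""

print(counts_to_csv(sample), end="")
```

To run it on a real file, read the file contents into a string and pass that to counts_to_csv instead of sample.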