I need some advice. I have an output file (a count file) generated from a VCF file. It looks like this:
Chr 10
protein_coding 447164
pseudogene 87457
Chr 11
protein_coding 368825
pseudogene 78131
Chr 12
protein_coding 357596
pseudogene 68176
...and so on for the remaining chromosomes. I have two other files with different column names (and the set of fields can differ by one or more between chromosomes). How can I convert this file to CSV or another tabular format? I want to create a file like this:
Chr,protein_coding,pseudogene
10,447164,87457
11,368825,78131
12,357596,68176
If a chromosome has no entry for some feature (for example, no pseudogene count), the script should leave that field empty, e.g. the row for chromosome 15 would end with an empty pseudogene field.
Thank you in advance.
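A minimal Python sketch of the conversion asked for above, assuming the input has one "key value" pair per line and every chromosome block starts with a "Chr <number>" line. The function names parse_counts and to_csv are hypothetical. The header is built from every feature name seen in the data, so a chromosome that lacks a feature gets an empty field.

```python
import csv
import io


def parse_counts(lines):
    """Group 'key value' lines into one dict per chromosome.

    Assumes every block starts with a 'Chr <number>' line followed by
    '<feature> <count>' lines (the layout of the sample above).
    """
    records = []
    current = None
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        key, value = parts[0], parts[1]
        if key == "Chr":
            current = {"Chr": value}
            records.append(current)
        elif current is not None:
            current[key] = value
    return records


def to_csv(records):
    """Return the records as CSV text; features a chromosome lacks become empty fields."""
    # Union of all feature names, in the order they are first seen.
    columns = ["Chr"]
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    sample = """Chr 10
protein_coding 447164
pseudogene 87457
Chr 15
protein_coding 100
"""  # the Chr 15 count is a made-up placeholder
    print(to_csv(parse_counts(sample.splitlines())), end="")
```

Because the columns are collected from the data rather than hard-coded, the same script should also work for the other files whose feature names differ.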
Thank you for explaining the problem at hand so well.
What have you tried by yourself to solve this problem? How far did you get and what specific challenges are you facing?
I have no idea how to do that...
You can use awk with RS="Chr". That will use Chr to split the input into records, so each record contains all the data between consecutive Chrs. You can then replace each newline by a space and use $1, $2, etc. to get to your result.
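For comparison, the record-splitting idea can be mirrored in Python. This is a sketch under the same assumed "Chr N" / "feature count" layout, not the awk command itself: re.split on the word Chr plays the role of RS="Chr", and each chunk's fields are then read in pairs, much like walking awk's $2, $3, and so on. The function name split_records is hypothetical.

```python
import re


def split_records(text):
    """Mimic awk's RS="Chr": one record per chunk between consecutive 'Chr' markers."""
    # re.split leaves an empty chunk before the first 'Chr'; drop empty chunks.
    chunks = [c for c in re.split(r"\bChr\b", text) if c.strip()]
    rows = []
    for chunk in chunks:
        # Replace newlines by spaces, then split into fields (like awk's $1, $2, ...).
        fields = chunk.replace("\n", " ").split()
        row = {"Chr": fields[0]}  # first field is the chromosome number
        # Remaining fields come in (feature, count) pairs.
        for name, count in zip(fields[1::2], fields[2::2]):
            row[name] = count
        rows.append(row)
    return rows


if __name__ == "__main__":
    text = "Chr 10\nprotein_coding 447164\npseudogene 87457\nChr 11\nprotein_coding 368825\npseudogene 78131\n"
    for row in split_records(text):
        print(row)
```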
Also, please see the following oddities:
Chr (the key) from the chromosome number (the value). Is this true?
These will make a difference in the final script you develop.
Sorry, my bad. I have corrected everything.