6.0 years ago
Tastulek
Hello Biostar Community,
I would like to create new files by extracting values from two separate source files.
Source Files:
File A.txt
Samples  C-001  C-002  C-003
Hg       R1a    C2     C2
SNPs
CTS9677         +      +
CTS103   +      +      +
P15
F2992           +      +
CTS3607  +
where
- 1st Row: Sample ID
- 2nd Row: Y chromosome haplogroup
- 3rd Row: "SNPs" (header label for the 1st column)
- 4th Row and beyond: SNP ID/rsID and genotypes
File B.txt
CTS9677 Y 19020366 [A/G]
CTS7498 Y 17510288 [T/C]
CTS103 Y 2730292 [A/T]
P15 Y 23244026 [A/G]
F2992 Y 19236136 [A/G]
CTS3607 Y 15063588 [A/G]
where
- 1st column: SNP ID/rsID
- 2nd column: Chromosome
- 3rd column: Position of SNPs
- 4th column: Genotype [Ancestral/Derived]
Intended Output File:
C-001.txt
CTS9677 Y 19020366 A
CTS103 Y 2730292 T
P15 Y 23244026 A
F2992 Y 19236136 A
CTS3607 Y 15063588 G
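Here, the value in the 4th column is determined by two things: the sample's column in File A.txt (the 2nd column for C-001), which contains either "+" or a blank, and the 4th column of File B.txt, which has the form [Ancestral/Derived]. For example, in [A/G], A is the ancestral allele and G is the derived allele. In File A.txt, a blank cell indicates the ancestral genotype and "+" indicates the derived genotype. Note that CTS7498, which is listed in File B.txt but not in File A.txt, does not appear in the output.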
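To make the rule concrete, here is a minimal Python sketch of the allele selection for a single SNP. The function name `pick_allele` is hypothetical, and the sketch assumes the File B.txt genotype always has the form [X/Y] with the ancestral allele first.

```python
import re


def pick_allele(genotype: str, call: str) -> str:
    """Return the allele for one SNP in one sample.

    genotype: 4th column of File B.txt, e.g. "[A/G]" (ancestral/derived).
    call:     the sample's cell in File A.txt ("+" = derived, blank = ancestral).
    """
    match = re.fullmatch(r"\[([^/\]]+)/([^/\]]+)\]", genotype.strip())
    if match is None:
        raise ValueError(f"unexpected genotype format: {genotype!r}")
    ancestral, derived = match.groups()
    return derived if call.strip() == "+" else ancestral


# CTS9677 is blank for C-001, CTS103 is "+" for C-001.
print(pick_allele("[A/G]", ""))
print(pick_allele("[A/T]", "+"))
```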
I would like to generate a separate output file for each sample (C-001, C-002, and C-003) using R or Python.
Could anyone help with this?
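For reference, one possible way to do this in Python is sketched below. It rests on several assumptions that are not stated above: both files are tab-delimited, blank genotype cells in File A.txt are kept as empty tab-separated fields (with space-only separation the blanks could not be recovered), the output should be tab-separated, and SNPs missing from File B.txt are skipped. The function names `read_snp_info` and `write_sample_files` are made up for this sketch, and the demo writes the example data to a temporary directory instead of reading real files.

```python
import csv
import tempfile
from pathlib import Path


def read_snp_info(path):
    """Map SNP ID -> (chromosome, position, ancestral, derived) from File B."""
    info = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 4:
                continue  # skip blank or incomplete lines
            snp, chrom, pos, genotype = (field.strip() for field in row[:4])
            ancestral, derived = genotype.strip("[]").split("/")
            info[snp] = (chrom, pos, ancestral, derived)
    return info


def write_sample_files(file_a, file_b, out_dir="."):
    """Write one <sample>.txt per sample column of File A; return the paths."""
    info = read_snp_info(file_b)
    with open(file_a, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    snp_rows = rows[3:]  # skip the Samples, Hg and SNPs rows
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for col, sample in enumerate(rows[0]):
        sample = sample.strip()
        if col == 0 or not sample:
            continue  # column 0 holds the row labels
        lines = []
        for row in snp_rows:
            snp = row[0].strip() if row else ""
            if snp not in info:
                continue  # assumption: SNPs absent from File B are skipped
            call = row[col].strip() if col < len(row) else ""
            chrom, pos, ancestral, derived = info[snp]
            allele = derived if call == "+" else ancestral
            lines.append(f"{snp}\t{chrom}\t{pos}\t{allele}")
        path = out_dir / f"{sample}.txt"
        path.write_text("\n".join(lines) + "\n")
        written.append(path)
    return written


# Demo with the example data from the question (tab-delimited by assumption).
FILE_A = (
    "Samples\tC-001\tC-002\tC-003\n"
    "Hg\tR1a\tC2\tC2\n"
    "SNPs\t\t\t\n"
    "CTS9677\t\t+\t+\n"
    "CTS103\t+\t+\t+\n"
    "P15\t\t\t\n"
    "F2992\t\t+\t+\n"
    "CTS3607\t+\t\t\n"
)
FILE_B = (
    "CTS9677\tY\t19020366\t[A/G]\n"
    "CTS7498\tY\t17510288\t[T/C]\n"
    "CTS103\tY\t2730292\t[A/T]\n"
    "P15\tY\t23244026\t[A/G]\n"
    "F2992\tY\t19236136\t[A/G]\n"
    "CTS3607\tY\t15063588\t[A/G]\n"
)
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "File A.txt").write_text(FILE_A)
(demo_dir / "File B.txt").write_text(FILE_B)
written = write_sample_files(
    demo_dir / "File A.txt", demo_dir / "File B.txt", demo_dir / "out"
)
print((demo_dir / "out" / "C-001.txt").read_text())
```

With real data, the call would be something like `write_sample_files("File A.txt", "File B.txt")`. If the real File A.txt is space-aligned and not tab-delimited, the parsing step would need to change (for example to fixed-width parsing), because splitting on whitespace loses the blank cells.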