Hello, I am trying to convert a BED file into a VCF file using Python. I started by parsing the BED file, then created another file that opens the reference genome so I can compare it to the BED file. The next step is to write what is in rows to a file called P_1_BZ.vcf. However, I get an error saying that rows needs to be a string before it can be written to the VCF file. I tried using join() to convert it to a string, but the format changes and the wrong information is written to the new file. Can someone help me understand how to correctly write what rows contains to the VCF file?
#!/usr/bin/env python
import Coronavirus
# Deletions that occur in B_1_351_SA
deletion1=Coronavirus.referenceGenome(11287,11296)
deletion2=Coronavirus.referenceGenome(22280,22289)
with open("B_1_351_SA.bed") as fo:
    lines = fo.readlines()
# this is the header that the vcf file needs
header = ["CHROM ","POS"," ID"," REF"," ALT"]
print(header)
rows = []
# print(header)
for i in range(0,len(lines)):
    # print(lines[i])
    # for the first two rows since the start and end SNP contain 3 digits
    # focus on the first 2 rows
    l = [0,1]
    if(i in l):
        # print(lines[i])
        rows = [
            lines[i][0:11].split(), #CHROM
            lines[i][11:15].split(), #POS : the position that it started
            lines[i][19:25].split(), #ID : position that shows where the alteration was
            lines[i][20:21].split(), #REF : reference nucleotide
            lines[i][24:25].split() #ALT : nucleotide it changed to
        ]
    l = [2,3,4]
    if(i in l):
        rows = [
            lines[i][0:11].split(), #CHROM
            lines[i][11:16].split(), #POS : the position that it started
            lines[i][21:28].split(), #ID : position that shows where the alteration was
            lines[i][22:23].split(), #REF : reference nucleotide
            lines[i][27:28].split() #ALT : nucleotide it changed to
        ]
    l = [5,7,8,10,11,12,13,14,15,16,17,18]
    if(i in l):
        rows = [
            lines[i][0:11].split(), #CHROM
            lines[i][11:18].split(), #POS : the position that it started
            lines[i][24:32].split(), #ID : position that shows where the alteration was
            lines[i][24:25].split(), #REF : reference nucleotide
            lines[i][30:31].split() #ALT : alteration nucleotide
        ]
    # rows 6 and 9 have deletions and need special handling to print properly
    l = [6]
    if i in l:
        rows = [
            lines[i][0:11].split(), #CHROM
            lines[i][11:18].split(), #POS : the position that it started
            lines[i][24:34].split(), deletion1
        ]
    # rows 6 and 9 have deletions and need special handling to print properly
    l = [9]
    if i in l:
        rows = [
            lines[i][0:11].split(), #CHROM
            lines[i][11:18].split(), #POS : the position that it started
            lines[i][24:34].split(), deletion2
        ]
with open("P_1_BZ.vcf","w+") as vcf:
    vcf.write(rows)
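The fixed character slices above depend on how many digits each position has, which is why the rows have to be grouped by index. A possible alternative, sketched here under the assumption that each BED line has four whitespace-separated columns (chromosome, start, end, name), is to split each line on whitespace instead. The helper name parse_bed_line and the sample line are hypothetical; the end coordinate 12778 is assumed, not taken from the actual file.

```python
def parse_bed_line(line):
    """Split one BED line into (chrom, start, end, name) on any whitespace."""
    chrom, start, end, name = line.split()[:4]
    return chrom, int(start), int(end), name


# Hypothetical input lines; the column widths differ on purpose to show
# that split() does not care how many characters each field takes up.
sample_lines = [
    "NC_045512v2\t12777\t12778\tC12778T\n",
    "NC_045512v2   12777   12778   C12778T\n",
]

for line in sample_lines:
    chrom, start, end, name = parse_bed_line(line)
    # For a substitution named like C12778T, the first and last characters
    # are the reference and the alternate nucleotide.
    ref, alt = name[0], name[-1]
    print(chrom, start, name, ref, alt)
```

With this approach the same code would handle 3-digit and 5-digit positions alike; only names such as del_11288 would still need their own branch.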
This is a glimpse of what rows contains, which is what I am trying to write to P_1_BZ.vcf:
[['NC_045512v2'], ['11287'], ['del_11288'], ['TCTGGTTTT']]
[['NC_045512v2'], ['12777'], ['C12778T'], ['C'], ['T']]
[['NC_045512v2'], ['13859'], ['C13860T'], ['C'], ['T']]
[['NC_045512v2'], ['14407'], ['C14408T'], ['C'], ['T']]
[['NC_045512v2'], ['17258'], ['G17259T'], ['G'], ['T']]
[['NC_045512v2'], ['21613'], ['C21614T'], ['C'], ['T']]
[['NC_045512v2'], ['21620'], ['C21621A'], ['C'], ['A']]
[['NC_045512v2'], ['21637'], ['C21638T'], ['C'], ['T']]
[['NC_045512v2'], ['21973'], ['G21974T'], ['G'], ['T']]
[['NC_045512v2'], ['22131'], ['G22132T'], ['G'], ['T']]
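Since each row is a list of one-item lists, write() rejects it and a single join() over the outer list does not help either. One way this could be handled is to flatten each row into plain strings, join the fields with tabs, and write one line per row. The sketch below is not the only option; flatten_row, rows_to_vcf_text, write_vcf and all_rows are hypothetical names, and it assumes the deletion sequence may arrive either as a plain string or as a one-item list.

```python
def flatten_row(row):
    """Turn [['a'], ['b'], 'c'] into ['a', 'b', 'c']."""
    fields = []
    for cell in row:
        if isinstance(cell, str):
            fields.append(cell)   # already a plain string
        else:
            fields.extend(cell)   # unpack the one-item list made by split()
    return fields


def rows_to_vcf_text(header, rows):
    """Build the file contents: one tab-separated line per row."""
    out = ["\t".join(header)]
    out.extend("\t".join(flatten_row(row)) for row in rows)
    return "\n".join(out) + "\n"


def write_vcf(path, header, rows):
    """write() only accepts a string, so format the rows first."""
    with open(path, "w") as vcf:
        vcf.write(rows_to_vcf_text(header, rows))


# Two rows copied from the output above. In the real script, each row would
# be appended to a list inside the loop instead of reassigning `rows`.
all_rows = [
    [['NC_045512v2'], ['11287'], ['del_11288'], ['TCTGGTTTT']],
    [['NC_045512v2'], ['12777'], ['C12778T'], ['C'], ['T']],
]
header = ["#CHROM", "POS", "ID", "REF", "ALT"]
print(rows_to_vcf_text(header, all_rows), end="")
```

Calling write_vcf("P_1_BZ.vcf", header, all_rows) would then create the file. Note that this only produces tab-separated text with the five columns from the post: a spec-compliant VCF also needs a ##fileformat line and the QUAL, FILTER and INFO columns, and VCF positions are 1-based while BED start coordinates are 0-based.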