I have a file (file1) of 650,000 (6.5 lakh) rows with two columns, "Chr" and "Pos". I want to compare this file against a dbSNP data dump (file2) by matching on the Chr and Pos columns present in the dbSNP dump. For every match, the corresponding rsID should be fetched. I tried using Python pandas, but the process gets killed on the full dataset. When I tried it with 50,000 rows, it worked.
How can I fetch the rsIDs from dbSNP for the whole dataset (file1 = 650,000 rows)?
    # Program to compare Chr and Pos of a sample with dbSNP and fetch rsIDs
    import pandas as pd

    df1 = pd.read_csv("v2_infi_chr_pos.csv", sep='\t', dtype='unicode')
    df2 = pd.read_csv("dbsnp150_header.txt", sep='\t', dtype='unicode')
    # Merge on both key columns (passed as a list)
    df3 = pd.merge(df1, df2, on=['Chr', 'Pos'], how='inner')
    df3.to_csv(r'rsids_infiniumv2_hg38.txt', index=None, header=True)
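
One direction that might avoid the process being killed is to keep only file1 in memory and stream the dbSNP dump in chunks using the `chunksize` option of `pd.read_csv`, merging each chunk on Chr and Pos and appending the matches to the output. The following is only a minimal sketch under assumptions: the function name `fetch_rsids`, its parameters, and the chunk size are hypothetical, both files are assumed to be tab-separated with `Chr` and `Pos` headers, and the output is written tab-separated.

```python
import pandas as pd

def fetch_rsids(sample_path, dbsnp_path, out_path, chunksize=1_000_000):
    """Merge sample positions against the dbSNP dump one chunk at a time.

    Hypothetical helper: assumes tab-separated files with 'Chr' and 'Pos' columns.
    Returns the number of matched rows written to out_path.
    """
    # The sample file (650,000 rows) is small enough to hold in memory.
    sample = pd.read_csv(sample_path, sep="\t", dtype=str,
                         usecols=["Chr", "Pos"]).drop_duplicates()
    total = 0
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows,
    # so only one piece of the dbSNP dump is in memory at a time.
    reader = pd.read_csv(dbsnp_path, sep="\t", dtype=str, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        matched = chunk.merge(sample, on=["Chr", "Pos"], how="inner")
        # Write the header with the first chunk, then append without it.
        matched.to_csv(out_path, sep="\t", index=False,
                       mode="w" if i == 0 else "a", header=(i == 0))
        total += len(matched)
    return total
```

It could be called as `fetch_rsids("v2_infi_chr_pos.csv", "dbsnp150_header.txt", "rsids_infiniumv2_hg38.txt")`. A smaller `chunksize` lowers peak memory at the cost of more merge passes.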