Question: Compare two columns of one file against the same columns in another file and fetch the matches
18 months ago by
Hyderabad
sofie_carolina20 wrote:

I have a file (file1) of 6.5 lakh (650,000) rows with two columns, "Chr" and "Pos". I want to compare this file with a dbSNP data dump (file2) and match on the Chr and Pos columns present in the dbSNP dump. Once matched, I want to fetch the corresponding rsIDs. I tried using Python pandas, but my process is getting killed. When I tried it on 50,000 rows, it worked.

How can I fetch rsIDs from dbSNP for the whole dataset (file1 = 6.5 lakh rows)?

# Program to compare Chr and Pos of a sample with dbSNP and fetch the rsIDs
import pandas as pd

# Read every column as a string so Chr and Pos compare the same way in both files
df1 = pd.read_csv("v2_infi_chr_pos.csv", sep='\t', dtype=str)
df2 = pd.read_csv("dbsnp150_header.txt", sep='\t', dtype=str)
# Join on both key columns, passed as a list
df3 = pd.merge(df1, df2, on=['Chr', 'Pos'], how='inner')
df3.to_csv('rsids_infiniumv2_hg38.txt', index=False, header=True)
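
Since the process is killed only on the full data, the likely cause is that the whole dbSNP dump is loaded into memory before the merge. Below is a minimal sketch of one alternative, not a tested solution for this dataset: it keeps only the (Chr, Pos) pairs of the smaller file in a set and streams the dump row by row with the standard-library `csv` module instead of pandas. The function name `fetch_matches` is hypothetical. The sketch assumes that both files are tab-separated and that each has a header row containing `Chr` and `Pos`, as the merge above implies.

```python
import csv


def fetch_matches(sample_path, dbsnp_path, out_path):
    """Write every dbSNP row whose (Chr, Pos) occurs in the sample file.

    Assumes both files are tab-separated with a header row that
    contains "Chr" and "Pos". Returns the number of rows written.
    """
    # Only the key pairs of the smaller file are held in memory.
    with open(sample_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        wanted = {(row["Chr"], row["Pos"]) for row in reader}

    matched = 0
    with open(dbsnp_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        # Reuse the dump's own header so all of its columns are kept.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter="\t", extrasaction="ignore")
        writer.writeheader()
        # The dump is read one row at a time and never loaded whole.
        for row in reader:
            if (row["Chr"], row["Pos"]) in wanted:
                writer.writerow(row)
                matched += 1
    return matched
```

It could be called as `fetch_matches("v2_infi_chr_pos.csv", "dbsnp150_header.txt", "rsids_infiniumv2_hg38.txt")`. Memory use then depends on the size of file1 rather than on the size of the dump. If staying in pandas is preferred, passing `chunksize` to `pd.read_csv` for the dump and merging chunk by chunk follows the same idea.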