Compare two VCF files in Python
2.6 years ago
vpsev3 ▴ 20

Hello,

I would like to compare two VCF files in Python.

Here are my files: mygenome.csv and clinvar.csv

This is how I structured them. The two files are organized the same way, except that there is no "INFO" column in mygenome.vcf.

I would like to find the rows that have identical values in the POS column of both files, then write the results to a CSV with the POS, REF, ALT and INFO columns.

I searched for scripts, but I get memory errors because the files are too large, and I might lose data if I split them into several parts.

python • 1.9k views

You can try the VCF file reader in Python as described here. Once you have parsed the files, you can compare them however you want.


It doesn't work

Does anyone have a solution?


"It doesn't work"

This is not a sufficient description of the failure. Show us sample data, expected results, actual results, and the difference between them.

2.6 years ago

If the data is really structured as you say, this code should work:

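perl -lane 'BEGIN {
# Load clinvar.csv: key = first 3 tab-separated columns, value = 4th column (INFO)
open IN, "clinvar.csv" or die $!;
while (<IN>) {
 chomp; @cols = split "\t";
 $clinvar{ join "\t", @cols[0..2] } = $cols[3];
} close IN }
# For each mygenome.vcf line, print the key plus the clinvar INFO if the key exists
$index = join "\t", @F[0..2];
print "$index\t$clinvar{$index}" if exists $clinvar{$index};
' mygenome.vcf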

It stores the clinvar information indexed by the first 3 columns, and it prints a mygenome.vcf line only if a clinvar annotation exists for those same 3 columns. Note that a file structured this way is not really a VCF file.
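Since the question asks for Python, the same lookup can be sketched there as well. This is only a minimal sketch under the same assumptions as the Perl one-liner: both files are tab-separated, the first three columns are POS, REF and ALT, the fourth column of the clinvar file is INFO, and any header lines start with "#". The function name write_matches and the output header are made up for illustration. Only the clinvar file is held in memory; the other file is streamed line by line, which may help with the memory errors.

```python
import csv


def write_matches(clinvar_path, genome_path, out_path):
    """Write genome rows whose (POS, REF, ALT) also occur in the clinvar file.

    Assumes tab-separated input with POS, REF, ALT as the first three
    columns and INFO as the fourth column of the clinvar file.
    Returns the number of matching rows written.
    """
    # Build the lookup table: (POS, REF, ALT) -> INFO
    clinvar = {}
    with open(clinvar_path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header/comment lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 4:
                continue  # no INFO column on this line
            clinvar[tuple(cols[:3])] = cols[3]

    # Stream the genome file and write only the rows found in the table
    matches = 0
    with open(genome_path) as fh, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["POS", "REF", "ALT", "INFO"])
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            key = tuple(line.rstrip("\n").split("\t")[:3])
            if key in clinvar:
                writer.writerow([*key, clinvar[key]])
                matches += 1
    return matches
```

It could be called as write_matches("clinvar.csv", "mygenome.vcf", "matches.csv"). If the clinvar file is the larger of the two, it may be better to swap the roles and keep the smaller file in memory instead.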

What would I really do? Put your input.vcf file (in proper VCF format) next to a clinvar.vcf file (also in proper VCF format) and use bcftools annotate.
