Filtering pileup from site lists
0
0
Entering edit mode
3.5 years ago

I want to write a script that filters a pileup file from a site lists file.
As an input I get a reference genome, pileup and site lists files.

Example of an output for this script:
Pileup File :

NC_000001.10    11456   A   0   *   *  
NC_000001.10    11457   C   0   *   *  
NC_000001.10    11458   T   0   *   *  
NC_000002.11    11460   G   1   ,   E  
NC_000002.11    13914   G   1   ,   E   
NC_000003.11    10503990    C   1   ,   E

Reference File (After using grep by '>'):

>NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly
>NT_113878.1 Homo sapiens chromosome 1 unlocalized genomic contig, GRCh37.p13 Primary Assembly
>NT_167207.1 Homo sapiens chromosome 1 unlocalized genomic contig, GRCh37.p13 Primary Assembly
>NC_000002.11 Homo sapiens chromosome 2, GRCh37.p13 Primary Assembly
>NC_000003.11 Homo sapiens chromosome 3, GRCh37.p13 Primary Assembly

Lists File:

chr1    11456   C1orf186    -   intronic    intronic    no  no  N   N   N
chr1    11457   C1orf186    -   intronic    intronic    no  no  N   N   N
chr2    13914   intergenic  -   intergenic  intergenic  no  no  N   N   N
chr7    30504355    NOD1    -   intronic    intronic    no  no  N   N   N
chr3    10503990    SSX2IP  -   Syn Gln->Gln    no  no  N   N   N
chr1    13131320    MEF2A   +   intronic    intronic    no  no  N   N   N

Output File:

NC_000001.10    11456   A   0   *   *
NC_000001.10    11457   C   0   *   *
NC_000002.11    13914   G   1   ,   E
NC_000003.11    10503990    C   1   ,   E

So for every refseq in the pileup file I would convert it to the matching chromose<id> in which is found in the reference file, and then overlap it with my lists file so I would get the lines with the same chromose<id> and position.

The problem is that while there are many refseq coordinates that belong to chr1 for example, alignment to each of them will give a position, with respect to the specific coordinate. But on the list there is a full, single, chr1.

I need to be able to compare between:
* a pileup file that was created based on a refseq reference genome
* a file with editing sites that was downloaded from RADAR (we may give the file)
Maybe there is a more appropriate reference genome file or that there is a way to convert the refseq positions according to the list

alignment refseq pileup genome • 738 views
ADD COMMENT
0
Entering edit mode

convert your pileup to bed using awk and then use bedtools intersect

ADD REPLY
0
Entering edit mode

Could you please elaborate on how exactly I would need to use bedtools intersect?

ADD REPLY

Login before adding your answer.

Traffic: 1623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6