Capturing clusters having T to C mutation
2
1
Entering edit mode
9.7 years ago
Varun Gupta ★ 1.3k

Hi Everyone,

I have a bed file in the following format

chr7    29434   29451   read    13
chr7    61642   61661   read    16
chr7    99192   99229   read    11
chr7    115309  115326  read    5
chr7    134211  134229  read    8
chr7    216704  216727  read    29
chr7    289611  289636  read    6
chr7    348415  348440  read    5
chr7    385948  385968  read    7
chr7    395913  396002  read    533

These coordinates represent clusters and 5th column represents number of reads in each cluster.

When I view these coordinates one by one on IGV, I noticed that by default the allele frequency threshold is 0.2(which is exactly what i need)

I want to keep only those clusters in which at a position if ref is T (5'--->3') then, in the reads within cluster, it is C such that allele frequency threshold of 0.2 is satisfied.

Similarly for the reads which are mapping to the reverse strand, ref is A and read has G such that at that position allele frequency of 20 % is satisfied.

Is there a way to do that?

Is this possible to do with samtools??

Thanks for the help and time.

Regards
Varun

mutation bed • 2.4k views
ADD COMMENT
0
Entering edit mode
9.7 years ago

You'll need to write a program to do that. Using a pileup in pysam or the samtools C API might be the easiest method.

ADD COMMENT
0
Entering edit mode

Hi Devon,

Anything similar in PERL? Haven't done much coding in python or C. Looking into pysam though.

ADD REPLY
0
Entering edit mode

Probably, I try to stay away from perl, though, so I wouldn't know off-hand.

ADD REPLY
0
Entering edit mode
9.7 years ago
EagleEye 7.5k

Hello Varun,

I have made a PERL script which draws the result from output of VCF file (V4) generated by Freebayes. Please let me know whether you wanted the result like this. I will be soon depositing my scripts into GitHub. Please let me know if you need it and I will update you with the link whenever I am done with structuring my scripts.

This is the script.

But this one is not well structured (you might find it difficult to understand) as I said.

/santhilal

ADD COMMENT
0
Entering edit mode

Hi Santilal,

This looks nice. From the bed file I posted as of now I am only interested in those clusters which have T--> C mutation at a particular locus >= 0.2. I used bowtie with 1 mismatch option so the reads can either have 0 mismatch or 1 mismatch at max.

SO FROM THE ABOVE BED FILE lets say only these many clusters had T-->C mutation >= 0.2. I just want to keep them. Having a nice graph is always good.

chr7     29434     29451     read     13
chr7     61642     61661     read     16
chr7     134211     134229     read     8
chr7     216704     216727     read     29
chr7     395913     396002     read     533

Can you share the perl script??

Thanks
Varun

ADD REPLY
0
Entering edit mode

This is the script.

https://github.com/santhilalsubhash/TransExtract_betaV1.2

But this one is not well structured (you might find it difficult to understand) as I said.

ADD REPLY

Login before adding your answer.

Traffic: 1987 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6