Question

Capturing clusters having T to C mutation

1

Entering edit mode

9.7 years ago

Varun Gupta ★ 1.3k

Hi Everyone,

I have a bed file in the following format

chr7    29434   29451   read    13
chr7    61642   61661   read    16
chr7    99192   99229   read    11
chr7    115309  115326  read    5
chr7    134211  134229  read    8
chr7    216704  216727  read    29
chr7    289611  289636  read    6
chr7    348415  348440  read    5
chr7    385948  385968  read    7
chr7    395913  396002  read    533

These coordinates represent clusters and 5th column represents number of reads in each cluster.

When I view these coordinates one by one on IGV, I noticed that by default the allele frequency threshold is 0.2(which is exactly what i need)

I want to keep only those clusters in which at a position if ref is T (5'--->3') then, in the reads within cluster, it is C such that allele frequency threshold of 0.2 is satisfied.

Similarly for the reads which are mapping to the reverse strand, ref is A and read has G such that at that position allele frequency of 20 % is satisfied.

Is there a way to do that?

Is this possible to do with samtools??

Thanks for the help and time.

Regards
Varun

mutation bed • 2.4k views

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by Varun Gupta ★ 1.3k

score 0 · Answer 1 · 2014-08-06

0

Entering edit mode

9.7 years ago

Devon Ryan 104k

You'll need to write a program to do that. Using a pileup in pysam or the samtools C API might be the easiest method.

ADD COMMENT • link 9.7 years ago by Devon Ryan 104k

0

Entering edit mode

Hi Devon,

Anything similar in PERL? Haven't done much coding in python or C. Looking into pysam though.

ADD REPLY • link 9.7 years ago by Varun Gupta ★ 1.3k

0

Entering edit mode

Probably, I try to stay away from perl, though, so I wouldn't know off-hand.

ADD REPLY • link 9.7 years ago by Devon Ryan 104k

Ram · Answer 2 · 2014-08-07

0

Entering edit mode

9.7 years ago

EagleEye 7.5k

Hello Varun,

I have made a PERL script which draws the result from output of VCF file (V4) generated by Freebayes. Please let me know whether you wanted the result like this. I will be soon depositing my scripts into GitHub. Please let me know if you need it and I will update you with the link whenever I am done with structuring my scripts.

This is the script.

But this one is not well structured (you might find it difficult to understand) as I said.

/santhilal

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by EagleEye 7.5k

0

Entering edit mode

Hi Santilal,

This looks nice. From the bed file I posted as of now I am only interested in those clusters which have T--> C mutation at a particular locus >= 0.2. I used bowtie with 1 mismatch option so the reads can either have 0 mismatch or 1 mismatch at max.

SO FROM THE ABOVE BED FILE lets say only these many clusters had T-->C mutation >= 0.2. I just want to keep them. Having a nice graph is always good.

chr7     29434     29451     read     13
chr7     61642     61661     read     16
chr7     134211     134229     read     8
chr7     216704     216727     read     29
chr7     395913     396002     read     533

Can you share the perl script??

Thanks
Varun

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by Varun Gupta ★ 1.3k

0

Entering edit mode

This is the script.

https://github.com/santhilalsubhash/TransExtract_betaV1.2

But this one is not well structured (you might find it difficult to understand) as I said.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by EagleEye 7.5k