Question: Capturing clusters having T to C mutation
1
gravatar for Varun Gupta
6.5 years ago by
Varun Gupta1.2k
United States
Varun Gupta1.2k wrote:

Hi Everyone,

I have a BED file in the following format:

chr7 	29434 	29451 	read 	13
chr7 	61642 	61661 	read 	16
chr7 	99192 	99229 	read 	11
chr7 	115309 	115326 	read 	5
chr7 	134211 	134229 	read 	8
chr7 	216704 	216727 	read 	29
chr7 	289611 	289636 	read 	6
chr7 	348415 	348440 	read 	5
chr7 	385948 	385968 	read 	7
chr7 	395913 	396002 	read 	533

 

These coordinates represent clusters, and the fifth column gives the number of reads in each cluster.

When I view these coordinates one by one in IGV, I notice that the default allele frequency threshold is 0.2, which is exactly what I need.

I want to keep only those clusters in which, at some position where the reference base is T (5' to 3'), the reads in the cluster show C at an allele frequency of at least 0.2.

Similarly, for reads mapping to the reverse strand, the reference base is A and the reads show G, again at an allele frequency of at least 20% at that position.

Is there a way to do that?

Is this possible with samtools?

Thanks for the help and time.

Regards
Varun

mutation bed • 1.7k views
Devon Ryan (Freiburg, Germany) wrote, 6.5 years ago:

You'll need to write a program to do that. Using a pileup in pysam or the samtools C API might be the easiest method.
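A minimal sketch of the pileup approach in Python, assuming pysam is installed, the BAM file is coordinate-sorted and indexed, and a FASTA file of the reference is available. The function names (`has_conversion`, `cluster_counts`, `filter_bed`) are hypothetical, BED coordinates are assumed to be 0-based and half-open, and the frequency is computed per strand, which may not match exactly how IGV applies its threshold.

```python
from collections import Counter

# Expected conversion per strand: T>C on forward-strand reads, which shows up
# as A>G relative to the reference on reverse-strand reads.
CONVERSION = {"+": ("T", "C"), "-": ("A", "G")}
BASES = "ACGT"  # order of the four arrays returned by count_coverage


def has_conversion(ref_seq, counts, strand, min_freq=0.2):
    """Return True if any position carrying the expected reference base shows
    the expected read base at a frequency >= min_freq.

    ref_seq: reference bases of the cluster.
    counts:  one Counter of read bases per position, same length as ref_seq.
    """
    ref_base, alt_base = CONVERSION[strand]
    for base, column in zip(ref_seq.upper(), counts):
        depth = sum(column.values())
        if base == ref_base and depth > 0 and column[alt_base] / depth >= min_freq:
            return True
    return False


def cluster_counts(bam, chrom, start, end, reverse):
    """Per-position base counts from reads on one strand (bam is a pysam.AlignmentFile)."""
    cov = bam.count_coverage(
        chrom, start, end,
        quality_threshold=0,  # assumption: count every base regardless of quality
        # A custom callback replaces pysam's default read filter.
        read_callback=lambda read: read.is_reverse == reverse,
    )
    return [Counter({b: cov[i][p] for i, b in enumerate(BASES)})
            for p in range(end - start)]


def filter_bed(bed_path, bam_path, fasta_path, min_freq=0.2):
    """Yield the BED lines whose cluster passes the test on either strand."""
    import pysam  # imported lazily so has_conversion can be used without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        with pysam.FastaFile(fasta_path) as fasta, open(bed_path) as bed:
            for line in bed:
                chrom, start, end = line.split()[:3]
                start, end = int(start), int(end)
                ref_seq = fasta.fetch(chrom, start, end)
                for strand, reverse in (("+", False), ("-", True)):
                    counts = cluster_counts(bam, chrom, start, end, reverse)
                    if has_conversion(ref_seq, counts, strand, min_freq):
                        yield line
                        break
```

Calling `filter_bed("clusters.bed", "reads.bam", "genome.fa")` would then yield the lines to keep; the file names are placeholders.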


Hi Devon,

Is there anything similar in Perl? I haven't done much coding in Python or C. I am looking into pysam, though.

Reply written 6.5 years ago by Varun Gupta

Probably. I try to stay away from Perl, though, so I wouldn't know offhand.

Reply written 6.5 years ago by Devon Ryan
EagleEye (Sweden) wrote, 6.5 years ago:

Hello Varun,

I have written a Perl script that draws the result from the VCF (v4) output generated by FreeBayes. Please let me know whether this is the kind of result you wanted. I will soon be depositing my scripts on GitHub. Let me know if you need it, and I will send you the link once I have finished structuring my scripts.

 

This is the script.

https://github.com/santhilalsubhash/TransExtract_betaV1.2.git

But this one is not well structured (you might find it difficult to understand) as I said.

/santhilal
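For readers who prefer Python, the VCF route can be sketched roughly as follows. This is not the linked Perl script: it is an illustration that assumes the VCF carries FreeBayes-style `AO` (alternate observation count) and `DP` (total depth) keys in the INFO column and only considers single-base REF/ALT alleles. The function name `passing_sites` is hypothetical.

```python
def passing_sites(vcf_lines, ref="T", alt="C", min_fraction=0.2):
    """Return (chrom, pos) for sites where reads supporting `alt` make up at
    least `min_fraction` of the depth. `pos` is 1-based, as in the VCF."""
    sites = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref_allele, alt_field = fields[:5]
        # INFO is a semicolon-separated list of KEY=VALUE pairs (flags have no "=").
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        if ref_allele.upper() != ref or "AO" not in info or "DP" not in info:
            continue
        depth = int(info["DP"])
        # AO holds one count per ALT allele, in the same order as the ALT column.
        for allele, ao in zip(alt_field.split(","), info["AO"].split(",")):
            if allele.upper() == alt and depth > 0 and int(ao) / depth >= min_fraction:
                sites.append((chrom, int(pos)))
    return sites
```

Calling it with `ref="A", alt="G"` covers the reverse-strand case, although this sketch does not separate reads by strand.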


Hi Santhilal,

This looks nice. From the BED file I posted, for now I am only interested in those clusters that have a T-->C mutation at a particular locus at a frequency >= 0.2. I used bowtie with the 1-mismatch option, so each read has at most 1 mismatch.

So, from the above BED file, let's say only the following clusters had a T-->C mutation at a frequency >= 0.2. I just want to keep them. Having a nice graph is always good.

chr7     29434     29451     read     13
chr7     61642     61661     read     16
chr7     134211     134229     read     8
chr7     216704     216727     read     29
chr7     395913     396002     read     533

Can you share the Perl script?

Thanks

Varun
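The filtering step described above could be sketched in Python as follows, assuming a list of passing sites is already available from some earlier step. The function name `clusters_with_sites` is hypothetical, site positions are assumed to be 1-based (as in VCF), and BED intervals are assumed to be 0-based and half-open.

```python
def clusters_with_sites(bed_lines, sites):
    """Keep the BED lines whose interval contains at least one site.

    sites: iterable of (chrom, pos) tuples with 1-based positions.
    """
    by_chrom = {}
    for chrom, pos in sites:
        by_chrom.setdefault(chrom, []).append(pos)

    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # A 1-based pos lies inside the 0-based half-open [start, end)
        # exactly when start < pos <= end.
        if any(start < pos <= end for pos in by_chrom.get(chrom, [])):
            kept.append(line)
    return kept
```

The kept lines are returned unchanged, so they can be written straight back out as a filtered BED file.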

Reply written 6.5 years ago by Varun Gupta

This is the script.

https://github.com/santhilalsubhash/TransExtract_betaV1.2.git

But this one is not well structured (you might find it difficult to understand) as I said.

Reply written 6.5 years ago by EagleEye