Question: Suggestions for interpreting bedtools intersect results on CNV BED files
ivivek_ngs (Seattle, WA, USA) wrote 4.2 years ago:

Dear All,

I need some suggestions regarding CNV analysis. I want to know to what extent the CNVs I found in a normal/tumor pair overlap with the CNVs from normal/iPSC (induced pluripotent stem cells derived from the tumor). To do this, I extracted the CNV regions from both normal/tumor (hereafter tumor.bed) and normal/iPSC (ipsc.bed) as BED files. If I want to report the regions that overlap between tumor.bed and ipsc.bed, which options of bedtools intersect should I use? tumor.bed has 121 rows and ipsc.bed has 199. Both files contain only the chromosome, start, and end coordinates.
I want to know to what extent the CNVs in tumor.bed are conserved in ipsc.bed. The bedtools commands I have tried are below. Which should I use to get this information?

bedtools intersect -a tumor.bed -b ipsc.bed -wa -wb -f 1.0 | wc -l

45 (this is the number of tumor.bed features that overlap 100% with ipsc.bed). This is similar to

bedtools intersect -a tumor.bed -b ipsc.bed -u -f 1.0 | wc -l


However, if I just run

bedtools intersect -a tumor.bed -b ipsc.bed -wa -wb | wc -l 



bedtools intersect -a tumor.bed -b ipsc.bed -u | wc -l


So I am a bit confused about which command is best for my case. My goal is to see whether the CNVs found in the tumor remain conserved in the iPSC. I would appreciate any suggestions.




Tags: sequencing, cnv, exome, bedtools
Devon Ryan (Freiburg, Germany) wrote 4.2 years ago:

With -f 1.0, you're asking for cases where a region in tumor.bed is completely contained within a region in ipsc.bed. Without -f 1.0, you're asking for regions that overlap by at least one base. That is the reason for the difference. It's likely that the called regions won't be identical between the two files even if a CNV is completely conserved. Consequently, try an intermediate value for -f (maybe 0.5). With so few sites, you can just look through the data manually to derive a reasonable -f value.
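
As a rough illustration of what the -f threshold is measuring, here is a minimal pure-Python sketch. It does not call bedtools; it only mimics the idea of counting tumor regions that have at least one iPSC region covering a given fraction of their length (roughly what `-u` with `-f` reports). The function names and the toy coordinates are hypothetical and are not taken from the actual tumor.bed or ipsc.bed files.

```python
def overlap_fraction(a, b):
    """Fraction of interval a covered by interval b.

    Intervals are (chrom, start, end) tuples in BED-style
    0-based, half-open coordinates.
    """
    length = a[2] - a[1]
    if a[0] != b[0] or length <= 0:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    return max(overlap, 0) / length


def count_hits(a_regions, b_regions, min_frac=None):
    """Count A regions with at least one qualifying B region.

    min_frac=None means any overlap of at least one base counts;
    otherwise a single B region must cover >= min_frac of the A region.
    """
    hits = 0
    for a in a_regions:
        for b in b_regions:
            frac = overlap_fraction(a, b)
            if frac > 0 and (min_frac is None or frac >= min_frac):
                hits += 1
                break  # report each A region once
    return hits


# Toy data (hypothetical), standing in for tumor.bed and ipsc.bed.
tumor = [("chr1", 1000, 2000), ("chr1", 5000, 6000),
         ("chr2", 100, 1100), ("chr3", 0, 500)]
ipsc = [("chr1", 500, 3000), ("chr1", 5500, 7000), ("chr2", 1000, 4000)]

# Sweep a few thresholds to see how the count changes.
results = {f: count_hits(tumor, ipsc, f) for f in (None, 0.5, 0.75, 1.0)}
for threshold, n in results.items():
    print(threshold, n)
```

In this toy set the count drops as the threshold tightens, which is the same pattern as the difference between running with and without -f 1.0.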

written 4.2 years ago by Devon Ryan

Yes, I have already tried intermediate values such as -f 0.50 and 0.75 and found considerably more hits. I would not expect the CNV regions found in the tumor to be conserved in the iPSC as identical regions, since the iPSC is a single clone. What I take from your reply is that I should apply an overlap restriction: simply intersecting the BED files will not serve my purpose, and I don't want to count one-base overlaps either. I am looking for CNV regions that are conserved between the tumor and its iPSC. So I should try different values of -f and look through the output.

written 4.2 years ago by ivivek_ngs

Exactly. What percentage is optimal will probably depend on how big the CNVs are and how you called them/what sort of technology you used. In any case, if roughly a third of the regions overlap completely, then regardless of the criterion there's a LOT of overlap in the CNVs. Given that your iPSCs came from the tumor, that makes sense.

written 4.2 years ago by Devon Ryan


I used exome data and called the CNVs with Control-FREEC, using the default window (minimum 500) and step 250, with ploidy set to 2. The CNVs are in fact quite large: the median number of bases in a CNV region is 475249 for my tumor data and 1033749 for my iPSC. The CNVs are much larger in the iPSC, which is the likely scenario. So -f 0.50 should be adequate for finding the tumor CNV regions that are present in the iPSC with at least 50% overlap. The primary idea is that, on reprogramming the tumor to its iPSC, the genomic background is not completely compromised and the CNVs carry over from the tumor to its reprogrammed clone. I am not denying that the iPSC will also acquire CNVs of its own, but my actual concern is the extent to which tumor CNVs are present in the iPSC. I have already tried -f 0.50, 0.75 and 1.0. My question here is about maintenance of the genomic background. Do you have any more suggestions, Devon Ryan?
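
For anyone wanting to reproduce a median region size like the ones quoted above, one possible way is sketched below. It assumes a plain three-column BED layout (chromosome, start, end) with 0-based, half-open coordinates, so the length of a region is end minus start. The helper name `region_lengths` and the example rows are made up for illustration.

```python
from statistics import median


def region_lengths(bed_lines):
    """Return the length (end - start) of every region in BED-formatted lines."""
    lengths = []
    for line in bed_lines:
        # Skip header-style lines that some BED files carry.
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        start, end = int(fields[1]), int(fields[2])
        lengths.append(end - start)
    return lengths


# Hypothetical rows, not taken from the real files.
example_bed = [
    "chr1\t0\t100",
    "chr1\t500\t1500",
    "chr2\t200\t700",
]

print(median(region_lengths(example_bed)))
```

To run it on a real file, the lines can be read directly from the file handle, for example `with open("tumor.bed") as fh: print(median(region_lengths(fh)))`.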

written 4.2 years ago by ivivek_ngs

I'd have to put some thought into that. I'll get back to you if anything comes to mind that you've likely not already thought of.

written 4.2 years ago by Devon Ryan

Thanks a lot, I appreciate that.

written 4.2 years ago by ivivek_ngs