Question

Suggestions for interpreting bedtools intersect results with CNV BED files

0

9.2 years ago

ivivek_ngs ★ 5.2k

Dear All,

I need some suggestions regarding CNV analysis. I want to know to what extent the CNVs I found in a normal/tumor pair overlap with the CNVs from normal/iPSC (induced pluripotent stem cells derived from the tumor). To this end, I extracted the CNV regions from both normal/tumor (hereafter tumor.bed) and normal/iPSC (ipsc.bed) as BED files. To report the regions that overlap between tumor.bed and ipsc.bed, which option of bedtools intersect should I use? tumor.bed has 121 rows and ipsc.bed has 199. Both files contain only the chromosome, start, and end coordinates.

I want to know to what extent the CNVs in tumor.bed are conserved in ipsc.bed. I tried the bedtools commands below. Which should I use to get this information?

bedtools intersect -a tumor.bed -b ipsc.bed -wa -wb -f 1.0 | wc -l
45

(This shows the features of tumor.bed that overlap 100% with a feature in ipsc.bed.) This gives the same count as

bedtools intersect -a tumor.bed -b ipsc.bed -u -f 1.0 | wc -l
45

However, if I just run

bedtools intersect -a tumor.bed -b ipsc.bed -wa -wb | wc -l
122

or

bedtools intersect -a tumor.bed -b ipsc.bed -u | wc -l
92

So I am a bit confused about which command is ideal in my case. My goal is to see whether the CNVs found in the tumor remain conserved in the iPSC. I would appreciate some suggestions.

Regards

sequencing cnv exome bedtools • 2.9k views

ADD COMMENT • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by ivivek_ngs ★ 5.2k

Ram · Answer 1 · 2015-02-02

1

9.2 years ago

Devon Ryan 104k

With -f 1.0, you're asking for cases where a region in tumor.bed is completely contained within a region in ipsc.bed. Without -f 1.0, you're asking for regions that overlap by at least one base. That is the reason for the difference. It's likely that called regions won't be identical between the two files even if a CNV is completely conserved. Consequently, try an intermediate value for -f (maybe 0.5). With so few sites, you can just manually look through the data to derive a reasonable -f value.
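To see how the hit count responds to the threshold, the per-region logic can be sketched in plain Python. This is only an illustration of the idea behind `-u -f`, not bedtools itself: the function names `best_fraction` and `count_hits` are hypothetical, the toy intervals are invented, and coordinates are assumed to be BED-style (0-based, half-open).

```python
from typing import List, Tuple

# (chrom, start, end), assumed BED-style: 0-based start, exclusive end
Interval = Tuple[str, int, int]


def best_fraction(a: Interval, b_intervals: List[Interval]) -> float:
    """Largest fraction of `a` covered by any single interval in `b_intervals`."""
    chrom, start, end = a
    length = end - start
    if length <= 0:
        return 0.0
    best = 0.0
    for b_chrom, b_start, b_end in b_intervals:
        if b_chrom != chrom:
            continue
        overlap = min(end, b_end) - max(start, b_start)
        if overlap > 0:
            best = max(best, overlap / length)
    return best


def count_hits(a_intervals: List[Interval],
               b_intervals: List[Interval],
               min_fraction: float) -> int:
    """Count A intervals for which one B interval covers at least `min_fraction`."""
    return sum(1 for a in a_intervals
               if best_fraction(a, b_intervals) >= min_fraction)


# Invented toy data, not the poster's files.
tumor = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 0, 1000), ("chr3", 10, 20)]
ipsc = [("chr1", 50, 250), ("chr1", 700, 1200), ("chr2", 900, 5000)]

for f in (1e-9, 0.25, 0.5, 0.75, 1.0):
    print(f, count_hits(tumor, ipsc, f))
```

Listing the per-region fractions from `best_fraction` alongside the coordinates is one way to do the manual inspection suggested above before settling on a cutoff.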

ADD COMMENT • link 9.2 years ago by Devon Ryan 104k

0

Yes, I have already tried intermediate values such as -f 0.50 or 0.75 and found considerably more hits. I would obviously not expect a CNV region found in the tumor to be conserved in the iPSC as an identical region, since the iPSC is a single clone. One thing I definitely take from your reply is that I should apply an overlap-fraction filter. Simply intersecting the BED files will not serve my purpose, and I don't want one-base overlaps either; I am looking for CNV regions that are conserved between the tumor and its iPSC. So I should try different values of -f and look through the output.

ADD REPLY • link 9.2 years ago by ivivek_ngs ★ 5.2k

1

Exactly. What percentage is optimal will probably depend on how big the CNVs are and how you called them/what sort of technology you used. In any case, if you get roughly a third complete overlap, then regardless of the criterion there's a LOT of overlap in the CNVs. Given that your iPSCs came from the tumor, that makes sense.
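For reference, the "roughly a third" figure follows from the counts reported in the question (45 of 121 tumor regions with -u -f 1.0, 92 of 121 with -u alone). A small sketch of that arithmetic, with a hypothetical helper name `percent`:

```python
def percent(count: int, total: int) -> float:
    """Share of `total` represented by `count`, as a percentage."""
    return 100.0 * count / total


tumor_regions = 121        # rows in tumor.bed, from the question
complete_overlap = 45      # bedtools intersect ... -u -f 1.0 | wc -l
any_overlap = 92           # bedtools intersect ... -u | wc -l

print(f"fully contained: {percent(complete_overlap, tumor_regions):.1f}%")
print(f"any overlap:     {percent(any_overlap, tumor_regions):.1f}%")
```

The -u counts are used here because -u reports each tumor region at most once, so they can be divided by the number of tumor regions directly.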

ADD REPLY • link 9.2 years ago by Devon Ryan 104k

0

I used exome data and called the CNVs with Control-FREEC using the default window of minimum 500 and a step of 250, with ploidy set to 2. The CNVs are in fact quite large. The median number of bases in a CNV region is 475249 for my tumor data and 1033749 for my iPSC data. The CNVs are much larger in the iPSC, which is what I would expect. So -f 0.50 should be suitable for finding the tumor CNV regions that are present in the iPSC with at least 50% overlap. The primary idea is that on reprogramming the tumor to its iPSC, the genomic background is not completely compromised and the CNVs carry over from the tumor to its reprogrammed clone. I am obviously not denying that the iPSC will also acquire CNVs, but my actual concern is the extent to which tumor CNVs are present in the iPSC. I have already tried -f 0.50, 0.75 and 1.0. My question here is about maintenance of the genomic background. Do you have any more suggestions, Devon Ryan?
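Because -f is evaluated per pair of regions, one complementary summary is the fraction of tumor CNV bases that fall inside any iPSC CNV. The sketch below is an assumption about how that could be computed, not something taken from the thread: `merge_intervals` and `covered_fraction` are hypothetical names, the intervals are invented, and coordinates are assumed to be BED-style (0-based, half-open).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def merge_intervals(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or touching intervals, grouped by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def covered_fraction(a_intervals: List[Interval], b_intervals: List[Interval]) -> float:
    """Fraction of bases in A (after merging) that lie inside any B interval."""
    a_merged = merge_intervals(a_intervals)
    b_merged = merge_intervals(b_intervals)
    total = 0
    covered = 0
    for chrom, a_spans in a_merged.items():
        for a_start, a_end in a_spans:
            total += a_end - a_start
            for b_start, b_end in b_merged.get(chrom, []):
                covered += max(0, min(a_end, b_end) - max(a_start, b_start))
    return covered / total if total else 0.0


# Invented toy data, not the poster's files.
tumor = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 100)]
ipsc = [("chr1", 50, 150), ("chr1", 120, 260), ("chr2", 0, 100)]

print(covered_fraction(tumor, ipsc))
```

Merging both inputs first means no base is counted twice, so the result stays between 0 and 1 even when called regions overlap each other within a file.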

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by ivivek_ngs ★ 5.2k

0

I'd have to put some thought into that; I'll get back to you if anything comes to mind that you've likely not already thought of.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Devon Ryan 104k

0

Thanks a lot, I appreciate that.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by ivivek_ngs ★ 5.2k