Question: UK Biobank l2r data holds segment means but no segment locations?
18 months ago by
United States, Irvine, University of California - Irvine
tohc (0) wrote:

I've been looking at UK Biobank data, and it seems the files hold the segment mean l2r (log2 ratio) values for copy number variation but not the segment start and end positions. Each file covers one chromosome and contains all 500,000 participants. Does anyone know where we might find the actual chromosomal locations these values correspond to?

cnv biobank genome • 493 views
modified 16 months ago by Eric T. (2.6k) • written 18 months ago by tohc (0)

biobank, cnv seem to be tags relevant to your post, whereas genome doesn't really say much. Please add relevant tags so people following those tags would get to see your posts.

written 18 months ago by RamRS (30k)

Added the tags you mentioned

written 17 months ago by tohc (0)

Well, in addition to Ram's comments, which data are you looking at, exactly? You have provided no links. I can probably just contact the relevant person directly if you let me know where you obtained your data.

written 18 months ago by Kevin Blighe (67k)

Hey Kevin,

I can't exactly give you a link to the data itself. UK Biobank has a strict policy on how data is given out; however, this is the project website: UK Biobank. A lot of the documentation seems to be about raw sequencing reads; however, from my understanding, the inferred l2r CNV data is derived from those files.

This is the link to the actual instructions for data download: Resource 664. The data we are using is the CNV log2r data; however, as you can read there, the downloaded files are per chromosome. The issue is that the files essentially hold only the log2r values and give no indication of which portion of the chromosome they come from, which is not very helpful.

Hope that clarifies things.

Thanks in advance!

Edit: I should also mention that by "segment means" I mean the log2r values; I've been using the two terms interchangeably.

modified 17 months ago • written 17 months ago by tohc (0)
16 months ago by
Eric T. (2.6k)
San Francisco, CA
Eric T. (2.6k) wrote:

My understanding is that the UK Biobank l2r files are the copy ratio estimates at each probe in the SNP array -- they have not been segmented in the publicly available dataset, so there are no segment breakpoints.
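If the values are per probe, one way to recover locations is to pair each row with its marker in an annotation table. The following is only a minimal sketch under assumptions not confirmed in this thread: that each row of an l2r file is one probe, each column one participant, and that the rows are in the same order as a PLINK-style .bim marker file for the same chromosome. The file contents are made-up toy data, and `attach_positions` is a hypothetical helper, not part of any UK Biobank tooling.

```python
import io
from itertools import zip_longest

# Toy stand-ins for real files (made-up values, not UK Biobank data).
# Assumed layout: one row per probe, one whitespace-separated column per participant.
L2R_TEXT = """\
0.02 -0.51
-0.03 -0.48
0.01 0.04
"""

# PLINK .bim columns: chromosome, marker ID, genetic distance, bp position, allele 1, allele 2.
BIM_TEXT = """\
1 rs0001 0 10500 A G
1 rs0002 0 11200 C T
1 rs0003 0 15800 G A
"""


def attach_positions(l2r_handle, bim_handle):
    """Yield (chrom, position, marker_id, l2r_values) for each probe.

    Hypothetical helper: assumes both inputs list probes in the same order.
    Check the data documentation before relying on that assumption.
    """
    for l2r_line, bim_line in zip_longest(l2r_handle, bim_handle):
        if l2r_line is None or bim_line is None:
            raise ValueError("l2r and marker files have different row counts")
        chrom, marker_id, _cm, pos, _a1, _a2 = bim_line.split()
        values = [float(v) for v in l2r_line.split()]
        yield chrom, int(pos), marker_id, values


rows = list(attach_positions(io.StringIO(L2R_TEXT), io.StringIO(BIM_TEXT)))
for chrom, pos, marker_id, values in rows:
    print(chrom, pos, marker_id, values)
```

Streaming the two files in lockstep avoids loading a whole chromosome into memory, which matters when each row has one value per participant.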

There are a couple of papers that survey CNVs; they used PennCNV on these input files to smooth the CNV signal and detect breakpoints. I'd retrieve the processed calls from those studies, rather than UKB; reprocessing would be incredibly expensive, and the original studies were done well.
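To make "smooth the signal and detect breakpoints" concrete, here is a toy sketch. It is not PennCNV, which fits a hidden Markov model using both the log R ratio and the B allele frequency; this is a much simpler rolling median plus threshold approach, and the thresholds, the `min_probes` cutoff, and the function names are illustrative assumptions only.

```python
from statistics import median


def rolling_median(values, window=3):
    """Smooth a per-probe signal with a centered rolling median."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def call_segments(positions, l2r, loss=-0.3, gain=0.3, min_probes=3):
    """Return (start, end, state, mean_l2r) for runs of probes past a threshold.

    Thresholds and min_probes are illustrative guesses, not tuned values.
    """
    smoothed = rolling_median(l2r)
    states = [
        "loss" if v <= loss else "gain" if v >= gain else "neutral"
        for v in smoothed
    ]
    segments = []
    start = 0
    for i in range(1, len(states) + 1):
        # Close the current run when the state changes or the data ends.
        if i == len(states) or states[i] != states[start]:
            if states[start] != "neutral" and i - start >= min_probes:
                mean = sum(l2r[start:i]) / (i - start)
                segments.append(
                    (positions[start], positions[i - 1], states[start], round(mean, 3))
                )
            start = i
    return segments


# Toy data for one sample: probe positions (bp) and per-probe l2r values.
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
l2r = [0.02, -0.05, 0.9, -0.6, -0.55, -0.7, -0.5, 0.03, 0.0, -0.04]

segments = call_segments(positions, l2r)
print(segments)
```

In this toy input the single outlier at position 300 is suppressed by the median, and the run of four low probes from 400 to 700 is reported as one loss segment whose start and end are the breakpoints.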

I'm aware of efforts to call CNVs from the recently available whole-exome sequencing datasets as well. These aren't available for the full 500k cohort yet, but it's worth keeping an eye on these efforts, as the SNP arrays may not have used probes at some potentially important / likely CNV locations.

written 16 months ago by Eric T. (2.6k)

I see. That would make more sense, I suppose. Our literature search also found PennCNV used in a lot of papers. I'm assuming that consecutive probes in the array are "roughly" next to each other on the physical chromosome; however, that seems like a rather big assumption.

For my own conceptual understanding: are the l2r values in UK Biobank essentially estimated CNVs for the SNPs in the array?

Either way thank you for your explanation!

written 12 months ago by tohc (0)