8.5 years ago
Yu Fu ▴ 40

I am using VarScan to analyze somatic copy number alterations. I am following the recommended workflow found here:

http://varscan.sourceforge.net/copy-number-calling.html

I am stuck at the fifth step, which uses mergeSegments.pl. This tool seems to need a reference file for the --ref-arm-sizes parameter, but I cannot find one anywhere. Where can I obtain this file, and are there more detailed instructions for using this script? The help text only says:

--ref-arm-sizes Two column file of reference name and size in bp for calling by chromosome arm

The help text does not say that this parameter is required, but if I run the script without it, I get something like this:

YutekiMacBook-Pro:challenge_proj yfu$ perl mergeSegments.pl 1767_chr22.copynumber
Use of uninitialized value $input in <HANDLE> at mergeSegments.pl line 446.
readline() on unopened filehandle at mergeSegments.pl line 446.
Can't use an undefined value as a symbol reference at mergeSegments.pl line 456.


If I specify this parameter, the output is like this:

YutekiMacBook-Pro:challenge_proj yfu$ perl mergeSegments.pl --ref-arm-sizes hs19.len 1767_chr22.copynumber
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 1.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 2.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 3.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 4.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 5.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 6.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 7.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line 8.
Use of uninitialized value $arm_name in concatenation (.) or string at mergeSegments.pl line 453, <GEN0> line


It seems that I need a reference file that has a specific format.

varscan perl • 4.2k views

Although I haven't used this tool, to get the best help please post your input command and the error you are getting. This will give people some idea whether it's a local Perl module that's not installed on your system or some other issue. Does the documentation (i.e. the README file) tell you about this --ref-arm-sizes parameter?


I have edited the post to make it more specific.


Ok, thanks for the output. What's in your hs19.len? It sounds like that is supposed to be your reference file?


Thank you for your patience. I just gave --ref-arm-sizes an arbitrary reference file (so the format must be wrong), since the authors did not mention which reference file they used (not in the paper, the supplemental materials, or the source code).

The hs19.len looks like this:

1    chr1    249250621
2    chr2    243199373
3    chr3    198022430
4    chr4    191154276
5    chr5    180915260
6    chr6    171115067
7    chr7    159138663
8    chr8    146364022
9    chr9    141213431
10    chr10    135534747
11    chr11    135006516
12    chr12    133851895
13    chr13    115169878
14    chr14    107349540
15    chr15    102531392
16    chr16    90354753
17    chr17    81195210
18    chr18    78077248
19    chr19    59128983
20    chr20    63025520
21    chr21    48129895
22    chr22    51304566
23    chrX    155270560
24    chrY    59373566


It just contains the length of each chromosome. But judging from the name of the parameter, "--ref-arm-sizes", it seems that the reference file should contain the length of each chromosome arm. Unfortunately, I do not have that reference file, and I cannot find it anywhere online. I emailed the author, and she said that their team usually answers questions on Biostar, which is why I am asking here.


The help text you quoted for --ref-arm-sizes says the file should have 2 columns, so get rid of the 1st column of hs19.len. The question is what format their script expects the chromosome names in: try a couple of variations such as 1p or chr1p and 1q or chr1q, and use the length of each chromosome arm. That may work.


Where can I get the length of each chromosome arm?


I do this via the UCSC Genome Browser. Open the Table Browser and select mammal / human / assembly hg19 (I'm assuming that all this time we have been discussing H. sapiens data, not some other organism with a 46,XY karyotype). For group, use "mapping and sequencing tracks", and for track, choose "gap". Set region to "genome", then set a filter and enter "centromere telomere" for "type". For output format you can choose "hyperlinks to genome browser", which does what it says, or you can ask for "selected fields from primary and related tables". Then hit "get output". This gives you the coordinates of each telomere and centromere on each chr, and from those coordinates you can work out the length of each chr arm. (There are other ways of doing this...) Good luck!
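
If you would rather script that last step, here is a minimal Python sketch. It assumes you have already pulled the centromere start and end for each chromosome out of the gap table, and that you know each chromosome's total length. Splitting at the centromere midpoint is my own choice, not something the VarScan documentation specifies, and the chromosome/start/end/arm row layout is a guess at what mergeSegments.pl wants. The function name `arm_rows` and the toy values are hypothetical placeholders, not real hg19 coordinates.

```python
from typing import Dict, List, Tuple

ArmRow = Tuple[str, int, int, str]


def arm_rows(chrom_sizes: Dict[str, int],
             centromeres: Dict[str, Tuple[int, int]]) -> List[ArmRow]:
    """Build (chrom, start, end, arm) rows for the p and q arm of each chromosome.

    Assumption: the p/q boundary is placed at the midpoint of the
    centromere gap. Other conventions (e.g. cytoband boundaries) exist.
    """
    rows: List[ArmRow] = []
    for chrom, size in chrom_sizes.items():
        cen_start, cen_end = centromeres[chrom]
        split = (cen_start + cen_end) // 2
        rows.append((chrom, 0, split, "p"))
        rows.append((chrom, split, size, "q"))
    return rows


if __name__ == "__main__":
    # Toy values for illustration only; substitute the real chromosome
    # lengths and the centromere coordinates exported from the gap table.
    sizes = {"chrT": 1000}
    cens = {"chrT": (400, 500)}
    for row in arm_rows(sizes, cens):
        print("\t".join(str(field) for field in row))
```

Redirect the printed rows to a file and try that file with --ref-arm-sizes. If the script still complains, the column layout or the chromosome naming is the first thing I would vary.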

8.4 years ago
dankoboldt ▴ 140

Thanks to everyone who chimed in; providing the ref-arm sizes file does resolve the mergeSegments.pl issue. Here's our file for hg19; note that our reference doesn't have chromosome names preceded by "chr" so that's reflected in this file:

1    0    125000000    p
1    125000000    249250621    q
2    0    93300000    p
2    93300000    243199373    q
3    0    91000000    p
3    91000000    198022430    q
4    0    50400000    p
4    50400000    191154276    q
5    0    48400000    p
5    48400000    180915260    q
6    0    61000000    p
6    61000000    171115067    q
7    0    59900000    p
7    59900000    159138663    q
8    0    45600000    p
8    45600000    146364022    q
9    0    49000000    p
9    49000000    141213431    q
10    0    40200000    p
10    40200000    135534747    q
11    0    53700000    p
11    53700000    135006516    q
12    0    35800000    p
12    35800000    133851895    q
13    0    17900000    p
13    17900000    115169878    q
14    0    17600000    p
14    17600000    107349540    q
15    0    19000000    p
15    19000000    102531392    q
16    0    36600000    p
16    36600000    90354753    q
17    0    24000000    p
17    24000000    81195210    q
18    0    17200000    p
18    17200000    78077248    q
19    0    26500000    p
19    26500000    59128983    q
20    0    27500000    p
20    27500000    63025520    q
21    0    13200000    p
21    13200000    48129895    q
22    0    14700000    p
22    14700000    51304566    q
X    0    60600000    p
X    60600000    155270560    q
Y    0    12500000    p
Y    12500000    59373566    q
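
If your reference does use "chr"-prefixed names, the file above needs the prefix added so the names match. A minimal Python sketch of that conversion follows, assuming the rows are whitespace-separated with four fields and that tab-separated output is acceptable to mergeSegments.pl (I have not confirmed which delimiter the script requires). The function name `add_chr_prefix` is hypothetical.

```python
from typing import Iterable, List


def add_chr_prefix(lines: Iterable[str]) -> List[str]:
    """Return arm-size rows with 'chr' prepended to the chromosome name.

    Rows that do not have exactly four whitespace-separated fields
    (chromosome, start, end, arm) are skipped.
    """
    converted: List[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # blank or malformed row
        chrom, start, end, arm = fields
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        converted.append("\t".join((chrom, start, end, arm)))
    return converted


if __name__ == "__main__":
    example = ["1    0    125000000    p", "1    125000000    249250621    q"]
    for row in add_chr_prefix(example):
        print(row)
```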


How many columns in your file? The OP references needing a file made up of 2 columns:

About this parameter:

--ref-arm-sizes Two column file of reference name and size in bp for calling by chromosome arm


It actually accepts a 4-column file; the format above is correct.
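
That would presumably also explain the `$arm_name` warnings the OP saw: with a 3-column file there is no fourth field for the arm name. This is my reading of the error messages, not something confirmed from the script's source. A minimal Python sketch for checking a file before passing it to --ref-arm-sizes is below. It assumes whitespace-separated rows of chromosome, start, end and arm, with the arm being "p" or "q" as in the hg19 file in this thread. The function name `find_format_problems` is hypothetical.

```python
from typing import Iterable, List


def find_format_problems(lines: Iterable[str]) -> List[str]:
    """Report rows that do not look like 'chrom start end arm'."""
    problems: List[str] = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            problems.append(
                f"line {number}: expected 4 columns, found {len(fields)}")
            continue
        chrom, start, end, arm = fields
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {number}: start and end must be integers")
        elif int(start) >= int(end):
            problems.append(f"line {number}: start must be less than end")
        if arm not in ("p", "q"):
            problems.append(
                f"line {number}: arm should be 'p' or 'q', found {arm!r}")
    return problems


if __name__ == "__main__":
    # A row from the 3-column hs19.len, followed by a valid 4-column row.
    for problem in find_format_problems(
            ["1    chr1    249250621", "1    0    125000000    p"]):
        print(problem)
```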


Hey dankoboldt, thanks for providing the correct ref-arm-sizes file. When I tried to use mergeSegments.pl, though, it still didn't work. Could you please help me figure out where the problem is? Here's the link to the question I posted: Question About Mergesegment.Pl In Varscan 2. Thank you!