Question: How To Retrieve Chromosome Lengths Using Bsgenome Package?
1
gravatar for James Ashmore
6.1 years ago by
London
James Ashmore90 wrote:

Hello, I am following along with the EBI's online course for ChIP-Seq analysis and have encountered an error when trying to retrieve chromosome lengths:

> library(BSgenome.Hsapiens.UCSC.hg19)
> seqlengths(NFKB) = seqlengths(Hsapiens)
Error in .normargSeqlengths(value, seqnames(x)) : 
  length of supplied 'seqlengths' must equal the number of sequences

I assume the reason for this is the additional chromosome entries in the Hsapiens object:

> seqlengths(NFKB)
 chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr20 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8 
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 
 chr9  chrM  chrX  chrY 
   NA    NA    NA    NA 
> seqlengths(Hsapiens)
                 chr1                  chr2                  chr3                  chr4                  chr5 
            249250621             243199373             198022430             191154276             180915260 
                 chr6                  chr7                  chr8                  chr9                 chr10 
            171115067             159138663             146364022             141213431             135534747 
                chr11                 chr12                 chr13                 chr14                 chr15 
            135006516             133851895             115169878             107349540             102531392 
                chr16                 chr17                 chr18                 chr19                 chr20 
             90354753              81195210              78077248              59128983              63025520 
                chr21                 chr22                  chrX                  chrY                  chrM 
             48129895              51304566             155270560              59373566                 16571 
 chr1_gl000191_random  chr1_gl000192_random        chr4_ctg9_hap1  chr4_gl000193_random  chr4_gl000194_random 
               106433                547496                590426                189789                191469 
        chr6_apd_hap1         chr6_cox_hap2         chr6_dbb_hap3        chr6_mann_hap4         chr6_mcf_hap5 
              4622290               4795371               4610396               4683263               4833398 
        chr6_qbl_hap6        chr6_ssto_hap7  chr7_gl000195_random  chr8_gl000196_random  chr8_gl000197_random 
              4611984               4928567                182896                 38914                 37175 
 chr9_gl000198_random  chr9_gl000199_random  chr9_gl000200_random  chr9_gl000201_random chr11_gl000202_random 
                90085                169874                187035                 36148                 40103 
      chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random chr17_gl000205_random chr17_gl000206_random 
              1680828                 37498                 81310                174588                 41001 
chr18_gl000207_random chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random        chrUn_gl000211 
                 4262                 92689                159169                 27682                166566 
       chrUn_gl000212        chrUn_gl000213        chrUn_gl000214        chrUn_gl000215        chrUn_gl000216 
               186858                164239                137718                172545                172294 
       chrUn_gl000217        chrUn_gl000218        chrUn_gl000219        chrUn_gl000220        chrUn_gl000221 
               172149                161147                179198                161802                155397 
       chrUn_gl000222        chrUn_gl000223        chrUn_gl000224        chrUn_gl000225        chrUn_gl000226 
               186861                180455                179693                211173                 15008 
       chrUn_gl000227        chrUn_gl000228        chrUn_gl000229        chrUn_gl000230        chrUn_gl000231 
               128374                129120                 19913                 43691                 27386 
       chrUn_gl000232        chrUn_gl000233        chrUn_gl000234        chrUn_gl000235        chrUn_gl000236 
                40652                 45941                 40531                 34474                 41934 
       chrUn_gl000237        chrUn_gl000238        chrUn_gl000239        chrUn_gl000240        chrUn_gl000241 
                45867                 39939                 33824                 41933                 42152 
       chrUn_gl000242        chrUn_gl000243        chrUn_gl000244        chrUn_gl000245        chrUn_gl000246 
                43523                 43341                 39929                 36651                 38154 
       chrUn_gl000247        chrUn_gl000248        chrUn_gl000249 
                36422                 39786                 38502

Therefore I tried subsetting the Hsapiens object:

> Hsapiens[c("chr1","chr2")]
Error in Hsapiens[c("chr1", "chr2")] : 
  object of type 'S4' is not subsettable

Since it is not sub-settable, I tried retrieving the lengths via:

> lengths = seqlengths(Hsapiens)
> mylengths = lengths[1:25]
> mylengths
     chr1      chr2      chr3      chr4      chr5      chr6      chr7 
249250621 243199373 198022430 191154276 180915260 171115067 159138663 
     chr8      chr9     chr10     chr11     chr12     chr13     chr14 
146364022 141213431 135534747 135006516 133851895 115169878 107349540 
    chr15     chr16     chr17     chr18     chr19     chr20     chr21 
102531392  90354753  81195210  78077248  59128983  63025520  48129895 
    chr22      chrX      chrY      chrM 
 51304566 155270560  59373566     16571

Now I have two objects, the NFKB containing no chromosome lengths and the mylengths object containing the sequence lengths:

> seqlengths(NFKB)
 chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 
chr20 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrM  chrX 
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 
 chrY 
   NA 
> mylengths
     chr1      chr2      chr3      chr4      chr5      chr6      chr7 
249250621 243199373 198022430 191154276 180915260 171115067 159138663 
     chr8      chr9     chr10     chr11     chr12     chr13     chr14 
146364022 141213431 135534747 135006516 133851895 115169878 107349540 
    chr15     chr16     chr17     chr18     chr19     chr20     chr21 
102531392  90354753  81195210  78077248  59128983  63025520  48129895 
    chr22      chrX      chrY      chrM 
 51304566 155270560  59373566     16571

I then tried setting the mylengths to the seqlengths(NFKB):

> seqlengths(NFKB) = mylengths
Error in .normargSeqlengths(value, seqnames(x)) : 
  when the supplied 'seqlengths' vector is named, the names must match the seqnames

To my eyes, the names do match, but maybe it's because they are in the wrong order? Am I being slow or is there a much easier way to retrieve the chromosome lengths?

Thanks for the help, James

length chromosome • 5.1k views
ADD COMMENTlink modified 6.1 years ago • written 6.1 years ago by James Ashmore90

Have you tried reordering mylengths so it's in the same order as seqlengths(NFKB)?

ADD REPLYlink written 6.1 years ago by Devon Ryan93k

Hello, yes I just tried that now and it has worked, thankfully:

> seqlengths(NFKB)
 chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 
chr20 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrM  chrX 
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 
 chrY 
   NA 
> mylengths.test = mylengths[c("chr1","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chr2","chr20","chr21","chr22","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chrM","chrX","chrY")]
> mylengths.test
     chr1     chr10     chr11     chr12     chr13     chr14     chr15 
249250621 135534747 135006516 133851895 115169878 107349540 102531392 
    chr16     chr17     chr18     chr19      chr2     chr20     chr21 
 90354753  81195210  78077248  59128983 243199373  63025520  48129895 
    chr22      chr3      chr4      chr5      chr6      chr7      chr8 
 51304566 198022430 191154276 180915260 171115067 159138663 146364022 
     chr9      chrM      chrX      chrY 
141213431     16571 155270560  59373566 
> seqlengths(NFKB) = mylengths.test
> seqlengths(NFKB)
     chr1     chr10     chr11     chr12     chr13     chr14     chr15 
249250621 135534747 135006516 133851895 115169878 107349540 102531392 
    chr16     chr17     chr18     chr19      chr2     chr20     chr21 
 90354753  81195210  78077248  59128983 243199373  63025520  48129895 
    chr22      chr3      chr4      chr5      chr6      chr7      chr8 
 51304566 198022430 191154276 180915260 171115067 159138663 146364022 
     chr9      chrM      chrX      chrY 
141213431     16571 155270560  59373566

I just wondered if there was a better way of doing this, given for every analysis I'll have to do something similar each time

ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by James Ashmore90
my.lengths <- paste("chr",c(1:22,"M","X","Y"),sep="")
# and sort alphabetically if you want
my.sorted.lengths <- sort(my.lengths)
ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by Irsan7.0k

Hi Irsan, if I have interpreted your comment correctly, that creates a vector of the names but not in the order they appear in the NFKB object? Instead I have done the following and it appears to work :

> library(BSgenome.Hsapiens.UCSC.hg19)
> chr.lengths = seqlengths(Hsapiens)[1:25]
> chr.lengths
     chr1      chr2      chr3      chr4      chr5      chr6      chr7 
249250621 243199373 198022430 191154276 180915260 171115067 159138663 
     chr8      chr9     chr10     chr11     chr12     chr13     chr14 
146364022 141213431 135534747 135006516 133851895 115169878 107349540 
    chr15     chr16     chr17     chr18     chr19     chr20     chr21 
102531392  90354753  81195210  78077248  59128983  63025520  48129895 
    chr22      chrX      chrY      chrM 
 51304566 155270560  59373566     16571 
> chr.names = names(seqlengths(NFKB))
> chr.names
 [1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
[10] "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3"  "chr4"  "chr5" 
[19] "chr6"  "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY" 
> chr.lengths.match = chr.lengths[chr.names]
> chr.lengths.match
     chr1     chr10     chr11     chr12     chr13     chr14     chr15 
249250621 135534747 135006516 133851895 115169878 107349540 102531392 
    chr16     chr17     chr18     chr19      chr2     chr20     chr21 
 90354753  81195210  78077248  59128983 243199373  63025520  48129895 
    chr22      chr3      chr4      chr5      chr6      chr7      chr8 
 51304566 198022430 191154276 180915260 171115067 159138663 146364022 
     chr9      chrM      chrX      chrY 
141213431     16571 155270560  59373566 
> seqlengths(NFKB) = chr.lengths.match
> seqlengths(NFKB)
     chr1     chr10     chr11     chr12     chr13     chr14     chr15 
249250621 135534747 135006516 133851895 115169878 107349540 102531392 
    chr16     chr17     chr18     chr19      chr2     chr20     chr21 
 90354753  81195210  78077248  59128983 243199373  63025520  48129895 
    chr22      chr3      chr4      chr5      chr6      chr7      chr8 
 51304566 198022430 191154276 180915260 171115067 159138663 146364022 
     chr9      chrM      chrX      chrY 
141213431     16571 155270560  59373566
ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by James Ashmore90

Isn't the my.sorted.lengths the same as names(seqlengths(NFKB)) ?

ADD REPLYlink written 6.1 years ago by Irsan7.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 935 users visited in the last hour