I am trying to identify homopolymer runs in the sequences of a BED file, and I have a Perl script that works on input in the desired output format shown below. There are ~55,000 lines in custom.bed. Is there a way to download the sequences for all of them in the desired output format? The UCSC Table Browser gives output in the correct format, but there is a limit of 1,000 entries per query. The commands below seem close, but the output is formatted incorrectly. Thank you :).
custom.bed:

    chr1 948953 948956 chr1:948953-948956 . ISG15

commands:

    BED=custom.bed
    # download every hg19 chromosome and concatenate into one FASTA
    for chr in `seq 1 22` X Y
    do
        wget -O - -q http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr${chr}.fa.gz | gunzip -c >> hg19.fa
    done
    # extract the sequence for each BED interval
    fastaFromBed -fi hg19.fa -bed $BED -fo $BED.fasta
Desired output for each line, as a FASTA record:
    >hg19_refGene_NM_002335_0 range=chr11:68080173-68080283 5'pad=10 3'pad=10 strand=+ repeatMasking=none
    gccggacaacATGGAGGCAGCGCCGCCCGGGCCGCCGTGGCCGCTGCTGC
    TGCTGCTGCTGCTGCTGCTGGCGCTGTGCGGCTGCCCGGCCCCCGCCGCG
    Ggtaggtgggc
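One possible way to close the gap between the fastaFromBed output and the layout above is to post-process the FASTA with awk. The following is only a minimal sketch under several assumptions: that the headers written by fastaFromBed look like `>chr1:948953-948956` with a 0-based BED start, that a header of the form `>hg19_custom_N range=chrom:start-end strand=+ repeatMasking=none` is close enough for the Perl script, and that sequence lines should wrap at 50 columns. The function name `reformat_fasta` and the `hg19_custom_` prefix are made up for illustration, and the `5'pad`/`3'pad` fields are not reproduced.

```shell
# Hypothetical helper: reads FASTA on stdin, writes UCSC-like FASTA on stdout.
# Assumes headers of the form ">chrom:start-end" with a 0-based start.
reformat_fasta() {
    awk -v width=50 '
        # print a sequence in chunks of "width" characters
        function wrap(s,   i) {
            for (i = 1; i <= length(s); i += width) print substr(s, i, width)
        }
        /^>/ {
            if (seq != "") wrap(seq)          # flush the previous record
            seq = ""
            split(substr($0, 2), a, /[:-]/)   # a[1]=chrom a[2]=start a[3]=end
            # shift the start by one so the range is 1-based, as in the UCSC header
            printf ">hg19_custom_%d range=%s:%d-%d strand=+ repeatMasking=none\n", n++, a[1], a[2] + 1, a[3]
            next
        }
        { seq = seq $0 }                      # join sequence lines before wrapping
        END { if (seq != "") wrap(seq) }
    '
}
```

With the file names used above, it could be run as `reformat_fasta < custom.bed.fasta > custom.bed.wrapped.fasta`. Joining the sequence lines before wrapping means it should not matter whether the input FASTA is already wrapped or has one line per record.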