Question

How to get a list of 150 bp sequences for each of 15,000 chromosome positions

2

9.5 years ago

sebastiz ▴ 20

I have a long list of chromosome positions in the format below, except there are 15,000 of them. I would like to get the sequence for each of the spans (or, alternatively, the 150 bp up from the start position). Can anyone tell me how to do this for this number of positions?

chr15:60212080-60213230
chr10:60242850-60243000
chr11:60469240-60469390
chr19:60954240-60954390
chr12:61260820-61260970
chr21:61576770-61576920
chr1:61586420-61586570
chr1:61927840-61927990

Thanks in advance

sequencing genome • 2.8k views

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.5 years ago by sebastiz ▴ 20

Ram · Answer 1 · 2014-10-25

3

9.5 years ago

smilefreak ▴ 420

OK, hopefully this should be easy enough.

First, index the reference genome:

samtools faidx <reference genome>

Then extract each region with a little bash loop:

# FILE_REGIONS holds one region per line, e.g. chr1:61586420-61586570
while read -r region
do
    samtools faidx <reference genome> "$region" >> output_file.txt
done < FILE_REGIONS
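
If only 150 bp from the start of each span is wanted, the region strings can be rewritten before running the loop. Here is a minimal Python sketch, assuming that "150 bp up from the start position" means the 150 bases beginning at the start coordinate, and that the coordinates are 1-based and inclusive, as samtools region strings are. The function name `window_from_start` is made up for illustration.

```python
def window_from_start(region, length=150):
    """Return 'chrom:start-end' covering `length` bases beginning at start.

    Assumes 1-based inclusive coordinates, so end = start + length - 1.
    """
    chrom, span = region.rsplit(":", 1)
    start = int(span.split("-")[0])
    return f"{chrom}:{start}-{start + length - 1}"


# Two regions taken from the question; a real run would read all 15,000.
regions = ["chr15:60212080-60213230", "chr10:60242850-60243000"]
for region in regions:
    print(window_from_start(region))
```

Writing the printed lines to a file gives something that could stand in for FILE_REGIONS in the loop above.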

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.5 years ago by smilefreak ▴ 420

Ram · Answer 2 · 2014-10-26

3

9.5 years ago

PoGibas 5.1k

Use bedtools getfasta. It takes two inputs: the chromosome positions (as a BED file) and the reference genome (see https://www.biostars.org/p/1796/ ).
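
The positions in the question are not in BED format yet, so they need converting first. Here is a rough Python sketch, assuming the region strings are 1-based and inclusive (as in samtools or genome-browser notation), whereas BED is 0-based and half-open, so only the start shifts by one. The function name `region_to_bed` is hypothetical.

```python
def region_to_bed(region):
    """Convert 'chr1:61586420-61586570' into a tab-separated BED line.

    Assumes the input is 1-based inclusive; BED is 0-based, end-exclusive,
    so the start is decremented and the end is left unchanged.
    """
    chrom, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return f"{chrom}\t{start - 1}\t{end}"


# Example with two regions from the question.
for region in ["chr1:61586420-61586570", "chr1:61927840-61927990"]:
    print(region_to_bed(region))
```

The resulting lines, saved to a file such as regions.bed (name assumed), could then be passed to bedtools getfasta via -bed, with the reference given via -fi.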

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.5 years ago by PoGibas 5.1k

Ram · Answer 3 · 2014-10-25

2

9.5 years ago

Sean Davis 26k

Using Bioconductor/R:

Create a GRanges object containing your ranges.
Use the appropriate BSgenome package (or an FaFile) and getSeq().

Using DAS, after replacing "-" with "," in each region. This URL gives an example result:

http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:100000,200000
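
The "-" to "," replacement can be scripted for the whole list. The Python sketch below only builds the URLs and does not fetch anything. It assumes the hg19 assembly, as in the example URL, and the function name `das_url` is made up for illustration.

```python
# Base of the example URL above; hg19 is assumed to be the right assembly.
DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das/hg19/dna"


def das_url(region):
    """Build a DAS request URL from a 'chrom:start-end' region string."""
    chrom, span = region.rsplit(":", 1)
    return f"{DAS_BASE}?segment={chrom}:{span.replace('-', ',')}"


print(das_url("chr1:100000-200000"))
```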

Many other possibilities!

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.5 years ago by Sean Davis 26k

Ram · Answer 4 · 2014-10-26

1

9.5 years ago

Matt Shirley 10k

With Python, you can use the pyfaidx module, which also installs a faidx command-line tool:

pip install --user pyfaidx
faidx genome.fasta --bed regions.bed
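
The same module can also be used from a script instead of the command line. The sketch below assumes 1-based inclusive region strings like those in the question. The helper names `fetch` and `open_genome` are hypothetical, and `fetch` works on any mapping from chromosome name to a sliceable sequence, so it can be tried on a plain dict before pointing it at a real genome.

```python
def fetch(genome, region):
    """Return the sequence for a 1-based inclusive 'chrom:start-end' region.

    `genome` maps chromosome names to sliceable sequences, e.g. a
    pyfaidx.Fasta object or a plain dict of strings.
    """
    chrom, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return str(genome[chrom][start - 1:end])


def open_genome(path):
    """Open an indexed FASTA file with pyfaidx (imported lazily)."""
    from pyfaidx import Fasta
    return Fasta(path)


# Toy stand-in for a genome; a real run would use open_genome("genome.fasta").
toy = {"chr1": "ACGTACGTAC"}
print(fetch(toy, "chr1:2-4"))
```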

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.5 years ago by Matt Shirley 10k