filter extract sequences
0
0
Entering edit mode
12 months ago
ntechen • 0

Hi, I am a bioinformatics beginner and have very little command line knowledge. I have downloaded raw sequence reads and assembled them with megahit into contigs ('genome of interest'). I have aligned to this genome several thousand DNA fragments of 100 bases long (probes) using blastn:

blastn -db Genome -query 100bpprobes.fa -out outtabular.txt -outfmt 6


The output is a tabular output file (12 columns) with some two thousand lines.

1   2   3   4   5   6   7   8   9   10  11  12
Query id    Subject id  % identity  alignment length    mismatches  gap openings    q. start    q. end  s. start    s. end  e-value bit score
3-Probe_168 k119_164365 92.31   78  6   0   43  120 1676    1599    4.53E-24    111
3- Probe _210   k119_164365 89.08   119 12  1   1   118 1676    1558    1.24E-34    147
3- Probe_252    k119_164365 90.54   74  6   1   4   76  1631    1558    1.27E-19    97.1
9-Probe _164    k119_164365 93.42   76  5   0   45  120 1678    1603    1.26E-24    113
9-Probe_205 k119_164365 91.45   117 10  0   4   120 1678    1562    4.43E-39    161
9-Probe_246 k119_164365 92.50   80  6   0   4   83  1637    1558    3.50E-25    115
242-Probe_645   k119_126278 95.65   46  2   0   75  120 4000    4045    5.94E-13    75
15-Probe_600    k119_34986  92.04   113 9   0   6   118 175 287 1.59E-38    159


I want to extract the sequences that are 500 bp upstream from the s.start til 500 bp downstream from s.end. How can I manage that? Thank you for your time.

Assembly genome sequence • 246 views
1
Entering edit mode

Create a BED format file using columns subject id, s.start, s.end (unix command cut for this). Then use bedtools slop (LINK) to add 500 bp at either end. Then you can use bedtools getfasta (LINK) to retrieve the sequences you need.

If you need more detailed instructions search for the two tools mentioned above using an external google search on biostars.

0
Entering edit mode

Does extraction include sequence in addition to upstream and downstream sequences? If you want only 500bp upstream and 500 bp downstream (without sequence between), use flank option from bedtools to create bed file and use getFasta from bedtools suite.