Entering edit mode
3.9 years ago
ntechen
•
0
Hi, I am a bioinformatics beginner and have very little command line knowledge. I have downloaded raw sequence reads and assembled them with megahit into contigs ('genome of interest'). I have aligned to this genome several thousand DNA fragments of 100 bases long (probes) using blastn:
blastn -db Genome -query 100bpprobes.fa -out outtabular.txt -outfmt 6
The output is a tabular output file (12 columns) with some two thousand lines.
1 2 3 4 5 6 7 8 9 10 11 12
Query id Subject id % identity alignment length mismatches gap openings q. start q. end s. start s. end e-value bit score
3-Probe_168 k119_164365 92.31 78 6 0 43 120 1676 1599 4.53E-24 111
3- Probe _210 k119_164365 89.08 119 12 1 1 118 1676 1558 1.24E-34 147
3- Probe_252 k119_164365 90.54 74 6 1 4 76 1631 1558 1.27E-19 97.1
9-Probe _164 k119_164365 93.42 76 5 0 45 120 1678 1603 1.26E-24 113
9-Probe_205 k119_164365 91.45 117 10 0 4 120 1678 1562 4.43E-39 161
9-Probe_246 k119_164365 92.50 80 6 0 4 83 1637 1558 3.50E-25 115
242-Probe_645 k119_126278 95.65 46 2 0 75 120 4000 4045 5.94E-13 75
15-Probe_600 k119_34986 92.04 113 9 0 6 118 175 287 1.59E-38 159
I want to extract the sequences that are 500 bp upstream from the s.start til 500 bp downstream from s.end. How can I manage that? Thank you for your time.
Create a BED format file using columns
subject id, s.start, s.end
(unix commandcut
for this). Then usebedtools slop
(LINK) to add 500 bp at either end. Then you can usebedtools getfasta
(LINK) to retrieve the sequences you need.If you need more detailed instructions search for the two tools mentioned above using an external google search on biostars.
GenoMax, how do you extract a n size sequence from the 5' end side? slop allows you to decrease/increase the sequence size but what I want is to extract say the first nucleotide sequences from the start or end
Can you clarify what you mean by end to end? You could simply add
n
bases to the start column to get sequence on just 5'-end.Does extraction include sequence in addition to upstream and downstream sequences? If you want only 500bp upstream and 500 bp downstream (without sequence between), use
flank
option frombedtools
to create bed file and usegetFasta
frombedtools
suite.