Selecting fastas for sliding windows
2.5 years ago
gubrins ▴ 290

Hey,

I am trying to create sliding windows from a huge FASTA file so that I can build a phylogeny for each window. I have already split each individual's FASTA into windows, and all of that information is in a single file. To create a separate file for each window, I am running this for loop:

for i in $(cat sliding_windows.txt)
do
    # print each matching header plus the sequence line that follows it
    grep -A1 -e "$i" better_sliding_windows_one_line.fa > "$i.txt"
done

where sliding_windows.txt is:

:1-1000000
:1000001-2000000
:2000001-3000000
:3000001-4000000

and where better_sliding_windows... is a file containing all my samples, one entry per sample and window:

sample_1_sliding:1-1000000
sample_1_sliding:1000001-2000000
sample_1_sliding:2000001-3000000
sample_1_sliding:3000001-4000000
sample_1_sliding:4000001-5000000

(All the examples are truncated; I have many more samples and much more data.)

When I run the for loop to create one file per sliding window, a new line containing two dashes (--) is inserted between the FASTA record of one sample and that of the next, which causes problems in later stages. If I search for these lines with grep, I can't find them, but they are there. How can I prevent them from appearing, or remove them afterwards?

Any help would be greatly appreciated; let me know if anything is unclear. Thanks in advance!

bash • 593 views
2.5 years ago

Try adding --no-group-separator to your grep command. With -A, grep prints a -- line between groups of matches that are not adjacent, and this option suppresses it. If that doesn't work, you can use sed or awk to remove the -- lines. However, I would suggest using https://bioinf.shenwei.me/seqkit/usage/#sliding to create the FASTA files for each window instead of a shell loop. It lets you control the step and window sizes.
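As a rough illustration of the grep suggestion, here is a minimal sketch. It assumes GNU grep (--no-group-separator is a GNU extension) and uses made-up toy sequences in a scratch directory; only the file names and window labels come from the question. The -F flag (match the window label as a fixed string) and the while/read loop are choices made for this sketch, not something the original command used.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy stand-in for the real one-line FASTA (the sequences are invented).
cat > better_sliding_windows_one_line.fa <<'EOF'
>sample_1_sliding:1-1000000
ACGT
>sample_1_sliding:1000001-2000000
GGCC
>sample_2_sliding:1-1000000
TTAA
>sample_2_sliding:1000001-2000000
CCGG
EOF

# Window labels, one per line, as in the question.
printf '%s\n' ':1-1000000' ':1000001-2000000' > sliding_windows.txt

# One output file per window. --no-group-separator stops grep from
# printing "--" between non-adjacent groups of header + sequence lines.
while IFS= read -r window; do
    grep --no-group-separator -A1 -F -e "$window" \
        better_sliding_windows_one_line.fa > "${window}.txt"
done < sliding_windows.txt

# Fallback if the option is unavailable: drop lines that are exactly "--".
# The -e is needed because a bare "--" would be read as "end of options".
grep -A1 -F -e ':1-1000000' better_sliding_windows_one_line.fa \
    | grep -v -x -e '--' > fallback.txt
```

The fallback also hints at one possible reason a plain search for the separator finds nothing: in `grep '--' file`, grep treats `--` as the end-of-options marker rather than as the pattern, so `-e '--'` is needed to search for it literally.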


That did the trick! Thank you very much!
