5.1 years ago
Darrill
Hi everyone, I have a tab-separated file such as:
seqnames start end strand
1 scaffold_0 1 50 -
2 scaffold_0 30 120 +
3 scaffold_0 60 400 -
4 scaffold_0 100 300 +
My idea was to use samtools faidx to extract the FASTA sequences from my genome using the start and end coordinates, by first creating a get_fasta.txt file such as:
scaffold_0:1-50
scaffold_0:30-120
scaffold_0:60-400
scaffold_0:100-300
and then running the command samtools faidx -r get_fasta.txt my_genome.fa
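Building get_fasta.txt by hand gets tedious for many features. Below is a minimal sketch that turns the tab file into one region string per line. It assumes the columns are whitespace-separated, that the first line is the header, and that data rows may carry a leading row index as in the example; the helper name table_to_regions is hypothetical.

```python
# Hypothetical helper: convert rows of "seqnames start end strand" into
# samtools region strings of the form "seqname:start-end".
def table_to_regions(lines):
    regions = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "seqnames":
            continue  # skip blank lines and the header line
        if len(fields) == 5:
            fields = fields[1:]  # drop the leading row index (1, 2, 3, ...)
        seqname, start, end, strand = fields
        regions.append(f"{seqname}:{start}-{end}")
    return regions


# The example table from the question.
table = """seqnames start end strand
1 scaffold_0 1 50 -
2 scaffold_0 30 120 +
3 scaffold_0 60 400 -
4 scaffold_0 100 300 +""".splitlines()

# One region per line, ready to be redirected into get_fasta.txt.
print("\n".join(table_to_regions(table)))
```

Redirecting that output into get_fasta.txt gives the region file, which samtools faidx reads with its -r option. Note that the strand column is simply ignored here.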
But, as you can see, the coordinates are wrong for the features on the - strand, and should be replaced by:
seqnames start end strand
1 scaffold_0 50 1 -
2 scaffold_0 30 120 +
3 scaffold_0 400 60 -
4 scaffold_0 100 300 +
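Rather than swapping start and end, a common way to handle a - strand feature is to keep start ≤ end, extract the genomic-order sequence, and reverse-complement it. The sketch below shows that logic in plain Python for 1-based, inclusive coordinates. The function names and the toy scaffold sequence are hypothetical and only illustrate the idea; this is not how samtools is implemented internally.

```python
# Complement map for DNA, keeping lowercase (soft-masked) bases and N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    # Complement each base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


def extract(genome_seq, start, end, strand):
    """Return the feature sequence for 1-based, inclusive start..end.

    Coordinates stay in genomic order (start <= end); a '-' strand
    feature is reverse-complemented instead of swapping the coordinates.
    """
    sub = genome_seq[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub


# Toy scaffold, for illustration only.
scaffold_0 = "AACCGGTTAG"
print(extract(scaffold_0, 1, 4, "+"))  # AACC
print(extract(scaffold_0, 1, 4, "-"))  # GGTT
```

With samtools itself, one option is to put the - strand regions in their own file and run samtools faidx -i -r minus_regions.txt my_genome.fa, where minus_regions.txt is a hypothetical file name. The -i (--reverse-complement) option reverse-complements every region in that run, so + strand regions would go through a separate call without -i.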
So my question is:
Does anyone know whether samtools can handle strand, i.e. automatically parse the strand column in order to use the correct coordinates?
Thank you for your help.
How about using bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html)? Its -s option reverse-complements features that lie on the antisense strand.
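bedtools getfasta expects BED input, which is 0-based and half-open, and its -s option reads the strand from column 6. That means the 1-based start coordinates from the table need shifting by one. Below is a minimal sketch of that conversion, assuming the same whitespace-separated layout as the question. The helper name table_to_bed6 is hypothetical, and using the row index as the BED name column is just one choice.

```python
# Hypothetical helper: convert 1-based, inclusive rows into BED6 lines
# (chrom, 0-based start, end, name, score, strand).
def table_to_bed6(lines):
    bed = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "seqnames":
            continue  # skip blank lines and the header line
        if len(fields) == 5:
            name = fields[0]  # reuse the row index as the BED name
            fields = fields[1:]
        else:
            name = "."
        seqname, start, end, strand = fields
        # BED start is 0-based; the end coordinate is unchanged.
        bed.append("\t".join([seqname, str(int(start) - 1), end, name, "0", strand]))
    return bed


table = """seqnames start end strand
1 scaffold_0 1 50 -
2 scaffold_0 30 120 +
3 scaffold_0 60 400 -
4 scaffold_0 100 300 +""".splitlines()

print("\n".join(table_to_bed6(table)))
```

Saving that output as, for example, regions.bed (a hypothetical name) would allow bedtools getfasta -fi my_genome.fa -bed regions.bed -s to return the - strand features already reverse-complemented.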