Determine soft-masked regions from a FASTA file
9 weeks ago
Matteo Ungaro

Hi there,

I have the reference genome for a plant species, which is soft-masked, and I need to extract the soft-masked regions as BED coordinates in a separate file, covering all chromosomes.

Looking around, I found a few posts mentioning seqkit, which I have tried, but the results do not seem to be correct. For instance, the first 191 bases of my genome are soft-masked, whereas the tool reports something different...

Is there another, more reliable way to do this, i.e. to extract all the soft-masked regions per chromosome and store them in a BED file? Let me know, thanks in advance!
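
One way to get those intervals directly is to scan each record for runs of lowercase letters. The script below is a minimal sketch, assuming a standard (possibly multi-line) FASTA whose soft-masking is plain lowercase; the script and function names are hypothetical. Because BED uses 0-based, end-exclusive coordinates, the first 191 soft-masked bases should come out as `0	191`, which is worth checking whenever a tool's output looks off by one.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# One or more consecutive lowercase letters = one soft-masked interval.
LOWER_RUN = re.compile(r"[a-z]+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; multi-line records are concatenated."""
    name, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the chromosome name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def softmask_bed(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) in BED convention: 0-based start, exclusive end."""
    for name, seq in read_fasta(lines):
        for m in LOWER_RUN.finditer(seq):
            yield name, m.start(), m.end()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python softmask_bed.py genome.fa > softmask.bed
    with open(sys.argv[1]) as fh:
        for chrom, start, end in softmask_bed(fh):
            print(f"{chrom}\t{start}\t{end}")
```

Each chromosome is held in memory as one string, which is fine for a sketch but may be heavy for very large assemblies. A lowercase `n` is counted as masked, while uppercase gap `N`s are not.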

BED FASTA softmask

Run sed to convert the lowercase letters to N (e.g. `sed '/^[^>]/s/[^ATGC]/N/g'`) and then follow "Extracting N positions from fasta file".
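
The sed step can be mimicked in a few lines of Python to see exactly what it produces. This is a minimal sketch (the function names are hypothetical, and it handles one sequence line at a time): every character outside A/C/G/T becomes N, so an existing assembly gap (`NNNN`) sitting between two masked stretches merges with them into a single N interval.

```python
import re
from typing import Iterator, Tuple

NOT_ACGT = re.compile(r"[^ATGC]")  # same character class as the sed expression
N_RUN = re.compile(r"N+")


def sed_like_mask(line: str) -> str:
    """Mimic sed '/^[^>]/s/[^ATGC]/N/g' on a single line."""
    line = line.rstrip("\r\n")
    # The /^[^>]/ address selects non-empty lines that do not start with '>'.
    if line and not line.startswith(">"):
        line = NOT_ACGT.sub("N", line)
    return line


def n_runs(seq: str) -> Iterator[Tuple[int, int]]:
    """0-based, end-exclusive coordinates of every run of N."""
    for m in N_RUN.finditer(seq):
        yield m.start(), m.end()


masked = sed_like_mask("acgNNNNacgACGT\n")
print(masked, list(n_runs(masked)))  # NNNNNNNNNNACGT [(0, 10)]
```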

Fully agree with this approach; the only thing I would do differently is replace them with X or similar, so as not to confuse them with Ns that may already be there to denote gaps in the assembly.

lieven.sterck, how can I do that? Picard seems to accept only {N, ACGT, BOTH}...

You can do it with two sed commands: `sed '/^[^>]/s/[ATGCN]/A/g'` followed by `sed '/^[^>]/s/[^ATGCN]/N/g'`.
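
As a sanity check on the two sed calls, here is the same transformation in Python (a hypothetical, line-by-line sketch). Uppercase bases and gap Ns both become A in the first pass, so after the second pass the only N runs left are the originally lowercase stretches, and the gap stays separate from the masked intervals.

```python
import re
from typing import Iterator, Tuple

UPPER_OR_GAP = re.compile(r"[ATGCN]")        # character class of sed pass 1
NOT_UPPER_OR_GAP = re.compile(r"[^ATGCN]")   # character class of sed pass 2
N_RUN = re.compile(r"N+")


def two_pass(line: str) -> str:
    """Mimic the two sed calls on one line (header lines are left alone)."""
    line = line.rstrip("\r\n")
    if line and not line.startswith(">"):
        line = UPPER_OR_GAP.sub("A", line)      # uppercase bases and gap Ns -> A
        line = NOT_UPPER_OR_GAP.sub("N", line)  # whatever is left (lowercase) -> N
    return line


def masked_intervals(seq: str) -> Iterator[Tuple[int, int]]:
    """After two_pass, each N run is a soft-masked interval (0-based, end-exclusive)."""
    for m in N_RUN.finditer(seq):
        yield m.start(), m.end()


s = two_pass("acgNNNNacgACGT\n")
print(s, list(masked_intervals(s)))  # NNNAAAANNNAAAA [(0, 3), (7, 10)]
```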

@Pierre Lindenbaum, I thought about two consecutive sed calls as well; however, I was wondering whether there is a way to do it in a single pass. That may well be the best way anyway; thanks again!
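
On the single-pass question: sed itself accepts several expressions in one invocation, e.g. `sed -e '/^[^>]/s/[ATGCN]/A/g' -e '/^[^>]/s/[^ATGCN]/N/g'`, which applies both substitutions to each line as it is read. The same result can also be expressed as one per-character decision; below is a minimal Python sketch (the function name is hypothetical).

```python
def one_pass(line: str) -> str:
    """Uppercase A/C/G/T/N -> 'A', every other character -> 'N'; headers untouched."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(">"):
        return line
    return "".join("A" if c in "ATGCN" else "N" for c in line)


print(one_pass("acgNNNNacgACGT"))  # NNNAAAANNNAAAA
```

This gives the same output as running the two substitutions one after the other, because the first pass maps exactly the characters A/C/G/T/N to A and the second pass then maps everything else to N.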

Never mind, just follow what Pierre Lindenbaum says (after all these years I should know not to contradict Pierre :) )

(I just realised that he's doing the opposite of what I had in mind: he first replaces all the uppercase letters, gap Ns included, so that only the lowercase letters end up as N. Following the linked post you then get your lowercase regions, with no risk of confusing them with the gap Ns.)