Manipulating a fasta file to only have specific characters
2
0
6 months ago
A ★ 3.9k

Hi

I have a fasta file that starts with:

>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN


I want the sequence as a single-line character string that keeps only nucleotide characters.

Basically, anything that is not T, C, G, A or N should be replaced with "N".

I have tried this, but it gives an empty file:

cat input_fasta.fa | sed -r 's/[RYKMSWBVHD]/N/g' > output_fasta.fa


Can you help me?

Thank you so much

fasta sed • 368 views
1

input:

$ cat test.fa
>chr
RYKMSWBVHD
aTGC
ATGkK
ATGC
wWVhDd

output:

$ seqkit replace -sip '[^ATGCN]' -r "N" -w 0 test.fa | seqkit seq -us

NNNNNNNNNNATGCATGNNATGCNNNNNN

0

Linearize your fasta file using @Pierre's code (you can easily find it by searching for "linearize fasta"; it should be the first hit). Then remove the first column to leave just the sequence.
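A minimal sketch of those two steps, assuming a small made-up file (test.fa and the output names are hypothetical) and an awk linearization in the style commonly posted for this task. It may differ from the exact code referred to above, and it assumes the header line contains no tab characters.

```shell
# Hypothetical sample input: one record wrapped over several lines.
printf '>chr1\nACGT\nNNRY\nacgt\n' > test.fa

# Step 1: linearize. Each record becomes "header<TAB>sequence" on one line.
awk '/^>/ { printf("%s%s\t", (n++ ? "\n" : ""), $0); next }
     { printf("%s", $0) }
     END { printf("\n") }' test.fa > test.linear.tsv

# Step 2: drop the first column (the header) to leave just the sequence.
cut -f 2 test.linear.tsv > test.seq.txt

cat test.seq.txt   # prints ACGTNNRYacgt
```

This only joins the lines; ambiguity codes such as R and Y are still present and would need a separate replacement step.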

3
6 months ago
A ★ 3.9k
wget http://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/chromosomes/chr2.fa.gz

zgrep -v ">chr" chr2.fa.gz | tr -d '\n' | sed -e '$a\'  > chr2.fa
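The command above joins the sequence lines but does not replace any ambiguity codes. One possible extension, sketched with plain tr on a made-up sample file (sample.fa and sample.oneline.txt are hypothetical names). Upper-casing first is an assumption on my part: without it, soft-masked lowercase bases would also be turned into N.

```shell
# Hypothetical sample input with ambiguity codes and lowercase bases.
printf '>chr\nRYKMSWBVHD\naTGC\nATGkK\nATGC\nwWVhDd\n' > sample.fa

{
  grep -v '^>' sample.fa |   # drop header lines
    tr -d '\r\n' |           # join all sequence lines into one
    tr 'a-z' 'A-Z' |         # assumption: treat lowercase bases as uppercase
    tr -c 'ACGTN' 'N'        # replace anything that is not A, C, G, T or N
  echo                       # add the final newline
} > sample.oneline.txt

cat sample.oneline.txt   # prints NNNNNNNNNNATGCATGNNATGCNNNNNN
```

For a gzipped file, the first command could presumably be swapped for zgrep as in the command above. If the file holds several records, all of them end up concatenated into one line.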

0
6 months ago
Qiongyi ▴ 110

Do you want to remove the header line from your input file? If so, your output file will no longer be in fasta format. If your sequences are already on one line, the command below should do the trick.

grep -v ">" input_fasta.fa | sed 's/[RYKMSWBVHD]/N/g'  > output_fasta.fa

0

No, unfortunately the sequence, as I have shown, is not on one line. I want to convert it to a one-line sequence that contains only A, T, C, G and N.

0

Do you know how to run a Perl script? If so, you can use my script to convert your fasta file to a one-line format, then use the grep and sed command above for the rest. You can download the script at https://github.com/Qiongyi/custom_PERL_scripts/

perl linker.pl input_fasta.fa input_oneline.fa
grep -v ">" input_oneline.fa | sed 's/[RYKMSWBVHD]/N/g'  > output_fasta.fa

0

Thank you

0

Go to https://github.com/Qiongyi/custom_PERL_scripts/ and then find linker.pl