Question: Finding non-ACTGN nucleotides in a FASTA file?
dpearton (South Africa) wrote 4 months ago:

Hello,

I have downloaded a genome assembly from GenBank (RefSeq), and it apparently contains some nucleotides other than A, C, T, G, or N (according to the error file from the radinitio program).

I would like to try and find out what these are prior to fixing the file. I've tried various combinations of grep...

grep -i -v [ACTGN]+ sequence.fas

etc., but they either find everything in the file, or nothing.

I would like to do a "simple" grep that finds lines that contain any characters IN ADDITION to [ACTGN] (either case). I can get rid of FASTA headers by piping through grep -v '>'.

Thanks!

Tags: iupac, sequence, grep, fasta, genome
shenwei356 (China) wrote 4 months ago:
grep -i -E '[^ACTGN]+'
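
To see which characters are actually present, the pattern above can be combined with the header filter mentioned in the question. The following is a minimal sketch rather than a tested recipe for any particular assembly: the temporary file and its contents are made up for illustration, and A, C, G, T, N (either case) is assumed to be the full set of accepted characters.

```shell
# Build a tiny example FASTA file (hypothetical data, for illustration only).
fasta=$(mktemp)
printf '>seq1 example\nACGTNacgtn\nACGTRYACGT\n>seq2\nacgtkacgt\nNNNNACGT\n' > "$fasta"

# Drop header lines, then keep every sequence line that contains at least
# one character other than A, C, G, T, or N (case-insensitive).
offending_lines=$(grep -v '^>' "$fasta" | grep -i -E '[^ACGTN]')
printf '%s\n' "$offending_lines"

# Tally the unexpected characters: -o prints each match on its own line,
# tr folds lower case to upper case, and uniq -c counts the occurrences.
tally=$(grep -v '^>' "$fasta" | grep -o -i '[^ACGTN]' | tr 'a-z' 'A-Z' | sort | uniq -c)
printf '%s\n' "$tally"

rm -f "$fasta"
```

On a real file, "$fasta" would be replaced by the path to the assembly, for example sequence.fas. The characters in the tally may turn out to be IUPAC ambiguity codes such as R, Y, or K. A carriage return left over from Windows line endings would also be counted, because it is not in the accepted set.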

dpearton replied 4 months ago:

Thank you very much for this. I had tried the caret negation, but I did not use -E, so it didn't work.
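
The reason -E matters here is the trailing "+". In grep's default basic regular expressions, "+" is an ordinary character, so '[^ACTGN]+' asks for a non-ACTGN character followed by a literal plus sign. With -E, "+" means "one or more". The quantifier is not strictly needed, since a single offending character is enough to select the line. Note also that -v, as used in the question, inverts the match for the whole line, which is different from negating the character class with a caret. A small sketch of the difference, using a made-up sequence line:

```shell
# A made-up sequence line containing two ambiguity codes (R and Y).
line='ACGTRYACGT'

# Basic regex (the default): '+' is literal, so no match is expected here.
if printf '%s\n' "$line" | grep -q -i '[^ACTGN]+'; then bre_plus=match; else bre_plus=no-match; fi

# Extended regex: '+' is a quantifier, so the line matches.
if printf '%s\n' "$line" | grep -q -i -E '[^ACTGN]+'; then ere_plus=match; else ere_plus=no-match; fi

# No quantifier at all: matches without needing -E.
if printf '%s\n' "$line" | grep -q -i '[^ACTGN]'; then bre_plain=match; else bre_plain=no-match; fi

echo "basic regex with +:    $bre_plus"
echo "extended regex with +: $ere_plus"
echo "basic regex without +: $bre_plain"
```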