Moderator: shenwei356

shenwei356 4.7k
Reputation:
4,710
Status:
Trusted
Location:
China
Website:
http://shenwei.me/
Twitter:
shenwei356
Scholar ID:
Google Scholar Page
Last seen:
8 hours ago
Joined:
7 years, 3 months ago
Email:
s*********@gmail.com

Posts by shenwei356

<prev • 543 results • page 1 of 55 • next >
3 votes • 2 answers • 140 views
Answer: A: How to match fasta header list of name?
...
```
# IDs in seqs.fa
$ grep '^>' seqs.fa | awk '{print $1}' | sed 's/^>//'
M.Bce12308ORF4755P
M.Bce1254ORF9725P

# IDs not in list.txt
$ grep -w -v -f <(grep '^>' seqs.fa | awk '{print $1}' | sed 's/^>//') list.txt
M.Bce122ORF1082P
```
...
written 19 days ago by shenwei356 4.7k
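The same comparison can be sketched in Python. This is a minimal illustration, not the method from the answer: the helper names `fasta_ids` and `ids_missing_from_fasta` and the toy records are assumptions, and it uses exact ID equality where `grep -w -f` does word-level pattern matching.

```python
import io


def fasta_ids(handle):
    """Yield the ID of each record: the first whitespace-delimited token after '>'."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def ids_missing_from_fasta(list_handle, fasta_handle):
    """Return names from the list (in order) that are not IDs in the FASTA file."""
    present = set(fasta_ids(fasta_handle))
    names = (line.strip() for line in list_handle)
    return [name for name in names if name and name not in present]


# Hypothetical toy data standing in for seqs.fa and list.txt.
seqs = io.StringIO(">M.Bce12308ORF4755P some description\nACGT\n>M.Bce1254ORF9725P\nGGCC\n")
names = io.StringIO("M.Bce12308ORF4755P\nM.Bce1254ORF9725P\nM.Bce122ORF1082P\n")

missing = ids_missing_from_fasta(names, seqs)
print(missing)
```

With real files, replace the `io.StringIO` objects with `open("seqs.fa")` and `open("list.txt")`.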
6 votes • 1 answer • 184 views
Answer: C: Kill nohup bash process
... Hi, I'd recommend using [screen](https://linuxize.com/post/how-to-use-linux-screen/) instead of `nohup`; screen is safer. You should also use a batch tool such as `parallel` to speed up thousands of jobs (here, `wget` downloads). Using `wget -c` to resume unfinished downloads is also recommended. ...
written 5 weeks ago by shenwei356 4.7k
1 vote • 1 answer • 196 views
Comment: C: Determining total nucleotides for paired end metagenomic sequences
... Use [seqkit](https://github.com/shenwei356/seqkit) to save time.

    seqkit stats xxxx_R[12].*.fastq.gz

Results look something like this:

```
$ seqkit stats reads_*.fq.gz
file           format  type  num_seqs  sum_len  min_len  avg_len  max_len
reads_1.fq.gz  FASTQ   DNA      2,500  567,516  ...
```
written 3 months ago by shenwei356 4.7k
0 votes • 0 answers • 224 views
Comment: C: Rename fasta-header based on a list
... Just for the given sample data.

```
$ sed 's/^>//' headers.txt | perl -pne 's/(\w+_\w+)_/$1\t/' > headers.tsv
$ cat headers.tsv
NZ_CP023010      Elizabethkingia anophelis FDAARGOS_198
NZ_MRWY01000004  Klebsiella michiganensis_CAV1755
$ seqkit replace -p '^(.+?)\..+_' -k headers.tsv -r '{kv}_' ...
```
written 3 months ago by shenwei356 4.7k
1 vote • 1 answer • 333 views
Comment: C: Increase memory limit in SPAdes-3.6.1
...
```
$ spades.py --help
SPAdes genome assembler v3.13.0
...
Advanced options:
--dataset       file with dataset description in YAML format
-t/--threads    number of threads [default: 16]
-m/--memory     RAM limit for SPAdes in Gb (terminat ...
```
written 4 months ago by shenwei356 4.7k
0 votes • 2 answers • 264 views
Comment: C: How to best get ALL Bacterial proteins from NCBI
... Yes, I know. I guess the bacterial proteins in RefSeq are enough for his/her purpose, at least until we know what he/she will use the data for. Anyway, one can try:

```
# download
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt

# reformat
cat assembly_summary.txt | sed 1d | sed '1s/^# ...
```
written 4 months ago by shenwei356 4.7k
0 votes • 2 answers • 264 views
Comment: C: How to best get ALL Bacterial proteins from NCBI
... You can also download `.faa.gz` files for every bacterium in [RefSeq](ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria), check [another tutorial](http://blog.shenwei.me/manipulation-on-ncbi-refseq-bacterial-assembly-summary/) ...
written 4 months ago by shenwei356 4.7k
0 votes • 2 answers • 421 views
Comment: C: Linearize fasta files
... ( ̄ へ ̄ )Yes, the [patch](https://github.com/lh3/seqtk/pull/123) is worth it. Thank you Fabian for this great patch! ...
written 5 months ago by shenwei356 4.7k
2 votes • 2 answers • 421 views
Answer: C: Linearize fasta files
... `seqtk seq input.fa` (1.3-r106 [68752fd](https://github.com/lh3/seqtk/commit/68752fd8497aff82b273b1f7f541b9905760586f)) is faster than `seqkit seq -w 0 input.fa` (v0.10.0) in my test on both SSD and HDD.

Version:

- seqkit [v0.10.0](https://github.com/shenwei356/seqkit/releases/tag/v0.10.0)
- seqtk ...

written 5 months ago by shenwei356 4.7k
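For readers unsure what "linearizing" does, here is a minimal Python sketch of the operation both tools perform: each record's wrapped sequence lines are joined into one line. The function name `linearize_fasta` and the toy input are assumptions for illustration; this is not how seqtk or seqkit implement it and makes no claim about speed.

```python
import io


def linearize_fasta(handle):
    """Yield (header, sequence) pairs with each sequence joined onto a single line."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical wrapped input standing in for input.fa.
wrapped = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nGG\nCC\nTT\n")

records = list(linearize_fasta(wrapped))
for header, seq in records:
    print(header)
    print(seq)
```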
1 vote • 2 answers • 394 views
Comment: C: How to extract the last 1000 nt from a group of sequences in a FASTA file?
... [seqkit subseq](https://bioinf.shenwei.me/seqkit/usage/#subseq) supports this, if you want a fast solution.

    seqkit subseq -r -1000:-1 seqs.fa > result.fa

If you want to learn programming, write some Python scripts using Biopython. ...
written 5 months ago by shenwei356 4.7k
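As a starting point for the scripting route, here is a small sketch that keeps the last N nucleotides of every record. It uses only the standard library so that it runs anywhere; Biopython's `SeqIO` could replace the hand-written parser. The names `read_fasta` and `last_n` are hypothetical, and sequences shorter than N are returned whole, which is a choice made here rather than a statement about seqkit's behaviour.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def last_n(handle, n=1000):
    """Yield (header, tail) where tail is the last n characters of each sequence."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for header, seq in read_fasta(handle):
        # Slicing with a negative start returns the whole string when it is shorter than n.
        yield header, seq[-n:]


# Hypothetical toy input; n=4 keeps the demonstration readable.
demo = io.StringIO(">a\nAAAACCCC\nGGGGTTTT\n>b\nACG\n")

tails = list(last_n(demo, n=4))
for header, tail in tails:
    print(header)
    print(tail)
```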

Latest awards to shenwei356

Teacher 19 days ago, created an answer with at least 3 up-votes. For A: How to get RNAfold (structure) output in text format
Scholar 19 days ago, created an answer that has been accepted. For A: looking for 16S RNA sequence consensus
Teacher 5 weeks ago, created an answer with at least 3 up-votes. For A: How to get RNAfold (structure) output in text format
Good Answer 5 weeks ago, created an answer that was upvoted at least 5 times. For A: Bioinformatics software distribution
Appreciated 5 weeks ago, created a post with more than 5 votes. For C: Inserting delim between numbers and strings in bash
Commentator 5 weeks ago, created a comment with at least 3 up-votes. For C: single line fasta to mult line fasta
Teacher 8 weeks ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Teacher 12 weeks ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Appreciated 5 months ago, created a post with more than 5 votes. For C: Inserting delim between numbers and strings in bash
Scholar 5 months ago, created an answer that has been accepted. For A: looking for 16S RNA sequence consensus
Teacher 7 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Teacher 7 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Teacher 8 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Good Answer 8 months ago, created an answer that was upvoted at least 5 times. For A: Bioinformatics software distribution
Appreciated 8 months ago, created a post with more than 5 votes. For C: Inserting delim between numbers and strings in bash
Scholar 9 months ago, created an answer that has been accepted. For A: looking for 16S RNA sequence consensus
Teacher 9 months ago, created an answer with at least 3 up-votes. For A: How to get RNAfold (structure) output in text format
Teacher 9 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Teacher 10 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Teacher 11 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Commentator 12 months ago, created a comment with at least 3 up-votes. For C: single line fasta to mult line fasta
Appreciated 12 months ago, created a post with more than 5 votes. For C: Inserting delim between numbers and strings in bash
Teacher 12 months ago, created an answer with at least 3 up-votes. For A: single line fasta to mult line fasta
Scholar 12 months ago, created an answer that has been accepted. For A: looking for 16S RNA sequence consensus
Appreciated 12 months ago, created a post with more than 5 votes. For C: Inserting delim between numbers and strings in bash

Powered by Biostar version 2.3.0