User: smiller

Reputation:
70
Status:
Trusted
Location:
United States
Last seen:
1 year, 9 months ago
Joined:
4 years, 1 month ago
Email:
s***********@uchicago.edu

Posts by smiller

0
votes
1
answer
795
views
1
answers
Answer: A: Substring dereplication of protein sequences
... Use CD-HIT for this task. ./cd-hit -i -o -c 1 -t 1 -d 0 ...
written 21 months ago by smiller
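The CD-HIT invocation quoted in this answer can be assembled from Python as in the rough sketch below. The flags and their values are taken from the snippet above; the file names are hypothetical placeholders, since the original input and output arguments are not shown, and the helper name cd_hit_command is made up for illustration.

```python
import shlex


def cd_hit_command(fasta_in, fasta_out, identity=1.0):
    """Build a cd-hit argument list using the flags quoted in the post."""
    return [
        "./cd-hit",
        "-i", fasta_in,          # input FASTA (placeholder file name)
        "-o", fasta_out,         # output FASTA (placeholder file name)
        "-c", f"{identity:g}",   # sequence identity threshold; 1 means 100%
        "-t", "1",               # value as given in the post
        "-d", "0",               # value as given in the post
    ]


cmd = cd_hit_command("proteins.fasta", "proteins_derep.fasta")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) once cd-hit is available at that path.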
0
votes
1
answer
795
views
1
answers
Comment: C: Substring dereplication of protein sequences
... User genomax's comment led me to the exact solution I wanted. Rather than cd-hit-dup, use the main CD-HIT program, cd-hit. It clusters sequences, including subsequences, and lets one specify a sequence identity of 100%. The following command writes two files: a dereplicated fasta file and ...
written 21 months ago by smiller
0
votes
1
answer
795
views
1
answers
Comment: C: Substring dereplication of protein sequences
... cd-hit-dup fails with the message "cd-hit-dup: cdhit-dup.cxx:193: int HashingDepth(int, int): Assertion `len >= min' failed." This may be because some of my sequences are as short as 9 residues. Fundamentally, this command is geared toward longer nucleotide sequences. It also does not do a ...
written 21 months ago by smiller
1
vote
1
answer
795
views
1
answer
Substring dereplication of protein sequences
... I would like to dereplicate a 3 GB fasta file of amino acid sequences. I would like this to include the removal of shorter sequences found in longer sequences (substring dereplication). The purpose of this is the construction of a smaller database against which I search peptide mass spectra and the ...
proteomics dereplication written 21 months ago by smiller
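To make "substring dereplication" concrete, here is a minimal sketch of the idea on a tiny in-memory list. It is a naive illustration only, not the method CD-HIT uses, and its pairwise comparisons would not scale to a 3 GB file; the function name and the toy sequences are invented for the example.

```python
def dereplicate_substrings(seqs):
    """Drop exact duplicates and any sequence contained in a longer kept one."""
    kept = []
    # Visit longest sequences first, so a containing sequence is always
    # already in `kept` by the time one of its substrings is examined.
    for s in sorted(set(seqs), key=lambda s: (-len(s), s)):
        if not any(s in longer for longer in kept):
            kept.append(s)
    return kept


toy = ["MKTAYIAK", "TAYI", "GGSLQ", "MKTAYIAK"]
print(dereplicate_substrings(toy))
```

Here "TAYI" is removed because it occurs inside "MKTAYIAK", and the repeated copy of "MKTAYIAK" is collapsed to one.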
2
votes
1
answer
981
views
1
answers
Answer: C: Entrez esearch term format
... SIMPLE AND OBVIOUS SOLUTION: Enclose the search term in double quotes to make it literal. Entrez.read(Entrez.esearch(db='Taxonomy', term="\"Cupriavidus sp. HPC(L)\"")) correctly returns {'RetStart': '0', 'TranslationStack': [{'Count': '1', 'Field': 'All Names', 'Term': '"Cupriavidus sp. HPC( ...
written 2.2 years ago by smiller
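The quoting trick from this answer can be wrapped in a small helper, sketched below. The helper name quote_term is hypothetical, and it assumes the term itself contains no double quotes. The Biopython call is left as a comment so the snippet runs without network access.

```python
def quote_term(term):
    """Wrap a search term in double quotes so it is sent as a literal phrase."""
    return '"' + term + '"'


term = quote_term("Cupriavidus sp. HPC(L)")
print(term)

# With Biopython installed and Entrez.email set, the search from the post
# would then be:
#   from Bio import Entrez
#   record = Entrez.read(Entrez.esearch(db="Taxonomy", term=term))
```

Building the term separately also avoids the backslash-escaped quotes in the one-liner.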
3
votes
1
answer
981
views
1
answer
Entrez esearch term format
... I'm using biopython to perform an Entrez esearch of the NCBI taxonomy database. The term is the exact string as it appears in the database. Entrez.read(Entrez.esearch(db='Taxonomy', term="Cupriavidus sp. HPC(L)")) returns {'QueryTranslation': 'Cupriavidus sp. HPC[All Names] AND (L[All Names] ...
ncbi taxonomy esearch entrez biopython written 2.2 years ago by smiller
2
votes
1
answer
2.0k
views
1
answers
Answer: A: blastp short in BLAST+ vs. BLAST web interface
... Solution identified by comparing exported search strategies from the web and local versions. Local blastp-short has composition-based score adjustment on by default, despite what the user manual says. Turn it off by specifying "-comp_based_stats 0" in the BLAST command. ...
written 4.1 years ago by smiller
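As a rough sketch, a local blastp-short command with the adjustment switched off could be assembled like this. The query, database, and output names are placeholders, and blastp_short_command is a hypothetical helper; only the "-comp_based_stats 0" setting comes from the answer itself.

```python
def blastp_short_command(query, db, out):
    """Build a blastp-short argument list with composition-based stats off."""
    return [
        "blastp",
        "-task", "blastp-short",     # short-query variant of blastp
        "-query", query,             # placeholder query FASTA
        "-db", db,                   # placeholder BLAST database name
        "-out", out,                 # placeholder output file
        "-comp_based_stats", "0",    # the setting recommended in the answer
    ]


cmd = blastp_short_command("peptides.fasta", "nr", "hits.txt")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the search on a machine with BLAST+ installed.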
3
votes
1
answer
2.0k
views
1
answer
blastp short in BLAST+ vs. BLAST web interface
... This question concerns differences between the output of blastp-short using BLAST+ vs. the web interface. When searching against RefSeq or nr (updated yesterday), BLAST+ misses some of the hits returned by the web interface, and sometimes assigns different bit scores to the same hits.  Below is an ...
blast written 4.1 years ago by smiller

Latest awards to smiller

Popular Question 2.2 years ago, created a question with more than 1,000 views. For blastp short in BLAST+ vs. BLAST web interface
Scholar 2.2 years ago, created an answer that has been accepted. For C: Entrez esearch term format
