Moderator: Matt Shirley

gravatar for Matt Shirley
Matt Shirley (7.2k)
Reputation: 7,200
Status: Trusted
Location: Cambridge, MA
Website: http://mattshirley.com/
Twitter: mdshw5
Scholar ID: Google Scholar Page
Last seen: 7 hours ago
Joined: 6 years ago
Email: m*****@gmail.com

Posts by Matt Shirley

642 results (page 1 of 65)
0 votes, 10 answers, 28k views
Comment: C: How To Extract A Sequence From A Big (6Gb) Multifasta File ?
... `pip install pyfaidx`, then `faidx --regex "^((?!>1;).)*$" input.fa > output.fa` or `faidx --invert-match --regex "^>1;.*$" input.fa > output.fa`. The first example uses a negative lookahead, which may be more difficult to reason about, while the second example depends on the `- ...
written 8 days ago by Matt Shirley (7.2k)
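The two `faidx` commands above are not strictly interchangeable. The sketch below uses Python's standard `re` module (not pyfaidx itself) to apply each pattern to a few made-up header strings. Matching against the full header line including the leading `>` is an assumption made so the patterns can be used unchanged; pyfaidx may apply `--regex` to a different form of the sequence name.

```python
import re

# Hypothetical header lines, kept with the leading ">" so the two
# patterns from the comment above can be applied to them unchanged.
HEADERS = [">1;chrA", ">2;chrB", ">3;chrC", ">x>1;y"]

# Pattern 1: the negative lookahead (?!>1;) is tested before every
# character consumed, so the line matches only if ">1;" occurs nowhere.
KEEP_IF_ABSENT = re.compile(r"^((?!>1;).)*$")

# Pattern 2: matches headers that *start* with ">1;"; the caller inverts it.
DROP_IF_PREFIX = re.compile(r"^>1;.*$")


def keep_lookahead(headers):
    """Keep headers matched by the negative-lookahead pattern."""
    return [h for h in headers if KEEP_IF_ABSENT.search(h)]


def keep_inverted(headers):
    """Keep headers NOT matched by the prefix pattern (like --invert-match)."""
    return [h for h in headers if not DROP_IF_PREFIX.search(h)]


if __name__ == "__main__":
    print(keep_lookahead(HEADERS))  # ['>2;chrB', '>3;chrC']
    print(keep_inverted(HEADERS))   # ['>2;chrB', '>3;chrC', '>x>1;y']
```

The outputs differ only for `>x>1;y`: the lookahead version drops any header containing `>1;` anywhere, while the inverted version drops only headers that begin with it.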
2 votes, 2 answers, 171 views
Comment: C: Modifying Fasta file header
... Apparently Biopython uses the strict definition (if FASTA has any) of the ID as everything before the first space (see https://www.biostars.org/p/18987/). To get the whole header, you want `SeqRecord.description`, not `SeqRecord.id`. ...
written 5 weeks ago by Matt Shirley (7.2k)
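To make the `id` versus `description` distinction concrete without requiring Biopython, here is a minimal pure-Python sketch of the convention described above. The function name `split_title` is hypothetical; it only mimics the behavior and is not Biopython's parser.

```python
def split_title(title_line):
    """Split a FASTA title line into (id, description).

    Hypothetical helper mimicking the convention described above:
    the ID is everything before the first whitespace, while the
    description keeps the whole header minus the leading '>'.
    """
    title = title_line.rstrip("\r\n")
    if title.startswith(">"):
        title = title[1:]
    parts = title.split(None, 1)  # split once, on any run of whitespace
    record_id = parts[0] if parts else ""
    return record_id, title
```

With Biopython itself, the equivalent is reading `record.description` for each record returned by `SeqIO.parse(path, "fasta")`.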
1 vote, 2 answers, 171 views
Comment: C: Modifying Fasta file header
... Most methods that access FASTA entries using the offsets stored in a *.fai file will truncate the header name at the first whitespace. However, Bio.SeqIO does not use this scheme. Both samtools and pyfaidx do, but there's a method in pyfaidx: `FastaRecord.longname` will recover the entire header nam ...
written 6 weeks ago by Matt Shirley (7.2k)
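The comment above relies on how `.fai`-based access works. The following is a minimal sketch, written for illustration and not taken from samtools or pyfaidx, of building `.fai`-style rows (name, length, offset, line bases, line width) and fetching a region using only those offsets. It assumes newline line endings and uniform line widths within a record, and works on an in-memory string for brevity, where a real reader would `seek()` to the computed offsets in the file. Note how the name `chr1` loses everything after the first space.

```python
import io

# Hypothetical two-record FASTA; chr1's header has a description after the space.
EXAMPLE = ">chr1 first contig\nACGTA\nCGTAC\nGT\n>chr2\nTTTT\n"


def build_fai(fasta_text):
    """Return .fai-style rows (name, length, offset, linebases, linewidth).

    The name is the header truncated at the first whitespace, the same
    convention samtools uses. Assumes newline line endings and that every
    sequence line in a record (except the last) has the same width.
    """
    rows = []
    pos = 0  # byte offset of the start of the current line
    name = None
    length = linebases = linewidth = seq_offset = 0
    for line in io.StringIO(fasta_text):
        nbytes = len(line.encode())
        if line.startswith(">"):
            if name is not None:
                rows.append((name, length, seq_offset, linebases, linewidth))
            name = (line[1:].split(None, 1) or [""])[0]  # text before first whitespace
            length = linebases = linewidth = 0
            seq_offset = pos + nbytes  # sequence starts right after the header line
        elif name is not None:
            bases = len(line.rstrip("\n"))
            if linebases == 0:
                linebases, linewidth = bases, nbytes  # width includes the newline
            length += bases
        pos += nbytes
    if name is not None:
        rows.append((name, length, seq_offset, linebases, linewidth))
    return rows


def fetch(fasta_text, row, start, end):
    """Return bases [start, end) (0-based) using only the row's offsets."""
    _name, length, offset, linebases, linewidth = row
    data = fasta_text.encode()
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Whole lines skipped times bytes per line, plus the column in the line.
        return offset + (i // linebases) * linewidth + i % linebases

    return data[byte_pos(start):byte_pos(end)].decode().replace("\n", "")
```

Because the index stores only the truncated name, recovering the full header requires going back to the header line itself, which is what a separate "long name" lookup has to do.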
0 votes, 2 answers, 171 views
Comment: C: Modifying Fasta file header
... It might be helpful to know why you want to modify your headers in this fashion and what some of your other headers look like. ...
written 6 weeks ago by Matt Shirley (7.2k)
1 vote, 2 answers, 138 views
Comment: C: Adding Fasta unique identifiers
... awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa > output.fa ...
written 6 weeks ago by Matt Shirley (7.2k)
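For readers who prefer Python, here is a sketch of what the awk one-liner above does: each header `>name` becomes `>N name`, where `N` is a running counter, and sequence lines pass through unchanged. It is an illustrative equivalent, not part of the original comment.

```python
def number_headers(lines):
    """Yield FASTA lines with a running number inserted after '>'.

    Mirrors awk's printf(">%d %s\\n", ++N, substr($0,2)): the '>' is
    dropped from the original header and re-added before the counter.
    """
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            yield ">%d %s" % (n, line[1:])  # line[1:] keeps its trailing newline
        else:
            yield line


# Example (file names taken from the awk command):
# with open("input.fa") as src, open("output.fa", "w") as dst:
#     dst.writelines(number_headers(src))
```

One small difference: awk always terminates the rewritten header with a newline, while this sketch keeps whatever line ending the input had.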
0 votes, 3 answers, 151 views
Answer: A: Delete fasta sequence with a pattern "unassigned peptidases"
... `$ pip install pyfaidx`, then `$ faidx sequences.fa --regex '.*unassigned peptidases.*' --invert-match > no_peptidases.fa`. You can find more usage examples for `faidx` here: https://github.com/mdshw5/pyfaidx#faidx ...
written 6 weeks ago by Matt Shirley (7.2k)
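The `--regex ... --invert-match` combination above keeps or drops whole records based on their headers. Here is a minimal streaming sketch of that idea in plain Python (not pyfaidx's implementation): the keep/drop decision is made at each header and applied to the sequence lines that follow. Matching against the full header text after `>`, description included, is an assumption; depending on how a tool builds sequence names, a pattern that sits after the first space may behave differently there.

```python
import re


def filter_records(lines, pattern, invert=False):
    """Stream FASTA lines, keeping whole records whose header matches
    `pattern` (or, with invert=True, whose header does NOT match).

    Sequence lines inherit the keep/drop decision of the header above
    them; any lines before the first header are dropped.
    """
    regex = re.compile(pattern)
    keep = False
    for line in lines:
        if line.startswith(">"):
            matched = regex.search(line[1:]) is not None
            keep = matched != invert  # XOR: invert flips the decision
        if keep:
            yield line


# Example (file names taken from the command above):
# with open("sequences.fa") as src, open("no_peptidases.fa", "w") as dst:
#     dst.writelines(filter_records(src, r".*unassigned peptidases.*", invert=True))
```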
2 votes, 3 answers, 237 views
Answer: A: Parsing FASTA file using class in Python
... If you want a FASTA file to act like a sequence dictionary, just use [pyfaidx](https://github.com/mdshw5/pyfaidx):

    import pyfaidx
    fa = pyfaidx.Fasta("sample.fa")
    for key in fa.keys():
        print(key)      # sequence name
        print(fa[key])  # sequence object

You'll be using an efficient method t ...
written 7 weeks ago by Matt Shirley (7.2k)
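Since the question asks about parsing FASTA with a class, here is a minimal sketch of a read-only, dictionary-like class built on the standard `collections.abc.Mapping`. The class name `FastaDict` is hypothetical. Unlike pyfaidx, which reads sequences on demand through its index, this version loads every sequence into memory, so it only suits small files.

```python
from collections.abc import Mapping


class FastaDict(Mapping):
    """Hypothetical in-memory, read-only mapping of name -> sequence.

    Names are truncated at the first whitespace, matching the .fai
    convention. A repeated name silently overwrites the earlier record.
    """

    def __init__(self, lines):
        self._records = {}
        name, chunks = None, []
        for line in lines:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if name is not None:
                    self._records[name] = "".join(chunks)
                name = (line[1:].split(None, 1) or [""])[0]
                chunks = []
            elif name is not None:
                chunks.append(line.strip())  # join wrapped sequence lines
        if name is not None:
            self._records[name] = "".join(chunks)

    # Mapping derives keys(), items(), get() and `in` from these three.
    def __getitem__(self, name):
        return self._records[name]

    def __iter__(self):
        return iter(self._records)

    def __len__(self):
        return len(self._records)


# Example: with open("sample.fa") as fh: fa = FastaDict(fh)
```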
5 votes, 1 answer, 339 views
Comment: C: Can Biostars use question template like Github issue/PR?
... I really like this idea, though care needs to be taken not to punish users who can't clearly describe the problem, and to recognize that "I haven't tried anything" is sometimes appropriate when we're learning new subject matter. ...
written 8 weeks ago by Matt Shirley (7.2k)
2 votes, 8 answers, 490 views
Answer: A: fasta seq header
...

    $ pip install pyfaidx
    $ faidx -e "lambda x: x.split('|')[0]" genes.fa
    >gene_1
    ATGCGTCGACGTCGTACGGGTTTT
    CGTACGGGTTATGCGTCGACGTC
    GTACGGGTTTT
    ...

...
written 8 weeks ago by Matt Shirley (7.2k)
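The `-e` option above takes a Python expression that maps each header to a new name. The sketch below shows the same idea in plain Python: a key function is applied to every header line and sequence lines pass through untouched. The helper name `rename_headers` is hypothetical, and applying the function to the full header text (rather than to a name already truncated at whitespace) is an assumption; for a header like `gene_1|chr1|+` with no spaces, the two agree.

```python
def rename_headers(lines, key_function):
    """Rewrite each header as '>' + key_function(header_text).

    Hypothetical helper illustrating a header-rewriting function such as
    lambda x: x.split('|')[0]; sequence lines are yielded unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + key_function(line[1:].rstrip("\r\n")) + "\n"
        else:
            yield line


# Example: keep only the text before the first '|'
# with open("genes.fa") as src:
#     for out in rename_headers(src, lambda x: x.split("|")[0]):
#         print(out, end="")
```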
0 votes, 0 answers, 240 views
Comment: C: GC Content of Fasta file --- Python Help
... No, apparently I'm blind :) ...
written 8 weeks ago by Matt Shirley (7.2k)

Latest awards to Matt Shirley

Scholar 2 days ago, created an answer that has been accepted. For A: How to use pygr? worldbase doesn't return anything
Popular Question 7 days ago, created a question with more than 1,000 views. For Comments Left Inappropriately As Answers To A Question
Teacher 15 days ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Teacher 6 weeks ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Popular Question 6 weeks ago, created a question with more than 1,000 views. For Comments Left Inappropriately As Answers To A Question
Teacher 7 weeks ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Appreciated 8 weeks ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Commentator 8 weeks ago, created a comment with at least 3 up-votes. For C: What Does 2X250Bp Buy Us?
Popular Question 3 months ago, created a question with more than 1,000 views. For Troubling Trends In Scientific Software Use
Good Answer 3 months ago, created an answer that was upvoted at least 5 times. For A: How Can I Do Principal Components Analysis ?
Scholar 3 months ago, created an answer that has been accepted. For A: How to use pygr? worldbase doesn't return anything
Appreciated 4 months ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Teacher 4 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Teacher 5 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Commentator 5 months ago, created a comment with at least 3 up-votes. For C: What Does 2X250Bp Buy Us?
Appreciated 6 months ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Teacher 6 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Good Answer 7 months ago, created an answer that was upvoted at least 5 times. For A: Generate Vcf.Gz File And Its Index File Vcf.Gz.Tbi
Teacher 7 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Teacher 8 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Scholar 9 months ago, created an answer that has been accepted. For A: How to use pygr? worldbase doesn't return anything
Teacher 10 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Appreciated 11 months ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Teacher 11 months ago, created an answer with at least 3 up-votes. For A: How To Select Only One Human Genome Build (Hg19) From The Encode Project'S Data
Good Answer 11 months ago, created an answer that was upvoted at least 5 times. For A: Not having root access sucks; installing software without root privileges

Powered by Biostar version 2.3.0