Moderator: Matt Shirley

Reputation: 7,500
Status: Trusted
Location: Cambridge, MA
Website: http://mattshirley.com/
Twitter: mdshw5
Scholar ID: Google Scholar Page
Last seen: 13 hours ago
Joined: 6 years, 3 months ago
Email: m*****@gmail.com

Posts by Matt Shirley

656 results • page 1 of 66
3 votes • 3 answers • 167 views
Comment: C: Extract nucleotides from fasta file by a single position
... It's good to do the file parsing yourself, especially as a learning exercise. I think for production or publication it's nice to use parsers that are tested and hopefully eliminate a broad class of simple bugs. ...
written 4 days ago by Matt Shirley (7.5k)
2 votes • 3 answers • 167 views
Comment: C: Extract nucleotides from fasta file by a single position
... Just a suggestion for simplifying this process: you don't need to linearize the fasta file, and you don't need to read the whole thing into memory - *especially* if you're just extracting a few single nucleotides. https://gist.github.com/mdshw5/d1e4af5ef83a04630175f0dcc0638b6a ...
written 4 days ago by Matt Shirley (7.5k)
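
The idea in the comment above (no linearizing, no loading the whole file) can be sketched in plain Python. This is a minimal illustration, not necessarily what the linked gist does; the function name `bases_at` and the `targets` mapping are hypothetical names introduced here, and positions are assumed to be 1-based.

```python
import io

def bases_at(handle, targets):
    """Stream a FASTA handle and return {(name, pos): base}.

    `targets` maps a sequence name to an iterable of 1-based positions.
    Only one line of the file is held in memory at a time.
    """
    found = {}
    name, offset, wanted = None, 0, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Header line: the name is the first whitespace-delimited token.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            offset = 0
            wanted = sorted(targets.get(name, ()))
            continue
        end = offset + len(line)
        for pos in wanted:
            # `offset` counts the bases seen before this line.
            if offset < pos <= end:
                found[(name, pos)] = line[pos - offset - 1]
        offset = end
    return found

# Wrapped (multi-line) records work without linearizing first.
fasta = io.StringIO(">chr1 test\nACGT\nTTGA\n>chr2\nGGCC\n")
print(bases_at(fasta, {"chr1": [1, 6], "chr2": [4]}))
```

For large genomes with many lookups, an indexed reader such as `samtools faidx` or pyfaidx avoids even the single linear pass.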
1 vote • 3 answers • 167 views
Answer: A: Extract nucleotides from fasta file by a single position
... You can format the table like UCSC region strings (chr:start-end), which use 1-based, closed coordinates. This way you don't have to worry about converting to the 0-based, half-open [start, end) coordinates that BED requires. Then you can pipe the region strings to either `samtools faidx` or `faidx` from the pyfaidx pack ...
written 4 days ago by Matt Shirley (7.5k)
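
To make the coordinate difference concrete, here is a small sketch of converting between the two conventions. The helper names `region_to_bed` and `bed_to_region` are hypothetical, introduced only for illustration; the arithmetic is the standard relationship between 1-based closed and 0-based half-open intervals.

```python
import re

# chr:start-end, optionally with thousands separators in the numbers.
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")

def region_to_bed(region):
    """Convert a 1-based closed region string to a 0-based half-open BED triple."""
    m = _REGION.match(region.strip())
    if m is None:
        raise ValueError(f"not a chr:start-end region: {region!r}")
    start = int(m["start"].replace(",", ""))
    end = int(m["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    # Only the start shifts; the closed end equals the half-open end.
    return m["chrom"], start - 1, end

def bed_to_region(chrom, start, end):
    """Convert a 0-based half-open BED triple back to a region string."""
    return f"{chrom}:{start + 1}-{end}"

print(region_to_bed("chr1:100-200"))
print(bed_to_region("chr1", 99, 200))
```

Both forms describe the same 101 bases; a single nucleotide at position 100 is `chr1:100-100` as a region string and `chr1 99 100` in BED.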
0 votes • 3 answers • 140 views
Comment: C: How do I get a list of all species with a fasta genome in UCSC?
... The [genomepy](https://github.com/simonvh/genomepy) python library uses this method for [listing UCSC genomes](https://github.com/simonvh/genomepy/blob/master/genomepy/provider.py#L386). ...
written 5 days ago by Matt Shirley (7.5k)
3 votes • 3 answers • 140 views
Answer: A: How do I get a list of all species with a fasta genome in UCSC?
... You can get an XML listing of all the current genomes from their DAS server: http://genome.ucsc.edu/cgi-bin/das/dsn `$ curl -s http://genome.ucsc.edu/cgi-bin/das/dsn | head -n20` Dec. 2013 (GRCh38/hg38) at UCSC Human Dec. 2013 (GRCh38/hg38) Genome at UCSC ...
written 5 days ago by Matt Shirley (7.5k)
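
Once the XML listing is downloaded, it could be parsed along these lines. This is a sketch under assumptions: the element names (`DSN`, `SOURCE`, `DESCRIPTION`) follow the DAS 1 DSN layout as I understand it, the sample document is hand-written for illustration (including the `id="hg38"` attribute), and `list_genomes` is a hypothetical helper, not part of any UCSC tooling.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed DAS DSN layout; not real server output.
SAMPLE = """<DASDSN>
  <DSN>
    <SOURCE id="hg38" version="1.00">Dec. 2013 (GRCh38/hg38)</SOURCE>
    <DESCRIPTION>Human Dec. 2013 (GRCh38/hg38) Genome at UCSC</DESCRIPTION>
  </DSN>
</DASDSN>"""

def list_genomes(xml_text):
    """Return (id, label, description) tuples, one per DSN entry."""
    root = ET.fromstring(xml_text)
    genomes = []
    for dsn in root.iter("DSN"):
        source = dsn.find("SOURCE")
        if source is None:
            continue  # skip malformed entries
        description = dsn.findtext("DESCRIPTION", default="")
        genomes.append(
            (source.get("id"), (source.text or "").strip(), description.strip())
        )
    return genomes

for genome_id, label, description in list_genomes(SAMPLE):
    print(genome_id, label, description, sep="\t")
```

In practice the input would be the saved output of the `curl` command from the answer rather than the embedded sample.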
0 votes • 6 answers • 1.4k views
Comment: C: Stranger Things: unexpected limitations of popular tools
... I guess I just don't see too much of a problem with record-based formats. If you need to efficiently index columns, you'll need a database. For most unix tools you're relying on readline and so any operation on columns has to read the entire row anyway. With a record-based format you're at least val ...
written 6 days ago by Matt Shirley (7.5k)
0 votes • 6 answers • 1.4k views
Comment: C: Stranger Things: unexpected limitations of popular tools
... Can't you achieve most of what you propose by just adding user-defined SAM tags? These are similar to "optional" columns, and have the benefit of namespacing and static typing. I guess one big loss here would be the ability to naively sort the file on tags, but sorting shouldn't be too hard, and wou ...
written 6 days ago by Matt Shirley (7.5k)
0 votes • 0 answers • 129 views
Comment: C: Use of if-else statement in snakemake rule
... `if {wildcards.variant_status} == 'Somatic':` should be `if wildcards.variant_status == 'Somatic':`. The `run:` statement interprets everything in the subsequent block as Python code, so the `{}` used for templating variables in strings are no longer necessary. ...
written 7 days ago by Matt Shirley (7.5k)
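
Why the braces matter can be shown in plain Python, outside Snakemake. In this sketch a `SimpleNamespace` stands in for the `wildcards` object that Snakemake provides inside `run:` (an assumption made purely for illustration). In Python code, braces around a value build a set, so the comparison silently evaluates to false instead of raising an error.

```python
from types import SimpleNamespace

# Stand-in for Snakemake's `wildcards` object (illustrative assumption).
wildcards = SimpleNamespace(variant_status="Somatic")

# Braces create a one-element set, and a set never equals a string.
with_braces = {wildcards.variant_status} == "Somatic"

# Plain attribute access compares the string itself.
without_braces = wildcards.variant_status == "Somatic"

print(with_braces)     # False
print(without_braces)  # True
```

Because the braced version is valid syntax, the `if` branch is simply never taken, which can make the mistake hard to spot.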
2 votes • 6 answers • 1.4k views
Comment: C: Stranger Things: unexpected limitations of popular tools
... The purpose of SRA format is to have a column-oriented datastore where each column can have a separate datatype-specific compression applied. Sequences, sequence names, quality scores, alignment information... all get compressed more efficiently than just gzipping all the data. Rows in SRA are indiv ...
written 7 days ago by Matt Shirley (7.5k)
0 votes • 2 answers • 143 views
Answer: A: Extact specific sequence from a fasta file into multiple files using ID and EC f
... If you really want to use `Biopython` feel free, but here's a solution using `pyfaidx`, which saves you the trouble of reading the entire file. https://gist.github.com/mdshw5/b2073cf96f7ef2e54ef3f09495d37f82 ...
written 9 days ago by Matt Shirley (7.5k)

Latest awards to Matt Shirley

Commentator 4 days ago, created a comment with at least 3 up-votes. For C: What Does 2X250Bp Buy Us?
Teacher 4 days ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Scholar 5 days ago, created an answer that has been accepted. For A: How to use pygr? worldbase doesn't return anything
Appreciated 19 days ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Popular Question 28 days ago, created a question with more than 1,000 views. For On the utility of publishing a tool paper
Good Answer 6 weeks ago, created an answer that was upvoted at least 5 times. For A: How Can I Do Principal Components Analysis ?
Appreciated 6 weeks ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Teacher 7 weeks ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Scholar 3 months ago, created an answer that has been accepted. For A: How to use pygr? worldbase doesn't return anything
Popular Question 3 months ago, created a question with more than 1,000 views. For Comments Left Inappropriately As Answers To A Question
Teacher 3 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Teacher 4 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Popular Question 4 months ago, created a question with more than 1,000 views. For Comments Left Inappropriately As Answers To A Question
Teacher 4 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Appreciated 4 months ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Commentator 4 months ago, created a comment with at least 3 up-votes. For C: What Does 2X250Bp Buy Us?
Popular Question 6 months ago, created a question with more than 1,000 views. For Troubling Trends In Scientific Software Use
Good Answer 6 months ago, created an answer that was upvoted at least 5 times. For A: How Can I Do Principal Components Analysis ?
Scholar 6 months ago, created an answer that has been accepted. For A: How to use pygr? worldbase doesn't return anything
Appreciated 7 months ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
Teacher 7 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Teacher 8 months ago, created an answer with at least 3 up-votes. For A: What Does 2X250Bp Buy Us?
Commentator 8 months ago, created a comment with at least 3 up-votes. For C: What Does 2X250Bp Buy Us?
Appreciated 9 months ago, created a post with more than 5 votes. For A: Ways To Detect Bias In Dna Sampling For Genomic Sequencing
