Admin: Istvan Albert

Istvan Albert ♦♦ 77k
Reputation:
77,330
Status:
Trusted
Location:
University Park, USA
Website:
https://www.ialbert.me/
Scholar ID:
Google Scholar Page
Last seen:
15 hours ago
Joined:
9 years ago
Email:
i************@gmail.com

I have published research in the fields of granular matter physics, network science, machine learning, user interfaces and bioinformatics. But above all I like to create useful systems. I enjoy the process of designing and implementing web-based services that stand the test of time. The project I currently dedicate most of my time to is an e-book on genomic data analysis:

  • The Biostar Handbook - it is modeled on the content of this site and is a comprehensive guide for beginning bioinformaticians.

I am the maintainer of this site:

  • Biostar Q&A platform - more of a jack-of-all-trades: lead developer, interface designer, database manager, sysadmin, DevOps, and whatever else needs to be done.

Currently I work as a Professor of Bioinformatics at Penn State. Within that position I serve in various roles:

Posts by Istvan Albert

0
votes
1
answer
186
views
Comment: C: Sorting fq data into groups with missing barcode info?
... I was thinking that since the barcodes are not missing, perhaps a simple approach could work, something like this:
    cat reads.fq | bioawk -c fastx '{ print $name, $seq }' | sort -k1,1 > a
    cat barcodes.fq | bioawk -c fastx '{ print $name, $seq }' | sort -k1,1 > b
    join a b > suc ...
written 15 hours ago by Istvan Albert ♦♦ 77k
1
vote
1
answer
186
views
Answer: A: Sorting fq data into groups with missing barcode info?
... If you have all three files then you can match back the barcodes by sequence ID. I believe that the id of the sequences will match from barcode to reads since physically they are all on the same location (coordinate). Depending how many reads you have you may need to either write a simple program ...
written 15 hours ago by Istvan Albert ♦♦ 77k
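The idea in the answer above, matching barcodes back to reads by sequence ID, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_fastq` and `group_by_barcode` are hypothetical, the FASTQ input is assumed to be well-formed four-line records, and all barcodes are held in memory, which may not be practical for very large runs.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records (hypothetical helper)."""
    it = iter(lines)
    # zip over the same iterator consumes the input four lines at a time.
    for header, seq, _plus, _qual in zip(it, it, it, it):
        # Keep only the ID: drop the leading '@' and anything after the first space.
        yield header.strip()[1:].split()[0], seq.strip()


def group_by_barcode(
    reads: Iterable[Tuple[str, str]],
    barcodes: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group reads under the barcode sequence that shares their read ID."""
    barcode_of = dict(barcodes)  # read_id -> barcode sequence
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for read_id, seq in reads:
        barcode = barcode_of.get(read_id)
        if barcode is not None:  # skip reads that have no barcode record
            groups.setdefault(barcode, []).append((read_id, seq))
    return groups


# Tiny in-memory example; real data would be read from files instead.
reads_fq = ["@r1 1:N:0", "ACGT", "+", "IIII", "@r2 1:N:0", "TTTT", "+", "IIII"]
barcodes_fq = ["@r1 2:N:0", "AAC", "+", "III", "@r2 2:N:0", "GGT", "+", "III"]
groups = group_by_barcode(parse_fastq(reads_fq), parse_fastq(barcodes_fq))
```

For inputs too large for memory, sorting both files by ID and merging them, as in the `sort` and `join` approach from the comment, avoids the in-memory dictionary.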
0
votes
1
answer
83
views
Answer: A: Infer direction of paired end data
... The best way to understand your data is to separate your BAM file into two, one that contains only reads from file 1, the other only reads from file 2 (basically you are breaking the pairs):
    samtools view -b -f 64 all.bam > read1.bam
    samtools view -b -F 64 all.bam > read2.bam
Now all ...
written 17 hours ago by Istvan Albert ♦♦ 77k
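The value 64 used in the answer above is the SAM flag bit for "first read in the pair" (0x40); the bit for the second read is 128 (0x80). A minimal Python sketch of the bit test follows. The function name `is_first_in_pair` and the example flag values are chosen here for illustration and do not come from the original post.

```python
FIRST_IN_PAIR = 0x40   # decimal 64, set on reads that came from file 1
SECOND_IN_PAIR = 0x80  # decimal 128, set on reads that came from file 2


def is_first_in_pair(flag: int) -> bool:
    """Return True when the 'first in pair' bit is set in a SAM flag."""
    return bool(flag & FIRST_IN_PAIR)


# Common flags for properly paired reads:
# 99 and 83 are first-in-pair, 147 and 163 are second-in-pair.
flags = [99, 147, 83, 163]
read1 = [f for f in flags if is_first_in_pair(f)]      # keep if bit 64 is set
read2 = [f for f in flags if not is_first_in_pair(f)]  # keep if bit 64 is unset
```

Note that keeping everything without bit 64 also keeps unpaired reads, if any are present; testing for bit 128 instead would select second-in-pair reads only.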
0
votes
8
answers
71k
views
Comment: C: How to download raw sequence data from GEO/SRA
... Use fastq-dump to get your SRA data, I don't trust mirrors that much actually...
    fastq-dump --split-files -X 1000 SRR5959411
gives me one file, this gives two files
    fastq-dump --split-files -X 1000 SRR5969329
the sizes are right, as expected paired and 76bp
    seqkit stat *.fastq
lik ...
written 18 hours ago by Istvan Albert ♦♦ 77k
1
vote
4
answers
136
views
Answer: A: What's more clear, loops or functions?
... The problem is best formulated in terms of programming paradigms as a choice between - *imperative* (procedural) programming that uses constructs such as `if`, `for`, `while` versus - *functional* (declarative) programming that has functions such as `map`, `filter`, `reduce` (using Python as la ...
written 7 days ago by Istvan Albert ♦♦ 77k
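The contrast drawn in the answer above can be made concrete with one small task written both ways. The task (summing the squares of the even numbers) and both function names are made up for this illustration; it is a sketch of the two styles, not code from the original answer.

```python
from functools import reduce
from typing import Iterable


def sum_even_squares_imperative(numbers: Iterable[int]) -> int:
    """Imperative style: explicit loop, condition, and mutable accumulator."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_even_squares_functional(numbers: Iterable[int]) -> int:
    """Functional style: compose filter, map, and reduce without mutation."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)


data = [1, 2, 3, 4]
imperative_result = sum_even_squares_imperative(data)
functional_result = sum_even_squares_functional(data)
```

Both functions return the same value for the same input; the difference lies in whether the steps are spelled out as control flow or expressed as a pipeline of transformations.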
1
vote
1
answer
409
views
Comment: C: (solved) I couldn't reproduce the problem of max_target_seqs
... My point is that there is a responsibility that toolmakers have to properly document and make a note of situations that might lead to misunderstandings. It is insufficient to point to the documentation (fine print). Especially in the light that this was noted in 2015, most people that can be consi ...
written 9 days ago by Istvan Albert ♦♦ 77k
1
vote
1
answer
409
views
Comment: C: (solved) I couldn't reproduce the problem of max_target_seqs
... Considering how widely BLAST is used and how often people filter on e-values, it is no wonder that we have an irreproducibility problem in the life sciences ...
written 9 days ago by Istvan Albert ♦♦ 77k
0
votes
1
answer
409
views
Comment: C: (solved) I couldn't reproduce the problem of max_target_seqs
... Recommendation: show the alignments as text and not images; also remove the unnecessary data listings, as those make the post difficult to follow ...
written 9 days ago by Istvan Albert ♦♦ 77k
0
votes
1
answer
409
views
Comment: C: I couldn't reproduce the problem of max_target_seqs
... now that I looked at it in more detail there is more to it, the "bug" or "behavior" is triggered only in some special cases. > In some cases a final HSP will improve enough in the later gapped phase to rise to the top hits. You have to have an alignment that matches the statement above. Look ...
written 12 days ago by Istvan Albert ♦♦ 77k
0
votes
1
answer
409
views
Comment: C: I couldn't reproduce the problem of max_target_seqs
... now that I have read the report more carefully, I want to mention that my example here may not trigger the bug. As the [NCBI reply](https://gist.github.com/sujaikumar/504b3b7024eaf3a04ef5) states > In some cases a final HSP will improve enough in the later gapped phase to rise to the ...
written 12 days ago by Istvan Albert ♦♦ 77k

Latest awards to Istvan Albert

Great Question 1 day ago, created a question with more than 5,000 views. For Heng Li of BWA and Samtools uses this
Epic Question 2 days ago, created a question with more than 10,000 views. For Hadley Wickham of ggplot and RStudio uses this
Good Answer 2 days ago, created an answer that was upvoted at least 5 times. For A: Where To Look For Quality Bioinformatics Short Courses And Workshops?
Scholar 13 days ago, created an answer that has been accepted. For A: How Do I Convert From Bed Format To Gff Format?
Teacher 15 days ago, created an answer with at least 3 up-votes. For A: How To Grep Largest Contig From A Multi Fasta File
Librarian 7 weeks ago, created a post with more than 10 bookmarks. For Table Of Contents To All Review Paper Compilations On Biostar
Great Question 10 weeks ago, created a question with more than 5,000 views. For Heng Li of BWA and Samtools uses this
Teacher 10 weeks ago, created an answer with at least 3 up-votes. For A: How To Find Unique Reads?
Teacher 12 weeks ago, created an answer with at least 3 up-votes. For A: Extracting Reads That Are Properly Paired From Bam File
Popular Question 3 months ago, created a question with more than 1,000 views. For Jonathan Pevsner author of Bioinformatics and Functional Genomics uses this
Great Question 3 months ago, created a question with more than 5,000 views. For Hadley Wickham of ggplot and RStudio uses this
Epic Question 3 months ago, created a question with more than 10,000 views. For Hadley Wickham of ggplot and RStudio uses this
Great Question 3 months ago, created a question with more than 5,000 views. For Hadley Wickham of ggplot and RStudio uses this
Great Question 4 months ago, created a question with more than 5,000 views. For Hadley Wickham of ggplot and RStudio uses this
Commentator 4 months ago, created a comment with at least 3 up-votes. For C: A Farewell To Bioinformatics
Commentator 4 months ago, created a comment with at least 3 up-votes. For C: Mapping God Found ‘Scientifically Dishonest’ By Anonymous Peer Reviewers
Epic Question 4 months ago, created a question with more than 10,000 views. For Annovar: Functional Annotation Of Genetic Variants From High-Throughput Sequencing Data
Appreciated 4 months ago, created a post with more than 5 votes. For Jonathan Pevsner author of Bioinformatics and Functional Genomics uses this
