User: tomc

tomc
Reputation: 80
Status: Trusted
Location: United States
Last seen: 1 year, 7 months ago
Joined: 4 years, 3 months ago
Email: t***@cs.uoregon.edu

Posts by tomc

0 votes • 1 answer • 932 views
Answer: A: Clustering different RBH hits
... with your first filter as an example: awk 'BEGIN{print "OFAS000562-RA-EXON01"}$3=="OFAS000562-RA-EXON01"{print $1}' < file0 > file1. More generically, make your query a variable: gene=OFAS000562-RA-EXON01; awk -v GENE="${gene}" 'BEGIN{print GENE}$3==GENE{print $1}' < file0 > $ ...
written 3.2 years ago by tomc
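A minimal sketch of the same filter wrapped as a reusable shell function, plus a hypothetical driver that writes one file per distinct query ID found in column 3. The function names (filter_gene, split_by_query) and the <query>.txt output naming are made up for illustration; it assumes whitespace-separated columns with the hit ID in column 1 and the query ID in column 3, as in the filter above.

```shell
# Print the query ID, then column 1 of every row whose column 3 equals it.
# Assumes whitespace-separated input with the query ID in column 3.
filter_gene() {
  gene=$1
  shift
  awk -v GENE="$gene" 'BEGIN { print GENE } $3 == GENE { print $1 }' "$@"
}

# Hypothetical driver: writes one <query>.txt per distinct query ID in column 3.
# $1 = hit table, $2 = existing output directory.
split_by_query() {
  awk '{ print $3 }' "$1" | sort -u | while IFS= read -r q; do
    filter_gene "$q" "$1" > "$2/$q.txt"
  done
}
```

Usage would be something like split_by_query file0 clusters/, after which clusters/OFAS000562-RA-EXON01.txt holds the same lines as file1 above.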
3 votes • 1 answer • 1.2k views
Answer: A: How do I replace a value used in script with a range of values in a file?
... You can pass your script an arbitrary variable, say LIMIT, with script.awk -v "LIMIT=73" chr1.bedgraph, then inside the script use if(s >= LIMIT). To get the sequence of limits from your second file (values.txt), filter the first column out (assuming tab separated otherwise spe ...
written 3.4 years ago by tomc
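A sketch of the loop over limits, assuming values.txt is tab separated with the limit in column 1. The original script's s variable is not shown, so comparing against column 4 of the bedGraph (its value column) is a guess; run_limits and the .geN.bedgraph output names are hypothetical, and the inline awk stands in for script.awk.

```shell
# $1 = values file (tab separated, limit in column 1), $2 = bedGraph file.
# Writes one filtered copy of the bedGraph per limit, e.g. chr1.ge73.bedgraph.
# ASSUMPTION: the value compared against LIMIT is bedGraph column 4.
run_limits() {
  cut -f1 "$1" | while IFS= read -r limit; do
    awk -v LIMIT="$limit" '$4 >= LIMIT' "$2" > "${2%.bedgraph}.ge${limit}.bedgraph"
  done
}
```

Because LIMIT arrives through -v and looks numeric, awk compares it numerically against $4, so 80 >= 73 behaves as expected.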
0 votes • 3 answers • 2.3k views
Answer: A: Extract subset sequences from fasta file
... who knows, it might work beyond your sample: #! /usr/bin/awk -f BEGIN {FS=";"} /^>.*; loc=2[LR];.*/ { for(i=1;i<=NF;i++){ split($i,a,"="); defl[a[1]]=a[2] } if(300<defl[" length"]){ record=$0; while(getline){ if(NF<2){record = r ...
written 4.1 years ago by tomc
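A sketch of the same idea that replaces the getline loop with a keep/skip flag per record, so multi-line sequences come along automatically. It assumes deflines shaped like >ID; loc=2L; length=512 (fields separated by ";", key=value pairs), which is a guess at the format in the question; subset_fasta is a hypothetical name and the 300 cutoff comes from the excerpt.

```shell
# Print FASTA records whose defline contains "; loc=2L;" or "; loc=2R;"
# and whose length=... field exceeds MINLEN. Sequence lines of a kept
# record are printed until the next defline.
subset_fasta() {
  awk -v MINLEN=300 '
    /^>/ {
      keep = 0
      if ($0 ~ /; loc=2[LR];/) {
        n = split($0, f, ";")
        len = 0
        for (i = 1; i <= n; i++) {
          split(f[i], kv, "=")
          k = kv[1]
          gsub(/^ +| +$/, "", k)      # trim spaces around the key
          if (k == "length") len = kv[2] + 0
        }
        keep = (len > MINLEN)
      }
    }
    keep                              # default action: print the line
  ' "$@"
}
```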
0 votes • 1 answer • 1.7k views
Answer: A: how to find gene IDs from sequences
... Assuming you are working with nucleotides, have a local BLAST installation, and have your gene reference sequences formatted as a BLAST database, this could be the start of a solution for linking your sequences to known gene IDs: blastn -db reference.bdb -query file_with_all_sequences.nt - ...
written 4.2 years ago by tomc
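One way to go from BLAST hits to a gene ID per sequence is to keep the top hit per query from the tabular output. The blastn options in the comment (-outfmt 6, -evalue, -out) are standard BLAST+ options, but the e-value cutoff and file names are placeholders. best_hits is a hypothetical helper that assumes the hits for each query are listed best first, which is how BLAST orders its tabular output.

```shell
# Upstream step (standard BLAST+ options; file names and cutoff are placeholders):
#   blastn -db reference.bdb -query file_with_all_sequences.nt \
#          -outfmt 6 -evalue 1e-10 -out hits.tsv
#
# Keep the first (best) hit per query from -outfmt 6 output and print
# query ID, subject ID and e-value (columns 1, 2 and 11).
best_hits() {
  awk -F'\t' '!($1 in seen) { seen[$1] = 1; print $1 "\t" $2 "\t" $11 }' "$@"
}
```

Usage: best_hits hits.tsv > query_to_gene.tsv. The subject IDs are whatever identifiers your reference database uses, so mapping them to gene IDs may need one more lookup.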
0 votes • 4 answers • 1.4k views
Answer: A: Database of graphs(networks) arising in bioinformatics
... The go:term stanzas in the example would be nodes; they contain various properties/attributes, including elements that contain "<... rdf:resource ...>", which are edges to other nodes. All the nodes are identified with URIs (they look just like URLs but need not lead to a page). You might be ...
written 4.2 years ago by tomc
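A rough sketch of pulling an edge list out of RDF/XML shaped like the GO example. It assumes each node opens with <go:term rdf:about="URI"> on its own line and each edge is an element on a later line carrying rdf:resource="URI". rdf_edges is a hypothetical name and the predicate names in the test data are made up; line-based regex matching is fragile, and a real RDF or XML parser is the safer choice beyond a quick look.

```shell
# Emit "node<TAB>target" for every rdf:resource inside a go:term stanza.
rdf_edges() {
  awk '
    /<go:term[ >]/ {
      # rdf:about=" is 11 characters; strip it and the closing quote
      if (match($0, /rdf:about="[^"]*"/)) node = substr($0, RSTART + 11, RLENGTH - 12)
    }
    node != "" {
      line = $0
      # rdf:resource=" is 14 characters; a line may hold several edges
      while (match(line, /rdf:resource="[^"]*"/)) {
        print node "\t" substr(line, RSTART + 14, RLENGTH - 15)
        line = substr(line, RSTART + RLENGTH)
      }
    }
    /<\/go:term>/ { node = "" }
  ' "$@"
}
```

The two-column output can be loaded directly as an edge list by most graph tools.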
0 votes • 1 answer • 1.2k views
Answer: A: Make pairwise fasta file for two species
... I cannot address whether there is a standard way to do this. It is unclear to me whether the sequences for the two species are in the same fasta file or each in their own file. Unfortunately, one list of arbitrary pairs (orthologs) cannot be sorted in two ways simultaneously, so you will need to ...
written 4.2 years ago by tomc
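A sketch that sidesteps sorting by loading both FASTA files into an awk array keyed by ID, then walking the ortholog list once and emitting each pair. This is a hash lookup used instead of sorting, not necessarily what the answer went on to suggest. It assumes the ortholog list is tab separated (species A ID, species B ID) and that the ID is the first word of each defline; pair_fasta is a hypothetical name, and both sequence sets must fit in memory.

```shell
# $1 = ortholog pairs (idA<TAB>idB), $2 and $3 = FASTA files (either order).
# Multi-line sequences are joined; pairs with a missing ID are skipped.
pair_fasta() {
  awk -F'\t' '
    !pairs && /^>/ { split(substr($0, 2), w, " "); id = w[1]; next }
    !pairs         { seq[id] = seq[id] $0; next }
    ($1 in seq) && ($2 in seq) {
      printf(">%s\n%s\n>%s\n%s\n", $1, seq[$1], $2, seq[$2])
    }
  ' "$2" "$3" pairs=1 "$1"
}
```

The pairs=1 operand is an ordinary awk command-line assignment that takes effect only when the pairs file starts being read. For one file per pair, the printf could instead be redirected to something like ($1 "_" $2 ".fa").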
2 votes • 2 answers • 1.6k views
Answer: A: Count of GC in row, but not from N bases
... Normalized GC content per row, sans N: awk '{gsub("N","");t=length();gsub(/[GC]/,"");print int((t-length())/t*100)/100}' or, assuming sequence symbols are strictly ACTGN: awk '{gsub("N","");t=length();gsub(/[AT]/,"");print int(length()/t*100)/100}' ...
written 4.2 years ago by tomc
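A variant of the same calculation that also counts lowercase (soft-masked) bases and prints NA instead of dividing by zero when a row is empty or all N. gc_fraction is a hypothetical name; it rounds with printf rather than truncating with int() as above, so the last digit can differ.

```shell
# GC fraction per row, ignoring N; case-insensitive; NA for rows with no non-N bases.
gc_fraction() {
  awk '{
    s = toupper($0)
    gsub(/N/, "", s)
    t = length(s)
    if (t == 0) { print "NA"; next }
    gc = gsub(/[GC]/, "", s)          # gsub returns the number of replacements
    printf("%.2f\n", gc / t)
  }' "$@"
}
```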
1 vote • 8 answers • 18k views
Answer: A: how to unzip the files in batch?
... Or don't: pass them compressed if the tool handles it, or decompress on the fly, i.e. zcat file.gz | whatever_tool. There are a bunch of utilities for processing gzip files without explicitly writing out the unzipped file: http://www.nongnu.org/zutils/manual/zutils_manual.html ...
written 4.2 years ago by tomc
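A minimal sketch of the "decompress on the fly" idea applied to a whole directory: each .gz file is streamed through gzip -dc into whatever command you pass, and nothing is written to disk uncompressed. stream_gz is a hypothetical name; gzip -dc is used instead of zcat because on some older systems zcat expects .Z files.

```shell
# $1 = directory; remaining args = command to run on each decompressed stream.
# Prefixes the command's output with the file name (minus .gz) and a tab.
stream_gz() {
  dir=$1
  shift
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue           # glob matched nothing
    printf '%s\t' "$(basename "$f" .gz)"
    gzip -dc "$f" | "$@"
  done
}
```

For example, stream_gz reads/ awk 'END { print NR }' prints a line count per compressed file.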
2 votes • 7 answers • 11k views
Answer: A: Remove duplicates in fasta file based on ID
... Assuming dup.fasta has one '>defline' followed by one line of sequence: awk '/^>/{id=$0;getline;arr[id]=$0}END{for(id in arr)printf("%s\n%s\n",id,arr[id])}' dup.fasta > uniq.fasta. There is no requirement that duplicates be adjacent, and there is no guarantee the output order is related to the inpu ...
written 4.2 years ago by tomc
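A sketch of a variant that keeps the first record seen per ID, preserves input order, and handles multi-line sequences. Unlike the one-liner above, which keeps the last record per full defline, it treats the ID as the first word of the defline, so >s1 a and >s1 b count as duplicates; dedup_fasta is a hypothetical name.

```shell
# Keep the first FASTA record for each ID (first word after ">"), in input order.
dedup_fasta() {
  awk '
    /^>/ {
      split(substr($0, 2), w, " ")
      keep = !(w[1] in seen)          # decide before marking as seen
      seen[w[1]] = 1
    }
    keep                              # print defline and sequence lines of kept records
  ' "$@"
}
```

Usage: dedup_fasta dup.fasta > uniq.fasta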

Latest awards to tomc

Teacher 3.4 years ago, created an answer with at least 3 up-votes. For A: How do I replace a value used in script with a range of values in a file?
Scholar 4.2 years ago, created an answer that has been accepted. For A: Count of GC in row, but not from N bases

Powered by Biostar version 2.3.0