Hi everyone. The new version of Rfam, v12.0, was released last Friday.
ftp://ftp.ebi.ac.uk/pub/databases/Rfam/12.0/
http://rfam.xfam.org/help#tabview=tab0
I have several questions, even though I have read the README. Could someone briefly explain how they carry out the different steps? How do you approach the problem?
For example, how do you start? Do you use the latest script from Rfam v11 (rfam_scan.pl)? Or do you download the whole rfamseq.txt? And how do you obtain Rfam.fasta if it is no longer available? I would appreciate any advice, small or large.
Below I quote the relevant parts of the README.
(...)
(3) As of Rfam 12.0, we no longer provide FULL alignments for each family. As
the size of our full alignments grew, the overheads involved in creating,
storing and manipulating them became too great to support. Instead, we provide
full region lists, which contain the ENA sequence accession, start/end
coordinates and bitscore for each hit to a family. If you wish to build a FULL
alignment equivalent to those supplied in previous releases of Rfam, you may do
so by downloading the CM for a given family and the Rfam sequence database,
RFAMSEQ (or indeed you may choose to use your own set of sequences). This means
that some sections of the website are no longer available, such as the option to
download or view the full alignment for a given family.
(...)
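To make the excerpt above concrete: since only region lists are provided now, the first step towards a FULL alignment would be cutting each hit out of its parent sequence. Here is a minimal sketch of that step in Python. The function name `extract_region` is hypothetical, and two things are assumptions rather than statements from the README: that coordinates are 1-based and inclusive, and that a start larger than the end means the hit is on the reverse strand.

```python
# Complement table for nucleotide letters (assumption: sequences are DNA-style, as in ENA).
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the subsequence for one hit.

    Assumes 1-based inclusive coordinates; if start > end the hit is
    treated as reverse-strand and the reverse complement is returned.
    """
    if start <= end:
        return sequence[start - 1:end]
    # Reverse strand: take the forward slice, complement it, then reverse it.
    forward = sequence[end - 1:start]
    return forward.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    toy = "AACGTT"  # toy sequence, not real Rfam data
    print(extract_region(toy, 2, 4))
    print(extract_region(toy, 4, 2))
```

The extracted sequences could then be aligned to the family CM with Infernal's cmalign, which is what the README seems to suggest, but I have not checked that this reproduces the old FULL alignments exactly.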
7) Due to the increasing size of the nucleotide sequence databases
and the resulting increase in the size of our alignments we are
now unable to provide complete sequence alignments and trees for
our 5 largest families tRNA (RF00005), SSU (RF00177, RF01959,
RF01960) and ultra conserved element uc_338 (RF02271). For these
families we have provided a full alignment that is composed of
SEED and genome sequences only. The entries for these families in
the files: Rfam.fasta, Rfam.full and Rfam_full.tree are based on
these reduced genome alignments. We do however provide a fasta
file containing the complete WGS+STD annotations for each family
on our ftp site (see below for release files). The number of
sequences annotated in the reduced genome alignments and complete
WGS_STD alignments:
family   genome_alignment  WGS_STD_alignment
RF00005  298470            2106268
RF00177  7429              744528
RF01959  7394              881056
RF01960  425               65901
RF02271  857               229907
(...)
4. FILES
As of Rfam 12.0
---------------
README - this file
COPYING - some legal things
USERMAN - a description of the Rfam flatfile formats
Rfam.tar.gz - a concatenated set of Rfam covariance models in ascii INFERNAL 1.1 format
Rfam.seed.gz - annotated seed alignments in STOCKHOLM format
Rfam.full_region.gz - list of sequences which make up the full family
membership for each family. Fields are as follows:
1. RF00001 is the Rfam accession
2. EU093378.1 is the EMBL accession and version number
3. Start coordinate of match on sequence
4. End coordinate of match on sequence
5. Bitscore
6. E-value
7. CM start position
8. CM end position
9. If match is a truncated match to CM, this field is 1
10. Type is either seed or full
Rfam.seed_tree.tar.gz - annotated tree files for each seed alignment [tarbomb]
Rfam.pdb.gz - tab delimited mappings of pdb seqs to Rfam families.
database_files:
alignment_and_tree.txt.gz
clan.txt.gz
clan_database_link.txt.gz
clan_literature_reference.txt.gz
clan_membership.txt.gz
database_link.txt.gz
db_version.txt.gz
dead_clan.txt.gz
dead_family.txt.gz
family.txt.gz
family_literature_reference.txt.gz
family_ncbi.txt.gz
features.txt.gz
full_region.txt.gz
html_alignment.txt.gz
keywords.txt.gz
literature_reference.txt.gz
matches_and_fasta.txt.gz
motif.txt.gz
motif_database_link.txt.gz
motif_family_stats.txt.gz
motif_file.txt.gz
motif_literature.txt.gz
motif_matches.txt.gz
motif_pdb.txt.gz
motif_ss_image.txt.gz
pdb_full_region.txt.gz
rfamseq.txt.gz
secondary_structure_image.txt.gz
seed_region.txt.gz
sunburst.txt.gz
tables.sql
taxonomy.txt.gz
taxonomy_websearch.txt.gz
version.txt.gz
(...)
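As a starting point for working with Rfam.full_region.gz, here is a rough Python sketch that reads the ten fields listed in the README excerpt into a small record. It assumes the file is tab-delimited with one hit per line and no header, which I have not verified. The names `FullRegionHit`, `parse_full_region_line` and `read_full_region` are my own, and the numeric values in the example line are made up; only the two accessions come from the README.

```python
import gzip
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FullRegionHit:
    rfam_acc: str      # 1. Rfam accession, e.g. RF00001
    seq_acc: str       # 2. EMBL accession and version, e.g. EU093378.1
    seq_start: int     # 3. start coordinate of match on sequence
    seq_end: int       # 4. end coordinate of match on sequence
    bit_score: float   # 5. bitscore
    evalue: float      # 6. E-value
    cm_start: int      # 7. CM start position
    cm_end: int        # 8. CM end position
    truncated: str     # 9. truncation flag, kept as raw text
    hit_type: str      # 10. "seed" or "full"


def parse_full_region_line(line: str) -> FullRegionHit:
    """Parse one line, assuming exactly ten tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return FullRegionHit(
        fields[0], fields[1], int(fields[2]), int(fields[3]),
        float(fields[4]), float(fields[5]), int(fields[6]), int(fields[7]),
        fields[8], fields[9],
    )


def read_full_region(path: str) -> Iterator[FullRegionHit]:
    """Yield hits from a gzip-compressed region list, skipping blank lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_full_region_line(line)


if __name__ == "__main__":
    # Hypothetical example line; the numbers are invented for illustration.
    example = "RF00001\tEU093378.1\t100\t218\t85.3\t1.2e-15\t1\t119\t0\tfull"
    print(parse_full_region_line(example))
```

Grouping the resulting hits by `rfam_acc` would give the per-family membership that the FULL alignments used to provide.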
Hello margxenscienculo!
It appears that your post has been cross-posted to another site: SEQanswers.
This is typically not recommended as it runs the risk of annoying people in both communities.
OK. Better to annoy only one, then. :-/
I think you misunderstand the problem. It's not that the question itself is annoying, but that posting it twice may double the workload of people that are trying to help you. See Rule 8. Be Courteous to Other Forum Members, most relevantly: