Hi everyone. The new version of Rfam, v12, was released last Friday.
I have read the README but still have several questions. Could someone briefly explain how you carry out the different steps? How do you approach the problem?
For example, how do you start? Do you use the latest script from Rfam v11 (rfam_scan.pl)? Or do you download the whole rfamseq.txt? And how do you obtain Rfam.fasta if it is not available? Any advice, small or large, would be appreciated.
Below I quote the relevant parts of the README.
(...)

(3) As of Rfam 12.0, we no longer provide FULL alignments for each family. As the size of our full alignments grew, the overheads involved in creating, storing and manipulating them became too great to support. Instead, we provide full region lists, which contain the ENA sequence accession, start/end coordinates and bitscore for each hit to a family. If you wish to build a FULL alignment equivalent to those supplied in previous releases of Rfam, you may do so by downloading the CM for a given family and the Rfam sequence database, RFAMSEQ (or indeed you may choose to use your own set of sequences). This means that some sections of the website are no longer available, such as the option to download or view the full alignment for a given family.

(...)

7) Due to the increasing size of the nucleotide sequence databases and the resulting increase in the size of our alignments we are now unable to provide complete sequence alignments and trees for our 5 largest families tRNA (RF00005), SSU (RF00177, RF01959, RF01960) and ultra conserved element uc_338 (RF02271). For these families we have provided a full alignment that is composed of SEED and genome sequences only. The entries for these families in the files: Rfam.fasta, Rfam.full and Rfam_full.tree are based on these reduced genome alignments. We do however provide a fasta file containing the complete WGS+STD annotations for each family on our ftp site (see below for release files).

The number of sequences annotated in the reduced genome alignments and complete WGS_STD alignments:

            genome_alignment    WGS_STD_alignment
RF00005     298470              2106268
RF00177     7429                744528
RF01959     7394                881056
RF01960     425                 65901
RF02271     857                 229907

(...)

4. FILES

As of Rfam 12.0
---------------

README                  - this file
COPYING                 - some legal things
USERMAN                 - a description of the Rfam flatfile formats
Rfam.tar.gz             - a concatenated set of Rfam covariance models in ascii INFERNAL 1.1 format
Rfam.seed.gz            - annotated seed alignments in STOCKHOLM format
Rfam.full_region.gz     - list of sequences which make up the full family membership for each family.
                          Fields are as follows:
                           1. RF00001 is the Rfam accession
                           2. EU093378.1 is the EMBL accession and version number
                           3. Start coordinate of match on sequence
                           4. End coordinate of match on sequence
                           5. Bitscore
                           6. E-value
                           7. CM start position
                           8. CM end position
                           9. If match is a truncated match to CM, this field is 1
                          10. Type is either seed or full
Rfam.seed_tree.tar.gz   - annotated tree files for each seed alignment [tarbomb]
Rfam.pdb.gz             - tab delimited mappings of pdb seqs to Rfam families.

database_files:
    alignment_and_tree.txt.gz
    clan.txt.gz
    clan_database_link.txt.gz
    clan_literature_reference.txt.gz
    clan_membership.txt.gz
    database_link.txt.gz
    db_version.txt.gz
    dead_clan.txt.gz
    dead_family.txt.gz
    family.txt.gz
    family_literature_reference.txt.gz
    family_ncbi.txt.gz
    features.txt.gz
    full_region.txt.gz
    html_alignment.txt.gz
    keywords.txt.gz
    literature_reference.txt.gz
    matches_and_fasta.txt.gz
    motif.txt.gz
    motif_database_link.txt.gz
    motif_family_stats.txt.gz
    motif_file.txt.gz
    motif_literature.txt.gz
    motif_matches.txt.gz
    motif_pdb.txt.gz
    motif_ss_image.txt.gz
    pdb_full_region.txt.gz
    rfamseq.txt.gz
    secondary_structure_image.txt.gz
    seed_region.txt.gz
    sunburst.txt.gz
    tables.sql
    taxonomy.txt.gz
    taxonomy_websearch.txt.gz
    version.txt.gz

(...)
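
To make the question more concrete: since the README lists ten fields for Rfam.full_region.gz, one possible starting point is to read that file and collect the hits for each family. The following is only a rough Python sketch, not the Rfam team's method. It assumes the fields are whitespace-separated and appear in the order given in the README. The names FullRegionHit, parse_full_region_line, group_by_family and read_full_region are made up for illustration, and the numbers in the sample line are invented (only the two accessions come from the README).

```python
import gzip
from collections import defaultdict, namedtuple

# Field names are hypothetical; the order follows the README field list.
FullRegionHit = namedtuple(
    "FullRegionHit",
    "rfam_acc seq_acc seq_start seq_end bitscore evalue "
    "cm_start cm_end truncated hit_type",
)


def parse_full_region_line(line):
    """Parse one line of the full region list into a FullRegionHit.

    Assumption: fields are whitespace-separated, ten per line.
    """
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 fields, got %d" % len(fields))
    return FullRegionHit(
        rfam_acc=fields[0],
        seq_acc=fields[1],
        seq_start=int(fields[2]),
        seq_end=int(fields[3]),
        bitscore=float(fields[4]),
        evalue=float(fields[5]),
        cm_start=int(fields[6]),
        cm_end=int(fields[7]),
        truncated=fields[8],  # kept as text; the README only says "1" means truncated
        hit_type=fields[9],   # "seed" or "full" according to the README
    )


def group_by_family(lines):
    """Group parsed hits by Rfam accession, skipping blank lines."""
    hits = defaultdict(list)
    for line in lines:
        if line.strip():
            hit = parse_full_region_line(line)
            hits[hit.rfam_acc].append(hit)
    return hits


def read_full_region(path):
    """Read a gzip-compressed full region file (path supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return group_by_family(handle)


if __name__ == "__main__":
    # Invented sample line; only the accessions appear in the README.
    sample = ["RF00001\tEU093378.1\t100\t215\t85.3\t1.2e-15\t1\t119\t0\tfull"]
    for family, family_hits in group_by_family(sample).items():
        print(family, len(family_hits))
```

With the accession and coordinates of each hit in hand, the matching subsequences could then be extracted from RFAMSEQ (or any other sequence set) and aligned to the family CM, which is how I understand point (3) of the README. I would be glad to hear whether that is the right way to read it.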