Overview of prediction results used for classification.
Plus, positive prediction result obligatory; minus, negative prediction result obligatory; ~, prediction result has no influence; +1, one of the predictions labelled as such has to be positive.
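The legend above can be read as a small rule language. The following is a minimal sketch of how one such rule might be checked against a protein's prediction results. It is not the authors' implementation: the function name `matches_rule`, the predictor keys, and the example rule are hypothetical, and "+1" is interpreted here as "at least one of the predictors carrying this label is positive".

```python
from typing import Dict

VALID_SYMBOLS = {"+", "-", "~", "+1"}


def matches_rule(rule: Dict[str, str], predictions: Dict[str, bool]) -> bool:
    """Check one classification rule against a protein's prediction results.

    rule maps a predictor name to a legend symbol:
      "+"   positive prediction result obligatory
      "-"   negative prediction result obligatory
      "~"   prediction result has no influence
      "+1"  assumed: at least one predictor with this label must be positive
    predictions maps a predictor name to True (positive) or False (negative);
    a missing predictor is treated as negative (an assumption of this sketch).
    """
    plus_one_results = []
    for predictor, symbol in rule.items():
        if symbol not in VALID_SYMBOLS:
            raise ValueError(f"unknown legend symbol: {symbol!r}")
        result = predictions.get(predictor, False)
        if symbol == "+" and not result:
            return False
        if symbol == "-" and result:
            return False
        if symbol == "+1":
            plus_one_results.append(result)
        # "~" never affects the outcome.
    # Rules without any "+1" entry pass once all "+"/"-" checks have passed.
    return any(plus_one_results) if plus_one_results else True


# Hypothetical rule, for illustration only (not taken from the table).
example_rule = {
    "signal_peptide": "+",
    "alpha_helical_tm": "-",
    "BOMP": "+1",
    "PRED-TMBB": "+1",
    "lipoprotein": "~",
}
```

A protein with a signal peptide, no alpha-helical transmembrane region, and a positive BOMP result would satisfy `example_rule` regardless of its lipoprotein prediction.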
Prediction of protein structure and location
Signal peptide predictions were performed using the SignalP 3.0 and TargetP 1.1 servers. A protein was classified as containing a signal peptide if at least one of the two servers predicted it to be secreted. Alpha-helical transmembrane regions were investigated using the TMHMM 2.0 server. Further information about putative protein location was obtained from the cPSORTdb database. Four different servers were used to identify putative beta-barrel structures: the Beta-barrel Outer Membrane protein Predictor (BOMP), the Prediction of TransMembrane Beta-Barrel Proteins server (PRED-TMBB), the Markov Chain Model for Beta Barrels prediction program (MCMBB) and the B2TMR-HMM predictor. For PRED-TMBB, predictions were performed using the Viterbi and Posterior Decoding algorithms. The probability that a protein forms a beta-helix was investigated with BetaWrap, and results with a p-value <0.01 were counted as positive. Lipoproteins were predicted using the LipoP 1.0 server, and outer membrane location of the lipoproteins was assigned as described by Seydel and coworkers. Similarity values and assignments to clusters of orthologous groups were obtained by blastp analysis at the NCBI web site. Additional information about individual proteins was obtained from the PEDANT database. For a detailed description of the prediction approach, please see the flow chart.
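Two of the decision rules stated above are simple enough to write down directly. The sketch below assumes that the server outputs have already been parsed into a boolean per server and a BetaWrap p-value per protein. The function and constant names are hypothetical, and no server API is implied.

```python
BETAWRAP_P_CUTOFF = 0.01  # results with p-value < 0.01 count as positive


def has_signal_peptide(signalp_secreted: bool, targetp_secreted: bool) -> bool:
    """A protein counts as containing a signal peptide if SignalP,
    TargetP, or both predict it to be secreted."""
    return signalp_secreted or targetp_secreted


def betawrap_positive(p_value: float) -> bool:
    """A BetaWrap beta-helix prediction is positive only below the cut-off
    (strictly less than, as stated in the text)."""
    return p_value < BETAWRAP_P_CUTOFF
```

With these definitions, a protein flagged by TargetP alone is still classified as carrying a signal peptide, whereas a BetaWrap p-value of exactly 0.01 is not counted as positive.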
Identification of orthologous clusters and phylogenetic analysis
The Similarity Matrix of Proteins (SIMAP) database provides a precalculated sequence similarity matrix for all proteins deposited at major public sequence databases. For the formation of orthologous clusters, bidirectional best hits (BBHs) with an E-value cut-off of 1e-08 and a length ratio cut-off of 0.5 were grouped. Either all chlamydiae, including the as yet unfinished genomes of Parachlamydia acanthamoebae UV7, Simkania negevensis Z, and Waddlia chondrophila 2032/99 (ingroup 1), or a selection of Proteobacteria including E. coli K12 (ingroup 2) were considered "ingroup" organisms in our analysis; correspondingly, 438 and 427 representatives of other bacterial lineages were considered "outgroup" organisms. For a detailed list of ingroup and outgroup organisms see Table S7. First, BBHs between proteins from ingroup organisms were merged into one cluster if they shared at least one protein. Subsequently, outgroup proteins with BBHs to ingroup proteins were added to the clusters. As a last step, in-paralogues (i.e. paralogues that arose after diversification) were added if they showed a higher similarity to a protein from the same organism than to proteins from other ingroup organisms.
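The first two clustering steps can be sketched as a union-find over BBH pairs. This is an illustration under stated assumptions, not the authors' pipeline. The E-value cut-off is read as 1e-08 and applied as "less than or equal". The length ratio is taken as shorter length divided by longer length. The `BBH` record and all function names are hypothetical. The final in-paralogue step is omitted because it needs within-organism similarity scores that are not modelled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class BBH(NamedTuple):
    """One bidirectional best hit between proteins a and b (hypothetical record)."""
    a: str
    b: str
    evalue: float
    len_a: int
    len_b: int


def passes_cutoffs(hit: BBH, max_evalue: float = 1e-8,
                   min_len_ratio: float = 0.5) -> bool:
    # Assumed definition: ratio of the shorter to the longer sequence length.
    ratio = min(hit.len_a, hit.len_b) / max(hit.len_a, hit.len_b)
    return hit.evalue <= max_evalue and ratio >= min_len_ratio


def cluster_bbhs(hits: Iterable[BBH], ingroup: Set[str]) -> List[Set[str]]:
    """Group proteins into clusters; `ingroup` holds ingroup protein ids."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = [h for h in hits if passes_cutoffs(h)]

    # Step 1: ingroup-ingroup BBHs that share a protein end up in one cluster.
    for h in kept:
        if h.a in ingroup and h.b in ingroup:
            parent[find(h.a)] = find(h.b)

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)

    # Step 2: attach outgroup proteins to the cluster of their ingroup partner.
    # Outgroup proteins do not merge clusters with each other in this sketch.
    for h in kept:
        for inside, outside in ((h.a, h.b), (h.b, h.a)):
            if inside in ingroup and outside not in ingroup:
                clusters[find(inside)].update({inside, outside})

    return list(clusters.values())
```

Called with the hits c1-c2 and c2-c3 between ingroup proteins and c3-o1 to an outgroup protein, all passing the cut-offs, the function returns a single cluster containing c1, c2, c3 and o1. Pairs that fail either cut-off are ignored.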