Bioinformatics Predictions of Localization and Targeting
With hundreds of genomes sequenced, one of the major challenges of the post-genomic era is the annotation of protein structure and function. Computational prediction of subcellular localization is an important step toward this end. The development of computational tools that predict targeting and localization has therefore been a very active area of research, in particular since the first release of the groundbreaking program PSORT in 1991. The most reliable means of annotating protein structure and function remains homology-based inference, i.e., the transfer of experimental annotations from one protein to its homologs. However, the annotation of localization demonstrates how much can be gained from advanced machine learning: more proteins can be annotated, and more reliably. Contemporary computational tools for the annotation of protein targeting include automatic methods that mine textual information from the biological literature and molecular biology databases. Some machine learning-based methods that accurately predict features of sorting signals and use sequence-derived features to predict localization have reached remarkable levels of performance. Sustained prediction accuracy has increased by more than 30 percentage points over the last decade. Here, we review some of the most recent methods for the prediction of subcellular localization and protein targeting that contributed to this breakthrough.