Data collection
All physical interactions were downloaded from IntAct, BioGRID, DIP, MINT, MatrixDB and InnateDB. These were combined to form a non-redundant dataset. Species with fewer than 10 interactions were removed.
All non-physical interactions, such as genetic interactions, were excluded.
Interactions of proteins from different species were also excluded.
All proteins were assigned valid UniProt IDs. In cases where UniProt IDs could not be directly assigned, protein sequences were aligned to UniProtKB using BLAST and the ID of the longest hit with 99% sequence identity was used.
Interactions in which one or both of the proteins were no longer present in UniProt were removed.
Interactions of proteins that did not map to valid UniProt IDs were removed. Interactions were excluded only after manual confirmation.
The interacting proteins were annotated with Entrez gene IDs, Ensembl IDs, Pfam domains and Gene Ontology terms.
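The filtering steps above can be sketched in a few lines of Python. This is only an illustration, not the pipeline actually used by HitPredict: the record types `Interaction` and `BlastHit`, their field names, and the order in which the filters are applied are all assumptions. The sketch also assumes that "99% sequence identity" means at least 99%, and that the "longest hit" is the one with the greatest alignment length.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class Interaction(NamedTuple):
    # Hypothetical record layout, chosen for illustration only.
    protein_a: str      # UniProt ID of the first protein
    protein_b: str      # UniProt ID of the second protein
    taxid_a: int        # species (taxonomy ID) of the first protein
    taxid_b: int        # species (taxonomy ID) of the second protein
    is_physical: bool   # False for e.g. genetic interactions


class BlastHit(NamedTuple):
    # Hypothetical summary of one BLAST hit against UniProtKB.
    uniprot_id: str
    percent_identity: float
    alignment_length: int


def pick_uniprot_id(hits: Iterable[BlastHit],
                    min_identity: float = 99.0) -> Optional[str]:
    """Return the ID of the longest hit meeting the identity threshold."""
    eligible = [h for h in hits if h.percent_identity >= min_identity]
    if not eligible:
        return None  # no acceptable hit: the protein stays unmapped
    return max(eligible, key=lambda h: h.alignment_length).uniprot_id


def filter_interactions(interactions: Iterable[Interaction],
                        min_per_species: int = 10) -> List[Interaction]:
    """Keep physical, same-species, non-redundant interactions."""
    unique = {}
    for it in interactions:
        # Drop non-physical and cross-species interactions.
        if not it.is_physical or it.taxid_a != it.taxid_b:
            continue
        # A-B and B-A are treated as the same interaction.
        key = (frozenset((it.protein_a, it.protein_b)), it.taxid_a)
        unique.setdefault(key, it)
    # Drop species with fewer than `min_per_species` interactions.
    counts = Counter(it.taxid_a for it in unique.values())
    return [it for it in unique.values()
            if counts[it.taxid_a] >= min_per_species]
```

Removing redundancy before counting interactions per species is a design choice of this sketch; the text does not state the order in which the two steps were performed.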
Reliability scoring
From version 4, all interactions in HitPredict (small-scale or high-throughput) are assigned an interaction score. The interaction score denotes the reliability of the interaction and is the geometric mean of the scores produced by the following two methods:
Annotation-based Score
This score is calculated in the form of a likelihood ratio using naive Bayesian networks and is based on the following properties of the interacting proteins:
A likelihood ratio greater than 1 is scaled to give an annotation score between 0.5 and 1. An annotation score >= 0.5 indicates a high-confidence interaction. The value of the annotation score increases with the evidence supporting the interaction.
Method-based Score
This score is based on the experimental information available for the interactions and is calculated as the mean of the following three scores:
These scores are calculated and combined into a single score using the method described in Villaveces et al., Database, 2015. A method score >= 0.485, the cut-off suggested by Villaveces et al., is considered to indicate high confidence.
Combined Interaction Score
This score is the geometric mean of the Annotation-based score and the Method-based score. As such, it takes into account the experimental support for the interaction as well as the genomic features of the interacting proteins. This score has been shown to perform better than either of the two scores on its own, and better than the score used by the Mentha database (the same as that used by MINT).
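As a minimal sketch of the combination step, the geometric mean of two scores is the square root of their product. The function and constant names below are illustrative and not part of HitPredict; the two cut-offs are the per-score values quoted above, and no cut-off for the combined score is assumed.

```python
import math

# Per-score high-confidence cut-offs quoted in the text.
ANNOTATION_CUTOFF = 0.5
METHOD_CUTOFF = 0.485


def interaction_score(annotation_score: float, method_score: float) -> float:
    """Geometric mean of the annotation-based and method-based scores."""
    if annotation_score < 0 or method_score < 0:
        raise ValueError("scores must be non-negative")
    return math.sqrt(annotation_score * method_score)


def is_high_confidence_annotation(annotation_score: float) -> bool:
    """True if the annotation score meets its quoted cut-off."""
    return annotation_score >= ANNOTATION_CUTOFF


def is_high_confidence_method(method_score: float) -> bool:
    """True if the method score meets its quoted cut-off."""
    return method_score >= METHOD_CUTOFF
```

For example, `interaction_score(0.8, 0.2)` is about 0.4, below the arithmetic mean of 0.5. The geometric mean therefore rewards interactions that are supported by both kinds of evidence rather than strongly by only one.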
Please refer to Lopez et al., Database, 2015 for more details.
Service provided by Combinatics, Japan
Last updated: 15 Jul 2024