Data collection

  • All physical interactions were downloaded from IntAct, BioGRID, DIP, MINT, MatrixDB and InnateDB. These were combined to form a non-redundant dataset. Species with fewer than 10 interactions were removed.

  • All non-physical interactions, such as genetic interactions, were excluded.

  • Interactions of proteins from different species were also excluded.

  • All proteins were assigned valid UniProt IDs. In cases where UniProt IDs could not be directly assigned, protein sequences were aligned to UniProtKB using BLAST and the ID of the longest hit with 99% sequence identity was used.

  • Interactions in which one or both of the proteins were no longer present in UniProt were removed.

  • Interactions of proteins that did not map to valid UniProt IDs were removed. Interactions were excluded only after manual confirmation.

  • The interacting proteins were annotated with Entrez gene IDs, Ensembl IDs, Pfam domains and Gene Ontology terms.
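
The filtering steps above can be sketched in a few lines of Python. This is only an illustration, not the HitPredict pipeline itself: the `Interaction` record, its field names and the `filter_interactions` function are hypothetical, and the sketch assumes that every protein already carries a valid UniProt ID and an NCBI taxonomy ID.

```python
from collections import Counter
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one physical interaction."""
    protein_a: str  # UniProt ID of the first protein
    taxon_a: int    # NCBI taxonomy ID of the first protein
    protein_b: str  # UniProt ID of the second protein
    taxon_b: int    # NCBI taxonomy ID of the second protein


# Species with fewer interactions than this are dropped.
MIN_INTERACTIONS_PER_SPECIES = 10


def filter_interactions(records):
    # Exclude interactions between proteins of different species.
    same_species = [r for r in records if r.taxon_a == r.taxon_b]

    # Build a non-redundant set: A-B and B-A from different
    # source databases are treated as the same interaction.
    unique = {}
    for r in same_species:
        key = (r.taxon_a, frozenset((r.protein_a, r.protein_b)))
        unique.setdefault(key, r)

    # Count the remaining interactions per species and drop small species.
    per_species = Counter(taxon for taxon, _ in unique)
    return [
        r for (taxon, _), r in unique.items()
        if per_species[taxon] >= MIN_INTERACTIONS_PER_SPECIES
    ]
```

Treating an interaction as an unordered pair of proteins is an assumption made here to merge records that list the same two proteins in opposite order.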


Reliability scoring

From version 4, all interactions in HitPredict (small-scale or high-throughput) are assigned an interaction score. The interaction score denotes the reliability of the interaction and is the geometric mean of the following two scores:

  • Annotation-based Score: This score is calculated in the form of a likelihood ratio using naive Bayesian networks and is based on the following properties of the interacting proteins:

    1. Structurally known interacting Pfam domains, obtained from 3DID
    2. Gene Ontology (GO) annotations of the interacting proteins - GO
    3. Homologous interactions - calculated in HitPredict based on the region of similarity, e-value, score and percentage of similarity with the query proteins as shown in the alignment plot.

    A likelihood ratio greater than 1 is scaled to give an annotation score between 0.5 and 1. An annotation score >= 0.5 indicates a high-confidence interaction. The value of the annotation score increases with the evidence supporting the interaction.

  • Method-based Score: This score is based on the experimental information available for the interactions and is calculated as the mean of the following three scores:

    1. Publication score: Score based on the number of unique publications or experiments supporting the interaction
    2. Method score: Score based on the following methods of interaction identification - biophysical, protein complementation assay, post transcriptional interference, biochemical, imaging technique and their subtypes. The default scores for each method are used as specified by the HUPO PSI-MI consortium.
    3. Type score: Score based on the following interaction types and their subtypes - association, physical association and direct interaction. The default scores for each type are used as specified by the HUPO PSI-MI consortium.

    These scores are calculated and combined into a single score using the method described in Villaveces et al., Database, 2015. A method-based score >= 0.485 is considered to indicate high confidence; this cut-off is the one suggested by Villaveces et al.

  • Combined Interaction Score: This score is the geometric mean of the Annotation-based score and the Method-based score. As such, it takes into account the experimental support for the interaction as well as the genomic features of the interacting proteins. This score has been shown to perform better than either of the two scores alone, and better than the score used by the Mentha database (the same as that used by MINT).

    Please refer to Lopez et al., Database, 2015 for more details.
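
As a rough illustration of how the scores described above fit together, here is a minimal Python sketch. The function and constant names are hypothetical, and the three component scores and the annotation score are assumed to be already computed and to lie between 0 and 1. The plain, equally weighted mean follows the description on this page; the cited papers give the exact procedure.

```python
import math

# Cut-offs quoted in the text above.
ANNOTATION_SCORE_CUTOFF = 0.5   # annotation-based score, high confidence
METHOD_SCORE_CUTOFF = 0.485     # method-based score, high confidence


def method_based_score(publication, method, interaction_type):
    """Mean of the publication, method and type scores (each assumed in [0, 1])."""
    return (publication + method + interaction_type) / 3.0


def interaction_score(annotation, method_based):
    """Geometric mean of the annotation-based and method-based scores."""
    return math.sqrt(annotation * method_based)


# Example with made-up component scores.
m = method_based_score(0.4, 0.6, 0.5)
combined = interaction_score(0.8, m)
print(round(m, 3), round(combined, 3))
```

Because the combined score is a geometric mean, it is pulled towards the lower of its two inputs and is zero whenever either input is zero, so an interaction needs support from both the annotation-based and the method-based score to rank highly.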


Service provided by Combinatics, Japan

Last updated: 15 Jul 2024