API

You can access the system without any limitations using the following endpoint: http://www.serelex.org/MODEL/WORD, where MODEL is the name of a model and WORD is the query word.

The list of available models is presented below:

A sample English request for the word 'ubuntu' with the model norm60-corpus-all should return a list of related words in JSON format:

{
  "word": "ubuntu",
  "model": "norm60-corpus-all",
  "relations": [
    {
      "word": "debian",
      "value": 0.848965058911512,
      "icon": false
    },
    {
      "word": "mandriva",
      "value": 0.498265193741008,
      "icon": false
    },
    {
      "word": "fedora",
      "value": 0.333650822618753,
      "icon": false
    },
    {
      "word": "gentoo",
      "value": 0.333297162942371,
      "icon": false
    },
    {
      "word": "opensuse",
      "value": 0.232300240955515,
      "icon": false
    },
    ...
  ]
}
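
As a rough illustration of calling the endpoint from a program, the sketch below fills in the MODEL/WORD pattern and extracts the (word, value) pairs from a response shaped like the sample above. The helper names `build_url`, `parse_relations` and `fetch_related` are hypothetical and not part of the service. The code also assumes that the response body is UTF-8 encoded JSON and that MODEL and WORD can be percent-encoded as plain path segments.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://www.serelex.org"  # endpoint root as given in the text above


def build_url(model: str, word: str) -> str:
    """Fill the MODEL/WORD pattern (assumption: both are plain path segments)."""
    return f"{BASE_URL}/{quote(model, safe='')}/{quote(word, safe='')}"


def parse_relations(payload: str) -> list[tuple[str, float]]:
    """Return (word, value) pairs from a response body, highest value first."""
    data = json.loads(payload)
    pairs = [(r["word"], float(r["value"])) for r in data.get("relations", [])]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def fetch_related(model: str, word: str, timeout: float = 10.0) -> list[tuple[str, float]]:
    """Query the service over HTTP (needs network access; UTF-8 body assumed)."""
    with urlopen(build_url(model, word), timeout=timeout) as response:
        return parse_relations(response.read().decode("utf-8"))
```

If the service answers as in the sample, calling `fetch_related("norm60-corpus-all", "ubuntu")` would return a list that starts with the pair for "debian". Parsing is kept separate from the HTTP call so that it can be tried on a saved response without network access.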

Graphs of related words

You can download the distributional thesauri (i.e. graphs of semantically related words) that serve as the basis of the current system. They are provided in CSV format with one word pair per line: "word_i<TAB>word_j<TAB>similarity_ij", where "word_i" and "word_j" are words, such as "Python" and "Ruby", and "similarity_ij" is their similarity score, e.g. 0.789.
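
The three-column format described above can be loaded into a nested dictionary with a few lines of code. The sketch below is one possible reading routine, not the project's own loader: the function name `read_similarity_graph` is hypothetical. It assumes that fields are separated by a single tab, that quote characters are ordinary text, and that lines which do not have three fields or a numeric score (for instance a header row) can be skipped.

```python
import csv
from collections import defaultdict
from typing import Iterable


def read_similarity_graph(lines: Iterable[str]) -> dict[str, dict[str, float]]:
    """Build {word_i: {word_j: similarity_ij}} from tab-separated lines."""
    graph: defaultdict[str, dict[str, float]] = defaultdict(dict)
    # QUOTE_NONE treats quote characters as ordinary text (assumption).
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        word_i, word_j, raw_score = row
        try:
            score = float(raw_score)
        except ValueError:
            continue  # skip rows without a numeric score, e.g. a header
        graph[word_i][word_j] = score
    return dict(graph)
```

A downloaded file can be passed in directly, for example `read_similarity_graph(open(path, encoding="utf-8", newline=""))`, where `path` is wherever the file was saved. For the line given as an example above, `graph["Python"]["Ruby"]` would then hold 0.789.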

Source code

If you would like to extract word similarity graphs from your own corpus, you can use the PatternSim tool. Source code for evaluating such graphs is also available (the sim-eval tool). Finally, if you use the API from your own application, you can run your own instance of the backend for improved performance.