About

API

You can query the system without any limitations using the following endpoint: http://www.serelex.org/MODEL/WORD, where MODEL is the model to query and WORD is the query word. The available models are listed below:

English PatternSim model.
French PatternSim model.
Russian PatternSim model.
Russian SkipGram model (also known as the Russian Distributional Thesaurus, RDT).
Portuguese PatternSim model.

A sample English request for the word 'ubuntu' should return a list of related words in JSON format:


{
  "word": "ubuntu",
  "model": "norm60-corpus-all",
  "relations": [
    {
      "word": "debian",
      "value": 0.848965058911512,
      "icon": false
    },
    {
      "word": "mandriva",
      "value": 0.498265193741008,
      "icon": false
    },
    {
      "word": "fedora",
      "value": 0.333650822618753,
      "icon": false
    },
    {
      "word": "gentoo",
      "value": 0.333297162942371,
      "icon": false
    },
    {
      "word": "opensuse",
      "value": 0.232300240955515,
      "icon": false
    },
...
}
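
As a rough illustration of how a client might consume this response, here is a minimal Python sketch. It assumes that the endpoint follows the MODEL/WORD pattern given above and that the response has the "relations" layout shown in the sample. The function names build_url, parse_relations and fetch_related are hypothetical helpers introduced for this example, and whether the "model" value from the response ("norm60-corpus-all") is also the identifier expected in the URL is an assumption, not something this page confirms.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://www.serelex.org"  # taken from the endpoint pattern above


def build_url(model, word):
    """Fill the MODEL and WORD slots of the endpoint pattern (hypothetical helper)."""
    return f"{BASE_URL}/{quote(model, safe='')}/{quote(word, safe='')}"


def parse_relations(payload):
    """Extract (word, value) pairs from a JSON response body.

    Assumes the layout shown in the sample: a top-level "relations" list
    whose items carry "word" and "value" fields.
    """
    data = json.loads(payload)
    return [(rel["word"], float(rel["value"])) for rel in data.get("relations", [])]


def fetch_related(model, word, timeout=10.0):
    """Request related words over HTTP and return them as (word, value) pairs."""
    with urlopen(build_url(model, word), timeout=timeout) as response:
        return parse_relations(response.read().decode("utf-8"))
```

A call such as fetch_related("norm60-corpus-all", "ubuntu") would then perform the request, provided the model identifier assumed here is the one the service accepts.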

Graphs of related words

You can download the distributional thesauri (i.e. graphs of semantically related words) that form the basis of the current system. They are in CSV format: "word_i<TAB>word_j<TAB>similarity_ij", where "word_i" and "word_j" are words, such as "Python" and "Ruby", and "similarity_ij" is their similarity score, e.g. 0.789.

English PatternSim model extracted from the combination of ukWaC and Wikipedia corpora. You can also download raw PatternSim features derived from a larger corpus (a 59G combination of Wikipedia, ukWaC, GigaWord, and Leipzig news corpus).
French PatternSim model extracted from the frWaC and Wikipedia corpora. You can also download raw PatternSim features of this model.
Russian PatternSim model: one version based on DBpedia abstracts, and another based on DBpedia abstracts and ruWaC.
Russian SkipGram model, also known as the Russian Distributional Thesaurus (RDT).
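
To show how such a file could be read, here is a small Python sketch that loads rows in the "word_i<TAB>word_j<TAB>similarity_ij" format into a dictionary of neighbours sorted by similarity. The function name load_graph and the sample rows are made up for illustration. Skipping rows that do not have exactly three fields, or whose score is not a number, is a defensive assumption, since the exact contents of the downloadable files are not described here.

```python
import csv
import io
from collections import defaultdict


def load_graph(lines):
    """Build {word_i: [(word_j, similarity), ...]} from tab-separated rows.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    graph = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        word_i, word_j, score = row
        try:
            similarity = float(score)
        except ValueError:
            continue  # skip rows with a non-numeric score, e.g. a header
        graph[word_i].append((word_j, similarity))
    # Most similar neighbours first.
    for neighbours in graph.values():
        neighbours.sort(key=lambda pair: pair[1], reverse=True)
    return dict(graph)


# In-memory sample with invented rows; a real file would be opened with
# open(path, encoding="utf-8", newline="") and passed in the same way.
sample = io.StringIO("Python\tRuby\t0.789\nPython\tPerl\t0.5\n")
graph = load_graph(sample)
```

Keeping the neighbours sorted makes it cheap to take the top few related words for a query, which is how the API response above appears to be ordered.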

Source code

If you would like to extract word similarity graphs from your own corpus, you can use the PatternSim tool. Source code for evaluating such graphs is also available (the sim-eval tool). Finally, if you use the API from your application, you can run your own instance of the backend for improved performance.

PatternSim: a pattern-based semantic relatedness measure used to extract the word graphs behind Serelex from text corpora.
Serelex lexical-semantic search engine (this web site).
sim-eval: tools for evaluating semantic similarity/relatedness measures. More information about this tool.