truelearn.preprocessing.Wikifier

class truelearn.preprocessing.Wikifier(api_key: str)

Bases: object

A client that makes requests to the Wikifier API. See https://www.wikifier.org/.

Methods

__init__(api_key)

Initialize the Wikifier client with api_key.

wikify(text, *[, df_ignore, words_ignore, ...])

Annotate input text using the Wikifier API.

__init__(api_key: str) -> None

Initialize the Wikifier client with api_key.

Parameters:

api_key – A string representing the API key needed to make the request. Get one from https://wikifier.org/register.html.

wikify(text: str, *, df_ignore: int = 50, words_ignore: int = 50, top_n: Optional[int] = None, key_fn: str = 'cosine') -> List[Dict[str, Optional[Union[str, float]]]]

Annotate input text using the Wikifier API.

Parameters:
  • text – A string representing the text to annotate.

  • * – Marks every parameter after it as keyword-only, so any additional positional argument is rejected.

  • df_ignore – An int representing the nTopDfValuesToIgnore value from the Wikifier API, used to ignore frequently-occurring words.

  • words_ignore – An int representing the nWordsToIgnoreFromList from the Wikifier API, also used to ignore frequently-occurring words.

  • top_n – The number of annotations to return, e.g. top_n = 5 would only return the top 5 annotations sorted by keys extracted via key_fn. If None, return all the annotations.

  • key_fn – A string representing the key function that is used when sorting the annotations. The allowed values are “cosine” and “pagerank”. “cosine” means sorted by cosine similarity. “pagerank” means sorted by pagerank.
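The interaction between top_n and key_fn described above can be sketched locally, without calling the API. The helper select_top, the SORT_FIELDS mapping, and the sample scores below are hypothetical and are not part of truelearn. The sketch also assumes that "top" means highest score first, which the parameter description implies but does not state.

```python
# Assumption: each key_fn value selects the annotation key with the same meaning.
SORT_FIELDS = {"cosine": "cosine", "pagerank": "pageRank"}


def select_top(annotations, top_n=None, key_fn="cosine"):
    """Hypothetical helper mimicking the documented top_n/key_fn behaviour."""
    if key_fn not in SORT_FIELDS:
        raise ValueError(f"key_fn must be one of {sorted(SORT_FIELDS)}")
    field = SORT_FIELDS[key_fn]
    # Assumption: annotations are ranked from highest to lowest score.
    ranked = sorted(annotations, key=lambda a: a[field], reverse=True)
    # top_n=None keeps every annotation, as documented.
    return ranked if top_n is None else ranked[:top_n]


# Toy annotations with made-up scores; wikiDataItemId is left empty here.
sample = [
    {"title": "Machine learning",
     "url": "https://en.wikipedia.org/wiki/Machine_learning",
     "cosine": 0.42, "pageRank": 0.010, "wikiDataItemId": None},
    {"title": "Statistics",
     "url": "https://en.wikipedia.org/wiki/Statistics",
     "cosine": 0.30, "pageRank": 0.030, "wikiDataItemId": None},
    {"title": "Algorithm",
     "url": "https://en.wikipedia.org/wiki/Algorithm",
     "cosine": 0.55, "pageRank": 0.020, "wikiDataItemId": None},
]

print([a["title"] for a in select_top(sample, top_n=2, key_fn="pagerank")])
```

Note that the key_fn value is spelled "pagerank" while the corresponding annotation key is "pageRank".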

Returns:

The list of annotations obtained from the Wikifier API. An annotation is a dictionary containing five keys: “title”, “url”, “cosine”, “pageRank”, and “wikiDataItemId”.
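As a rough illustration of consuming the return value, the sketch below pulls the title and cosine score out of each annotation. The helper top_concepts is hypothetical. It accepts any object exposing a wikify method with the signature documented here, so it can be tried with a stand-in object before an API key is available.

```python
def top_concepts(wikifier, text, n=5):
    """Return (title, cosine) pairs for at most n annotations, ranked by cosine."""
    # top_n and key_fn must be passed by keyword, per the documented signature.
    annotations = wikifier.wikify(text, top_n=n, key_fn="cosine")
    return [(a["title"], a["cosine"]) for a in annotations]


# Intended usage with the real client (needs truelearn installed, network
# access, and a valid API key; "your-api-key" is a placeholder):
#
#   from truelearn.preprocessing import Wikifier
#   wikifier = Wikifier("your-api-key")
#   for title, score in top_concepts(wikifier, "Bayesian inference updates beliefs."):
#       print(title, score)
```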

Raises:
  • WikifierError

    1) The API key is not valid. 2) The response from Wikifier contains an error message.

  • TrueLearnValueError

    1) The key_fn is neither “cosine” nor “pagerank”. 2) The df_ignore or words_ignore is less than 0.

  • urllib.error.HTTPError – The HTTP request returns a status code representing an error.
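One possible way to handle the HTTP failure case is sketched below. The wrapper wikify_or_empty is hypothetical, and returning an empty list is a design choice for this example, not library behaviour. It catches only urllib.error.HTTPError, since this page does not show where WikifierError and TrueLearnValueError are imported from, and lets those propagate.

```python
import urllib.error


def wikify_or_empty(wikifier, text, **kwargs):
    """Call wikifier.wikify and fall back to an empty list on an HTTP error."""
    try:
        return wikifier.wikify(text, **kwargs)
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status code of the failed request.
        print(f"Wikifier request failed with HTTP status {err.code}")
        return []
```

Swallowing the error suits batch jobs where one failed document should not stop the run. Interactive code may prefer to let the exception surface.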