truelearn.preprocessing.Wikifier
- class truelearn.preprocessing.Wikifier(api_key: str)
Bases: object
A client that makes requests to the Wikifier API. See https://www.wikifier.org/.
Methods
__init__(api_key): Init Wikifier class with api_key.
wikify(text, *[, df_ignore, words_ignore, ...]): Annotate input text using the Wikifier API.
- __init__(api_key: str) -> None
Init Wikifier class with api_key.
- Parameters:
api_key – A string representing the API key needed to make the request. Get one from https://wikifier.org/register.html.
- wikify(text: str, *, df_ignore: int = 50, words_ignore: int = 50, top_n: Optional[int] = None, key_fn: str = 'cosine') -> List[Dict[str, Optional[Union[str, float]]]]
Annotate input text using the Wikifier API.
- Parameters:
text – A string representing the text to annotate.
* – Used to reject other positional arguments; every parameter after it must be passed by keyword.
df_ignore – An int representing the nTopDfValuesToIgnore value from the Wikifier API, used to ignore frequently-occurring words.
words_ignore – An int representing the nWordsToIgnoreFromList value from the Wikifier API, also used to ignore frequently-occurring words.
top_n – The number of annotations to return, e.g. top_n = 5 would only return the top 5 annotations sorted by keys extracted via key_fn. If None, return all the annotations.
key_fn – A string representing the key function that is used when sorting the annotations. The allowed values are “cosine” and “pagerank”. “cosine” means sorted by cosine similarity. “pagerank” means sorted by pagerank.
- Returns:
The list of annotations obtained from the Wikifier API. An annotation is a dictionary containing five keys: “title”, “url”, “cosine”, “pageRank”, and “wikiDataItemId”.
- Raises:
WikifierError – 1) The API key is not valid. 2) The response from Wikifier contains an error message.
TrueLearnValueError – 1) The key_fn is neither cosine nor pagerank. 2) The df_ignore or words_ignore is less than 0.
urllib.error.HTTPError – The HTTP request returns a status code representing an error.
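Example

The following is a minimal local sketch of the top_n and key_fn semantics described above, not the library's actual implementation. It runs without an API key or network access. The helper select_top_annotations and the SAMPLE data are hypothetical, a plain ValueError stands in for TrueLearnValueError, and descending order (highest score first) is an assumption that the text above does not state explicitly.

```python
from typing import Dict, List, Optional, Union

Annotation = Dict[str, Optional[Union[str, float]]]

# Map the documented key_fn values to the annotation keys listed under "Returns".
_SORT_KEYS = {"cosine": "cosine", "pagerank": "pageRank"}


def select_top_annotations(
    annotations: List[Annotation],
    top_n: Optional[int] = None,
    key_fn: str = "cosine",
) -> List[Annotation]:
    """Hypothetical helper that mimics the documented top_n/key_fn behaviour."""
    if key_fn not in _SORT_KEYS:
        # Stand-in for TrueLearnValueError in this self-contained sketch.
        raise ValueError("key_fn must be 'cosine' or 'pagerank'")
    sort_key = _SORT_KEYS[key_fn]
    # Assumption: the highest-scoring annotations come first.
    ranked = sorted(annotations, key=lambda a: a[sort_key], reverse=True)
    # top_n=None means "return all the annotations".
    return ranked if top_n is None else ranked[:top_n]


# Made-up annotations with the five documented keys; values are illustrative only.
SAMPLE: List[Annotation] = [
    {"title": "A", "url": "https://example.org/A", "cosine": 0.9,
     "pageRank": 0.1, "wikiDataItemId": None},
    {"title": "B", "url": "https://example.org/B", "cosine": 0.5,
     "pageRank": 0.7, "wikiDataItemId": None},
    {"title": "C", "url": "https://example.org/C", "cosine": 0.2,
     "pageRank": 0.4, "wikiDataItemId": None},
]

top_by_pagerank = select_top_annotations(SAMPLE, top_n=2, key_fn="pagerank")
print([a["title"] for a in top_by_pagerank])

# With a real API key, the documented call would look like this:
# from truelearn.preprocessing import Wikifier
# wikifier = Wikifier("YOUR_API_KEY")
# annotations = wikifier.wikify("text to annotate", top_n=5, key_fn="pagerank")
```

Note that the key_fn value "pagerank" is lowercase, while the corresponding key in each annotation dictionary is "pageRank".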