truelearn.datasets.load_peek_dataset

truelearn.datasets.load_peek_dataset(*, dirname: str = '.', variance: float = 1e-09, kc_init_func: truelearn.datasets._peek.PEEKKnowledgeComponentGenerator = <class 'truelearn.models._knowledge.KnowledgeComponent'>, train_limit: Optional[int] = None, test_limit: Optional[int] = None, verbose: bool = True) -> Tuple[List[Tuple[int, List[Tuple[EventModel, bool]]]], List[Tuple[int, List[Tuple[EventModel, bool]]]], Dict[int, Tuple[str, str, str]]]

Download (if necessary) and parse the PEEK dataset (PEEKDataset).

Examples

To load the data:

>>> from truelearn.datasets import load_peek_dataset
>>> train, test, mapping = load_peek_dataset(verbose=False)
>>> len(train)
14050
>>> train[0]
(23128, [(EventModel(..., event_time=172.0), False), ..., (EventModel(..., event_time=55932.0), False)])
>>> len(test)
5969
>>> test[0]
(25623, [(EventModel(..., event_time=0.0), False), ..., (EventModel(..., event_time=1590.0), False)])
>>> len(mapping)
30367
>>> mapping[0]
('https://en.wikipedia.org/wiki/"Hello,_World!"_program', '"Hello, World!" program', "Traditional beginners' computer program")
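
Because train and test share the same PEEKData shape, simple statistics can be computed with plain iteration. The sketch below assumes only the (learner_id, [(event, label), ...]) structure described under Returns. `summarize` is a hypothetical helper, not part of truelearn, and the toy data uses strings in place of EventModel instances because only the labels are read.

```python
from typing import Any, List, Tuple

# Hypothetical alias mirroring the documented PEEKData shape; any event
# object works here because only the labels are inspected.
PEEKLike = List[Tuple[int, List[Tuple[Any, bool]]]]


def summarize(data: PEEKLike) -> Tuple[int, int, float]:
    """Return (number of learners, number of events, fraction engaged)."""
    n_learners = len(data)
    labels = [label for _, events in data for _, label in events]
    n_events = len(labels)
    engaged = sum(labels)  # True counts as 1, False as 0
    rate = engaged / n_events if n_events else 0.0
    return n_learners, n_events, rate


# Toy data standing in for the output of load_peek_dataset.
toy_train = [
    (1, [("event-a", False), ("event-b", True)]),
    (2, [("event-c", True), ("event-d", True)]),
]
print(summarize(toy_train))  # (2, 4, 0.75)
```

On the real data, the same helper could be applied to train or test after loading; the toy values above are illustrative only.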

Parameters:
  • * – Marks all following parameters as keyword-only; positional arguments are rejected.

  • dirname – The directory in which the dataset files are stored, and into which they are downloaded if not already present.

  • variance – The default variance of the knowledge components in PEEKDataset.

  • kc_init_func – A function that creates a knowledge component. It can be customized to produce different kinds of knowledge components, as long as they follow the AbstractKnowledge protocol. The default is the KnowledgeComponent class, so each knowledge component is created as a KnowledgeComponent instance.

  • train_limit – An optional non-negative integer specifying the maximum number of lines to read from the train file. None means no limit.

  • test_limit – An optional non-negative integer specifying the maximum number of lines to read from the test file. None means no limit.

  • verbose – If True and the dataset file does not exist yet (so it has to be downloaded), print information about the file being downloaded.
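
The train_limit and test_limit semantics (read at most N lines, None for no limit, negative values rejected) can be sketched with itertools.islice. `read_lines` is a hypothetical helper, not the library's implementation, and it raises the built-in ValueError where load_peek_dataset raises TrueLearnValueError.

```python
import io
import itertools
from typing import Iterable, List, Optional


def read_lines(lines: Iterable[str], limit: Optional[int] = None) -> List[str]:
    """Read at most `limit` lines; None means no limit (hypothetical helper)."""
    if limit is not None and limit < 0:
        # load_peek_dataset raises TrueLearnValueError here; ValueError is a stand-in.
        raise ValueError(f"limit must be non-negative, got {limit}")
    # islice(it, None) yields every item; islice(it, n) stops after n items.
    return list(itertools.islice(lines, limit))


fake_file = io.StringIO("row1\nrow2\nrow3\n")
print(read_lines(fake_file, limit=2))  # ['row1\n', 'row2\n']
```

A limit of 0 reads nothing, which matches the "non-negative" wording: 0 is allowed, only values below 0 are rejected.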

Returns:

A tuple (train, test, mapping), where train and test are PEEKData and mapping is a dict that maps each topic_id to a (url, title, description) tuple. PEEKData is a list of (learner_id, events) tuples: learner_id uniquely identifies a learner, and events is a list of (event, label) tuples, where event is an EventModel and label is a bool indicating whether the learner engaged with that event.

The returned data looks like this:

(
    [
        (learner_id, [
            (event, label), ...
        ]),...
    ],
    [
        ...
    ],
    {
        0: (url, title, description),...  # 0 is a topic_id (wiki id)
    }
)
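
The nested structure above can be walked directly. The sketch below uses a hypothetical `FakeEvent` dataclass standing in for EventModel (it models only the event_time attribute visible in the example output) and a hypothetical `engaged_times` helper that collects, per learner, the times of the events the learner engaged with. It also unpacks a mapping entry; the toy entry copies the (url, title, description) values shown in the example above, while the learner data is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FakeEvent:
    """Hypothetical stand-in for EventModel; only event_time is modelled."""
    event_time: float


PEEKLike = List[Tuple[int, List[Tuple[FakeEvent, bool]]]]


def engaged_times(data: PEEKLike) -> Dict[int, List[float]]:
    """Map each learner_id to the event times of events labelled True."""
    result: Dict[int, List[float]] = {}
    for learner_id, events in data:
        # setdefault keeps this correct even if a learner_id appeared twice.
        result.setdefault(learner_id, []).extend(
            event.event_time for event, label in events if label
        )
    return result


toy = [(1, [(FakeEvent(10.0), False), (FakeEvent(20.0), True)])]
print(engaged_times(toy))  # {1: [20.0]}

# mapping values are (url, title, description) tuples keyed by topic_id.
toy_mapping = {
    0: (
        'https://en.wikipedia.org/wiki/"Hello,_World!"_program',
        '"Hello, World!" program',
        "Traditional beginners' computer program",
    )
}
url, title, description = toy_mapping[0]
print(title)  # "Hello, World!" program
```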

Raises:

TrueLearnValueError – If train_limit or test_limit is negative.