truelearn.datasets.load_peek_dataset

truelearn.datasets.load_peek_dataset(*, dirname: str = '.', variance: float = 1e-09, kc_init_func: PEEKKnowledgeComponentGenerator = <class 'truelearn.models._knowledge.KnowledgeComponent'>, train_limit: Optional[int] = None, test_limit: Optional[int] = None, verbose: bool = True) → Tuple[List[Tuple[int, List[Tuple[EventModel, bool]]]], List[Tuple[int, List[Tuple[EventModel, bool]]]], Dict[int, Tuple[str, str, str]]]

Download and parse the PEEKDataset.

Examples

To load the data:

>>> from truelearn.datasets import load_peek_dataset
>>> train, test, mapping = load_peek_dataset(verbose=False)
>>> len(train)
14050
>>> train[0]  
(23128, [(EventModel(..., event_time=172.0), False), ..., (EventModel(..., event_time=55932.0), False)])
>>> len(test)
5969
>>> test[0]  
(25623, [(EventModel(..., event_time=0.0), False), ..., (EventModel(..., event_time=1590.0), False)])
>>> len(mapping)
30367
>>> mapping[0]
('https://en.wikipedia.org/wiki/"Hello,_World!"_program', '"Hello, World!" program', "Traditional beginners' computer program")
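Each value in mapping is a (url, title, description) tuple keyed by topic id. As a minimal sketch, a title lookup that tolerates unknown ids could look like the following. The helper topic_title and the TopicMapping alias are hypothetical and not part of truelearn; the single entry is the mapping[0] output shown above, used as sample data so the snippet runs without downloading the dataset.

```python
from typing import Dict, Tuple

# (url, title, description) keyed by topic id, as documented for `mapping`.
TopicMapping = Dict[int, Tuple[str, str, str]]

# Sample entry copied from the `mapping[0]` output above.
mapping: TopicMapping = {
    0: (
        'https://en.wikipedia.org/wiki/"Hello,_World!"_program',
        '"Hello, World!" program',
        "Traditional beginners' computer program",
    ),
}


def topic_title(mapping: TopicMapping, topic_id: int, default: str = "<unknown>") -> str:
    """Hypothetical helper: return the title for topic_id, or default if absent."""
    entry = mapping.get(topic_id)
    if entry is None:
        return default
    _url, title, _description = entry
    return title


print(topic_title(mapping, 0))   # "Hello, World!" program
print(topic_title(mapping, -1))  # <unknown>
```

Using dict.get keeps the lookup total, which is convenient when events reference topics that a truncated load (for example, with train_limit set) might not make you expect.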

Parameters:

* – Marks all following parameters as keyword-only, so positional arguments are rejected.
dirname – The directory in which the dataset files are stored, and to which they are downloaded if missing.
variance – The default variance of the knowledge components in the PEEKDataset.
kc_init_func – A function that creates a knowledge component. It can be customized to produce other kinds of knowledge components, as long as they follow the AbstractKnowledge protocol. By default, a KnowledgeComponent instance is created.
train_limit – An optional non-negative integer specifying the maximum number of lines to read from the train file. If None, all lines are read.
test_limit – An optional non-negative integer specifying the maximum number of lines to read from the test file. If None, all lines are read.
verbose – If True and the dataset file does not already exist, the function outputs information about the file it downloads.

Returns:

A tuple of (train, test, mapping), where train and test are PEEKData and mapping is a dict that maps each topic_id to a (url, title, description) tuple. PEEKData is a list of (learner_id, events) tuples, where learner_id uniquely identifies a learner and events is a list of (event, label) tuples: event is an EventModel, and label is a bool indicating whether the learner engaged with the event.

The returned data looks like this:

(
    [
        (learner_id, [
            (event, label), ...
        ]),...
    ],
    [
        ...
    ],
    {
        0: (url, title, description),...  # 0 is the topic_id (wiki id)
    }
)
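Because train and test share the nested (learner_id, [(event, label), ...]) shape, they can be walked with plain loops. The sketch below computes each learner's engagement rate (the fraction of events labelled True) and flattens the structure into rows. It is illustrative only: the PEEKData alias is written out locally rather than imported from truelearn, events are treated as opaque objects (only the labels are read), the small hand-made sample stands in for real data, and if a learner id appeared more than once, later entries would overwrite earlier ones in the rate dict.

```python
from typing import Any, Dict, List, Tuple

# Local alias mirroring the documented shape; events are opaque here.
PEEKData = List[Tuple[int, List[Tuple[Any, bool]]]]


def engagement_rates(data: PEEKData) -> Dict[int, float]:
    """Map each learner_id to the fraction of its events labelled True."""
    rates: Dict[int, float] = {}
    for learner_id, events in data:
        if not events:
            continue  # skip learners without events to avoid dividing by zero
        engaged = sum(1 for _event, label in events if label)
        rates[learner_id] = engaged / len(events)
    return rates


def flatten(data: PEEKData) -> List[Tuple[int, Any, bool]]:
    """Turn the nested structure into flat (learner_id, event, label) rows."""
    return [
        (learner_id, event, label)
        for learner_id, events in data
        for event, label in events
    ]


# Hypothetical sample: strings stand in for EventModel instances.
sample: PEEKData = [
    (1, [("e1", True), ("e2", False), ("e3", True), ("e4", False)]),
    (2, [("e5", False)]),
]
print(engagement_rates(sample))  # {1: 0.5, 2: 0.0}
print(len(flatten(sample)))      # 5
```

With real data, the same functions would be called on the train or test list returned by load_peek_dataset.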

Raises:

TrueLearnValueError – If train_limit or test_limit is negative.
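The train_limit/test_limit behaviour described above (read at most N lines, read everything for None, reject negative values) can be sketched with itertools.islice. This is not truelearn's implementation: read_limited_lines is a hypothetical helper, and the built-in ValueError stands in for TrueLearnValueError so the example runs without truelearn installed.

```python
import io
import itertools
from typing import Iterable, List, Optional


def read_limited_lines(lines: Iterable[str], limit: Optional[int] = None) -> List[str]:
    """Return at most `limit` lines; None means no limit.

    Hypothetical helper; ValueError stands in for TrueLearnValueError.
    """
    if limit is not None and limit < 0:
        raise ValueError(f"Expected limit >= 0, got {limit}.")
    # islice(it, None) yields every item; islice(it, n) stops after n items.
    return list(itertools.islice(lines, limit))


csv_file = io.StringIO("row1\nrow2\nrow3\n")
print(read_limited_lines(csv_file, 2))  # ['row1\n', 'row2\n']
```

Validating before reading means a bad limit fails fast, before any file is touched.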
