Example dataset

The lwc2022 package includes a small example dataset called cog_data. The dataset simulates cognitive scores following the methodology used in the Health and Retirement Study (HRS), focusing on three tasks: word recall, serial subtraction, and backwards counting. These tasks are the core of the Langa-Weir classification system used to assess cognitive function.

The simulated dataset contains 10 observations and follows the structure expected by the functions in the package (extract(), score(), and classify()). Below, we detail the steps taken to simulate the dataset.
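For orientation, the Langa-Weir approach is usually described as a 27-point scale: immediate recall (0 to 10), delayed recall (0 to 10), serial subtraction (0 to 5), and backwards counting (0 to 2). Totals of 12 to 27 are labelled normal, 7 to 11 cognitively impaired but not demented (CIND), and 0 to 6 demented. The sketch below illustrates that summation and those cut points under these assumptions. classify_sketch() is a hypothetical helper written for this illustration; it is not the package's classify() function, whose arguments and output may differ, and the component scores are made-up values.

```r
# Hypothetical helper: map a 0-27 total score to a Langa-Weir style category.
# Cut points assumed here: 0-6 demented, 7-11 CIND, 12-27 normal.
classify_sketch <- function(total) {
  cut(
    total,
    breaks = c(-Inf, 6, 11, 27),
    labels = c("Demented", "CIND", "Normal")
  )
}

# Illustrative (made-up) component scores for three respondents
immediate_recall <- c(2, 4, 6)   # range 0-10
delayed_recall   <- c(1, 3, 5)   # range 0-10
serial_sevens    <- c(1, 2, 4)   # range 0-5
backwards_count  <- c(0, 1, 2)   # range 0-2

# Sum the four components, then classify the totals (4, 10, 17)
total <- immediate_recall + delayed_recall + serial_sevens + backwards_count
categories <- classify_sketch(total)
data.frame(total, categories)
```

Using cut() keeps the thresholds in one place, so alternative cut points can be tried by changing the breaks argument only.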

Structure of the simulated data

The cog_data dataset contains 35 variables. A summary of its structure is presented below:

# Load the package
library(lwc2022)

# Load the example dataset
data(cog_data)

# Display the structure of cog_data
str(cog_data)
#> 'data.frame':    10 obs. of  35 variables:
#>  $ HHID    : int  288941 234057 224021 785284 326317 465208 748794 293626 669691 689448
#>  $ PN      : int  93 99 72 26 7 42 9 83 36 78
#>  $ SD182M1 : num  17 53 39 63 12 15 32 52 55 7
#>  $ SD182M2 : num  9 51 10 23 27 99 63 7 63 27
#>  $ SD182M3 : num  32 38 25 34 29 5 8 12 13 18
#>  $ SD182M4 : num  33 67 27 25 38 21 15 51 57 26
#>  $ SD182M5 : num  99 31 16 62 30 6 53 8 22 22
#>  $ SD182M6 : num  39 31 58 17 64 60 59 34 4 13
#>  $ SD182M7 : num  5 64 61 25 62 22 25 32 56 25
#>  $ SD182M8 : num  23 35 40 58 30 12 31 67 56 30
#>  $ SD182M9 : num  35 14 29 32 7 3 23 64 96 15
#>  $ SD182M10: num  21 37 8 61 10 60 52 54 34 10
#>  $ SD183M1 : num  22 12 20 56 17 56 64 35 40 56
#>  $ SD183M2 : num  61 30 15 24 59 23 53 7 29 15
#>  $ SD183M3 : num  23 26 38 56 32 7 27 52 5 6
#>  $ SD183M4 : num  16 24 32 21 65 11 36 54 56 99
#>  $ SD183M5 : num  19 25 39 64 26 9 7 34 58 13
#>  $ SD183M6 : num  19 66 62 57 39 4 1 40 30 30
#>  $ SD183M7 : num  62 25 16 24 64 11 58 20 40 3
#>  $ SD183M8 : num  29 36 62 54 22 59 52 98 20 11
#>  $ SD183M9 : num  67 65 8 56 21 55 2 53 13 56
#>  $ SD183M10: num  6 67 8 54 32 96 36 55 14 63
#>  $ SD142   : int  96 90 97 97 99 98 97 91 94 98
#>  $ SD143   : int  86 86 89 90 80 98 89 92 90 90
#>  $ SD144   : int  89 76 89 78 78 74 83 83 75 70
#>  $ SD145   : int  69 76 76 66 68 79 65 77 76 64
#>  $ SD146   : int  69 52 63 50 51 53 59 50 54 57
#>  $ SD124   : int  0 0 0 0 1 1 0 1 0 0
#>  $ SD129   : int  0 1 0 0 0 1 0 0 1 0
#>  $ SD237WA : num  -8 -8 -9 1 0 0 0 1 0 1
#>  $ SD237WC : int  13 17 3 18 2 5 12 13 10 6
#>  $ SD237WT : int  42 42 38 60 48 16 35 36 27 27
#>  $ SD238WA : num  -8 0 -8 -8 -8 -9 1 -8 -8 -8
#>  $ SD238WC : int  9 7 9 4 2 12 9 11 7 13
#>  $ SD238WT : int  37 43 33 19 12 34 21 17 12 30

The dataset contains variables for individual identifiers, cognition-related tasks (immediate/delayed word recall, serial subtraction, and backwards counting), and other variables necessary for scoring and classification.

Variable breakdown

  • HHID: A unique household identifier.
  • PN: A unique personal identifier.
  • SD182M1-SD182M10: Responses for the Immediate Word Recall task.
  • SD183M1-SD183M10: Responses for the Delayed Word Recall task.
  • SD142-SD146: Responses for the Serial Subtraction task, where participants are asked to subtract 7 from 100 iteratively five times.
  • SD124 and SD129: Responses for the Backwards Counting task, where participants count backwards from 20. SD124 represents the first attempt, and SD129 represents the second attempt.
  • SD237WA-SD237WT and SD238WA-SD238WT: Responses to a mouse clicking test measuring accuracy, click counts, and click time.
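To make the task descriptions concrete, the sketch below computes the correct answers for serial subtraction and compares them with the first respondent's values from the str() output above. It then derives a 0 to 2 backwards-counting score (2 for success on the first attempt, 1 for success on the second attempt only). Both scoring rules are assumptions chosen for illustration, and bwc_score() is a hypothetical helper; the package's score() function may apply different rules, for example judging each subtraction relative to the previous response.

```r
# Correct answers after subtracting 7 from 100 one to five times
correct <- 100 - 7 * seq_len(5)          # 93 86 79 72 65
names(correct) <- paste0("SD", 142:146)

# First respondent's simulated answers, copied from the str() output
first_row <- c(SD142 = 96, SD143 = 86, SD144 = 89, SD145 = 69, SD146 = 69)

# Strict rule (assumption): an answer counts only if it equals the correct value
serial_score <- sum(first_row == correct)  # 1, only SD143 matches

# Backwards counting (assumption): 2 = first attempt, 1 = second attempt only
bwc_score <- function(first, second) {
  ifelse(first == 1, 2, ifelse(second == 1, 1, 0))
}

# SD124 and SD129 for the first three respondents, from the str() output
bwc_score(first = c(0, 0, 0), second = c(0, 1, 0))  # 0 1 0
```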

Generating the data

The generate_example_data() function generates a dataset of n observations (default n = 10), producing a set of cognitive test variables along with identifiers. The output dataset is structured similarly to the cognitive assessment data collected in the HRS.

# Simulated dataset
generate_example_data <- function(n = 10) {
  data.frame(
    # Identifiers
    HHID = sample(100000:999999, n, replace = TRUE),   # Random household ID
    PN = sample(1:99, n, replace = TRUE),              # Random person number

    # THESE ARE THE VARIABLES USED IN THE LW CLASSIFICATIONS
    # Immediate word recall (10 items)
    SD182M1 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M2 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M3 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M4 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M5 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M6 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M7 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M8 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M9 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M10 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),

    # Delayed word recall (10 items)
    SD183M1 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M2 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M3 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M4 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M5 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M6 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M7 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M8 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M9 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M10 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),

    # Serial subtraction (Subtracting 7 from 100 five times)
    SD142 = sample(90:100, n, replace = TRUE),  # First subtraction value
    SD143 = sample(80:99, n, replace = TRUE),   # Second subtraction
    SD144 = sample(70:89, n, replace = TRUE),   # Third subtraction
    SD145 = sample(60:79, n, replace = TRUE),   # Fourth subtraction
    SD146 = sample(50:69, n, replace = TRUE),   # Fifth subtraction

    # Backwards counting
    SD124 = sample(0:1, n, replace = TRUE),  # Success on first try (1 = success, 0 = fail)
    SD129 = sample(0:1, n, replace = TRUE),  # Success on second try (1 = success, 0 = fail)

    # RANDOM VARIABLES NOT USED IN LW CLASSIFICATIONS
    # Speed Test (Mouse clicking)
    SD237WA = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD237WC = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD237WT = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WA = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WC = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WT = sample(c(0, 1, -8, -9), n, replace = TRUE)
  )
}
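All twenty word-recall columns are drawn from the same set of codes, so the repetition can also be expressed with a small helper. The following is a minimal sketch of that idea; recall_codes and make_recall_columns() are hypothetical names that are not part of lwc2022, and the sketch is not claimed to reproduce the same random draws as the function above.

```r
# Codes used for the word-recall items, copied from generate_example_data()
recall_codes <- c(1:40, 51:67, 96, 98, 99)

# Hypothetical helper: build n_items recall columns named <prefix>M1, <prefix>M2, ...
make_recall_columns <- function(prefix, n, n_items = 10) {
  cols <- lapply(
    seq_len(n_items),
    function(i) sample(recall_codes, n, replace = TRUE)
  )
  names(cols) <- paste0(prefix, "M", seq_len(n_items))
  as.data.frame(cols)
}

set.seed(1)
immediate <- make_recall_columns("SD182", n = 5)

dim(immediate)                             # 5 rows, 10 columns
names(immediate)[c(1, 10)]                 # "SD182M1" "SD182M10"
all(unlist(immediate) %in% recall_codes)   # every value is a valid code
```

The explicit column-by-column version in the function above is longer but makes each variable name visible at a glance, which may be preferable in documentation.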

Parameters

  • n: The number of observations to generate (default n = 10).

Output

The function returns a data frame with n rows and the following columns:

  • HHID: A randomly generated household identifier. Because IDs are sampled with replacement, uniqueness is not strictly guaranteed.
  • PN: A randomly generated personal number for each individual.
  • SD182M1 - SD182M10: Responses for Immediate Word Recall, where values are simulated from a set of codes representing different recall categories.
  • SD183M1 - SD183M10: Responses for Delayed Word Recall, with values similarly simulated as above.
  • SD142 - SD146: Values from a serial subtraction task, representing five rounds of subtracting 7 from 100. Each value is drawn at random from a range that contains the correct answer, so incorrect responses are common.
  • SD124 and SD129: Binary responses representing success (1) or failure (0) on two attempts at backwards counting.
  • SD237WA and SD238WA: Accuracy responses for a mouse clicking test. Responses are represented as success (1), failure (0), non-participation due to technical reasons (-6), or refusal to participate (-8). SD237WA indicates the first attempt, while SD238WA indicates the second attempt.
  • SD237WC and SD238WC: Responses representing the total number of clicks for a mouse clicking test. SD237WC indicates the first attempt while SD238WC indicates the second attempt.
  • SD237WT and SD238WT: Responses representing the total amount of time (in seconds) spent on a mouse clicking test. SD237WT indicates the time for the first attempt, while SD238WT indicates the time for the second attempt.
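The value sets used by the generator can be turned into simple sanity checks on a data frame of this shape. The sketch below applies them to two rows copied from the str() output earlier in this document. The allowed list and the checks vector are names introduced for this illustration, with ranges taken from the sample() calls in generate_example_data(); only the serial subtraction and backwards counting columns are covered.

```r
# Two respondents, with values copied from the str() output of cog_data
df <- data.frame(
  SD142 = c(96, 90), SD143 = c(86, 86), SD144 = c(89, 76),
  SD145 = c(69, 76), SD146 = c(69, 52),
  SD124 = c(0, 0),   SD129 = c(0, 1)
)

# Allowed values per column, taken from generate_example_data()
allowed <- list(
  SD142 = 90:100, SD143 = 80:99, SD144 = 70:89,
  SD145 = 60:79,  SD146 = 50:69,
  SD124 = 0:1,    SD129 = 0:1
)

# TRUE for each column whose values all fall inside the allowed set
checks <- vapply(
  names(allowed),
  function(col) all(df[[col]] %in% allowed[[col]]),
  logical(1)
)
checks

# Stop with an error if any column contains an unexpected value
stopifnot(all(checks))
```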

Example

set.seed(123)

cog_data <- generate_example_data()

knitr::kable(head(cog_data), caption = "Example of generated cognition data")
Example of generated cognition data
HHID PN SD182M1 SD182M2 SD182M3 SD182M4 SD182M5 SD182M6 SD182M7 SD182M8 SD182M9 SD182M10 SD183M1 SD183M2 SD183M3 SD183M4 SD183M5 SD183M6 SD183M7 SD183M8 SD183M9 SD183M10 SD142 SD143 SD144 SD145 SD146 SD124 SD129 SD237WA SD237WC SD237WT SD238WA SD238WC SD238WT
288941 93 17 9 32 33 99 39 5 23 35 21 22 61 23 16 19 19 62 29 67 6 96 86 89 69 69 0 0 -8 0 0 -9 -8 -8
234057 99 53 51 38 67 31 31 64 35 14 37 12 30 26 24 25 66 25 36 65 67 90 86 76 76 52 0 1 -8 -9 0 -9 -9 1
224021 72 39 10 25 27 16 58 61 40 29 8 20 15 38 32 39 62 16 62 8 8 97 89 89 76 63 0 0 -9 0 1 -8 1 0
785284 26 63 23 34 25 62 17 25 58 32 61 56 24 56 21 64 57 24 54 56 54 97 90 78 66 50 0 0 1 -8 0 -9 -8 -9
326317 7 12 27 29 38 30 64 62 30 7 10 17 59 32 65 26 39 64 22 21 32 99 80 78 68 51 1 0 0 -8 1 -8 -8 1
465208 42 15 99 5 21 6 60 22 12 3 60 56 23 7 11 9 4 11 59 55 96 98 98 74 79 53 1 1 0 1 1 -8 -8 1