Introduction

The linearDataGen() function in the youdrawitR package generates linear point data and line data suitable for the drawr() function. This vignette shows how to use this function and how to visualize the data it generates.

Generating Linear Data

Let’s generate some simple linear data.

set.seed(123) # Set seed to generate consistent data

# Generate linear data
linear_data <- linearDataGen(
  y_int = 0,
  slope = 2,
  sigma = 2,
  N = 30,
  x_min = 0,
  x_max = 20,
  conf_int = FALSE
)

The linearDataGen() function takes several arguments:

  • y_int and slope define the y-intercept and slope of the underlying linear relationship used to simulate the points.

  • sigma is the standard deviation of the random noise added to each point.

  • N is the number of points to generate. Default is 30.

  • x_min and x_max are the minimum and maximum x values for the line and point data. Defaults are 0 and 20.

  • conf_int is a logical argument specifying whether to generate a 95% confidence interval for use in drawr(). Default is FALSE. If TRUE, you must also set conf_int = TRUE in drawr() to display the 95% confidence interval region.
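Taken together, these arguments describe a simple linear model, y = y_int + slope * x + e, where e is normal noise with standard deviation sigma. The sketch below simulates point data that way in base R. It assumes evenly spaced x values and rnorm() noise, which is an assumption about the package internals, although with set.seed(123) it appears to reproduce the first rows of point_data printed below. The helper sim_points() is hypothetical and not part of youdrawitR.

```r
# Hypothetical helper (not part of youdrawitR): simulate points from a
# noisy straight line, assuming evenly spaced x and normal noise.
sim_points <- function(y_int = 0, slope = 2, sigma = 2, N = 30,
                       x_min = 0, x_max = 20) {
  # N evenly spaced x values from x_min to x_max
  x <- seq(x_min, x_max, length.out = N)
  # True line plus independent normal noise with standard deviation sigma
  y <- y_int + slope * x + rnorm(N, mean = 0, sd = sigma)
  data.frame(data = "point_data", x = x, y = y)
}

set.seed(123)
pts <- sim_points()
head(pts, 3)
```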

The linearDataGen() function returns a list of two tibbles, point_data and line_data, generated from these parameters. Here is the data produced by the example above:

linear_data
#> $point_data
#> # A tibble: 30 × 3
#>    data           x      y
#>    <chr>      <dbl>  <dbl>
#>  1 point_data 0     -1.12 
#>  2 point_data 0.690  0.919
#>  3 point_data 1.38   5.88 
#>  4 point_data 2.07   4.28 
#>  5 point_data 2.76   5.78 
#>  6 point_data 3.45  10.3  
#>  7 point_data 4.14   9.20 
#>  8 point_data 4.83   7.13 
#>  9 point_data 5.52   9.66 
#> 10 point_data 6.21  11.5  
#> # ℹ 20 more rows
#> 
#> $line_data
#> # A tibble: 81 × 5
#>    data          x     y  coef   int
#>    <chr>     <dbl> <dbl> <dbl> <dbl>
#>  1 line_data  0    0.619  1.93 0.619
#>  2 line_data  0.25 1.10   1.93 0.619
#>  3 line_data  0.5  1.58   1.93 0.619
#>  4 line_data  0.75 2.07   1.93 0.619
#>  5 line_data  1    2.55   1.93 0.619
#>  6 line_data  1.25 3.03   1.93 0.619
#>  7 line_data  1.5  3.51   1.93 0.619
#>  8 line_data  1.75 3.99   1.93 0.619
#>  9 line_data  2    4.48   1.93 0.619
#> 10 line_data  2.25 4.96   1.93 0.619
#> # ℹ 71 more rows
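The coef and int columns of line_data (1.93 and 0.619) differ slightly from the slope = 2 and y_int = 0 that were passed in. This suggests that line_data holds the least-squares line fitted to the simulated points, evaluated on x values spaced 0.25 apart (0, 0.25, ..., 20, which gives 81 rows). A minimal base-R sketch under that assumption follows; fit_line() is a hypothetical helper, not a youdrawitR function.

```r
# Hypothetical helper (not part of youdrawitR): fit a least-squares line to
# point data and evaluate it on a regular grid, mirroring line_data's columns.
fit_line <- function(points, x_min = 0, x_max = 20, step = 0.25) {
  fit <- lm(y ~ x, data = points)      # ordinary least squares fit
  int <- unname(coef(fit)[1])          # fitted intercept
  slope <- unname(coef(fit)[2])        # fitted slope
  x <- seq(x_min, x_max, by = step)    # 81 grid points for 0 to 20
  data.frame(data = "line_data", x = x, y = int + slope * x,
             coef = slope, int = int)
}

set.seed(123)
x <- seq(0, 20, length.out = 30)
points <- data.frame(x = x, y = 0 + 2 * x + rnorm(30, sd = 2))
line <- fit_line(points)
nrow(line)  # 81
```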

Plotting the Linear Data

Once the linear data is generated, you can plot it with the drawr() function from the youdrawitR package:

drawr(data = linear_data)

In the resulting interactive plot, try drawing the trend line yourself. The points show a clear linear relationship between x and y, which makes the trend easy to see.

You can experiment with different values of the linearDataGen() parameters to generate other kinds of linear data, for example a steeper or negative slope, or noisier points via a larger sigma.
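To get a feel for what sigma does before changing it in linearDataGen(), the base-R sketch below simulates points from y_int + slope * x plus normal noise (an assumption about how the package generates data; it does not call youdrawitR) and reports the residual standard error of the fitted line for two noise levels. The helper residual_sd() is hypothetical.

```r
# Hypothetical helper (not part of youdrawitR): simulate noisy linear data
# and return the residual standard error of the least-squares fit.
residual_sd <- function(sigma, y_int = 0, slope = 2, N = 30,
                        x_min = 0, x_max = 20) {
  x <- seq(x_min, x_max, length.out = N)
  y <- y_int + slope * x + rnorm(N, sd = sigma)
  summary(lm(y ~ x))$sigma  # estimate of the noise standard deviation
}

set.seed(123)
res <- sapply(c(2, 8), residual_sd)
res
```

Larger sigma values scatter the points further from the line, which makes the trend harder to draw and widens any confidence interval around it.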

Confidence Intervals

An important feature of linearDataGen(), and of the youdrawitR package in general, is the ability to draw additional lines on the interactive graphic. One useful application is letting the user draw the 95% confidence interval for the line of best fit. When conf_int = TRUE is set in both linearDataGen() and drawr(), the functions generate lower and upper bounds for that confidence interval.
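A 95% confidence interval for a fitted regression line is conventionally computed pointwise as fit(x) ± t(0.975, n - 2) * SE(fit(x)). The sketch below shows that standard calculation in base R with predict(..., interval = "confidence") and then reproduces the lower bound by hand. It illustrates the statistics only; youdrawitR's own implementation may differ in details such as the x grid.

```r
set.seed(123)
# Simulate noisy linear data (sigma = 8), like the example that follows
x <- seq(0, 20, length.out = 30)
y <- 0 + 2 * x + rnorm(30, sd = 8)
fit <- lm(y ~ x)

# Fitted line and 95% confidence band on a 0.25-spaced grid
grid <- data.frame(x = seq(0, 20, by = 0.25))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
ci <- cbind(grid, as.data.frame(band))  # columns: x, fit, lwr, upr
head(ci, 3)

# Same lower bound by hand: fit minus t quantile times standard error
p <- predict(fit, newdata = grid, se.fit = TRUE)
t_crit <- qt(0.975, df = fit$df.residual)
lwr_manual <- p$fit - t_crit * p$se.fit
all.equal(unname(lwr_manual), ci$lwr)  # TRUE
```

The band is narrowest at the mean of x (here x = 10) and widens toward the ends of the range.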

Let’s generate some linear data with a confidence interval.

set.seed(123) # Set seed to generate consistent data

# Generate noisier linear data along with confidence interval bounds
linear_data_conf <- linearDataGen(
  y_int = 0,
  slope = 2,
  sigma = 8,
  N = 30,
  x_min = 0,
  x_max = 20,
  conf_int = TRUE
)

# Plot the data with a confidence interval
drawr(
  linear_data_conf,
  conf_int = TRUE,
  title = "Visualization with Confidence Interval"
)

In the plot above, the shaded region that appears once you finish drawing the original line (tracked by the yellow progress region) represents the lower and upper bounds of the 95% confidence interval for the line of best fit. You can then try drawing the upper and lower bounds yourself using the “New Line” button in the interactive plot.