This is the week to get your computing environment set up. 👨‍💻 You will need the latest versions of the software:

You need to learn about creating a clean and organised environment:

and how to do reproducible work:

If you haven’t used R before, you should get familiar with using a command line, and scripting to make calculations on data:

Resources: your turn

  1. What is R?
    1. Search the internet to learn about R, what it is, and write a two sentence explanation in your own words.
    2. When was the software project started?
    3. Where did it start from?
    4. What language preceded R, and inspired the birth of R?
    5. What makes R such a powerful data analysis tool?
    6. What is CRAN? Where do you download the software from?
    7. What other languages are substitutes for R, and commonly used for similar purposes?
  2. What is RStudio?
    1. Search the internet to learn about the RStudio IDE. Write a sentence about the RStudio software. What does IDE mean?
    2. How does R differ from RStudio? If you have an airplane, and also an airport terminal, which would be most like R, and which would be most like RStudio?
    3. On your resume/CV, would it be more impressive to list R or RStudio in your computer skills?
    4. The company producing the software was recently declared a Public Benefit Corporation. What does this mean? And what are the benefits for you, the user?

Now installs

Getting started

Workflow practices

Read the material at https://r4ds.had.co.nz/workflow-projects.html

Your turn

  1. Create a project for this unit. You can call it what you want. It could be a generic name like “ETC3250”, or something creative like “lorikeet”.
    1. Write a sentence explaining why YOU SHOULD ALWAYS WORK IN A PROJECT FOR THIS CLASS 😄. Each time you start RStudio for this class, be sure to open this project.
    2. What file is created in your file system/directory when you create a project? How can this be used to restart RStudio?
    3. In your RStudio settings, set the default to be “Never save the workspace”. Why do this?
    4. Which of these is a “working directory”?
      • sentence (i): where R looks for files that you ask it to load
      • sentence (ii): the home folder on your computer

What is RMarkdown?

Your turn

  1. Create a new RMarkdown file and call it “lab1_solution”.
    1. Where do you find the cheatsheet for RMarkdown?
    2. Enter your answers to all of the above activities (the “your turn” sections) into your “Rmd” document. Knit the document to html. Where are the Rmd file and the html file located on your computer?
    3. Complete the “yaml” header so that you are listed as the author:
    ---
    title: "Lab 1 solution"
    author: ???
    date: "My answers to activities for Week 1"
    output:
      html_document: default
    ---
    4. Add a block of R code that computes the mean of these numbers: 1.2, -0.5, 2.9, 3.2, 0.1, 2.2
    5. Equations can be included using LaTeX (https://latex-project.org/) commands. For example, typing
    $$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2.$$
    produces \[s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2.\] Write an equation in your document to show how to calculate a sample mean.
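
To connect the variance formula above to code, here is a minimal sketch that computes the sample variance term by term and checks it against R's built-in `var()`. The vector `x` is made-up illustration data and is not part of the lab.

```r
# Made-up illustration data (an assumption, not lab data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)      # number of observations
x_bar <- mean(x)    # sample mean, the \bar{x} in the formula

# s^2 = 1/(n-1) * sum of squared deviations from the mean
s2 <- sum((x - x_bar)^2) / (n - 1)
print(s2)

# The built-in var() uses the same n - 1 denominator, so the two should agree
print(all.equal(s2, var(x)))
```

Writing the calculation out once by hand is a useful check that you have read the formula correctly; in practice you would normally call `var()` directly.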

Writing R code

    1. Copy the code below into a chunk in your RMarkdown document. Run the code line by line. When you hit an error, fix the code. When you are done, test that it all works by knitting your RMarkdown document.
    2. A powerful way to build on reproducibility in R is to follow the DRY (Don’t Repeat Yourself) principle: if you are going to do the same analysis over and over again, wrap those operations in a function that you can then apply to different data sets. Here is an example:
average <- function(x) sum(x) / length(x)  # assumed body: the arithmetic mean

y1 <- c(1, 2, 3, 4, 5, 6)
average(y1)

y2 <- c(1, 9, 4, 4, 0, 1, 15)
average(y2)
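
As a further sketch of the DRY idea, the same pattern extends to any summary you would otherwise retype for each data set. The function name `spread` is hypothetical (it does not come from the lab), and the two vectors are redefined so the chunk runs on its own.

```r
# Hypothetical helper: the width of the range of a numeric vector
spread <- function(x) {
  max(x) - min(x)
}

y1 <- c(1, 2, 3, 4, 5, 6)
y2 <- c(1, 9, 4, 4, 0, 1, 15)

# Write the calculation once, then apply it to every data set
widths <- sapply(list(y1 = y1, y2 = y2), spread)
print(widths)
```

If the definition of the summary ever changes, only the function body needs editing, and every data set picks up the change.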

Write your own function to compute the mode of a vector, and confirm that it returns 4 when applied to y <- c(1, 1, 2, 4, 4, 4, 9, 4, 4, 8)

  1. What’s an R package?
    1. How do you install a package?
    2. How does the library() function relate to a package?
    3. How often do you load a package?
    4. Install and load the package ISLR
  2. Getting data
    1. Data can be found in R packages. These are not usually kept up to date but are good for practicing your analysis skills on. How many observations are in the economics data?
    data(economics, package = "ggplot2")
    # data frames are essentially a list of vectors
    2. Data can also come in its own package. What variables does the gapminder data have?
    3. The readr package (part of the tidyverse suite) is useful for reading data. It mimics the base R reading functions but is implemented in C, so it reads large files relatively quickly, and it also attempts to identify the types of variables. Try reading the candy ranking data from the web using the code below. How many missing values are in this data?
    library(readr)
    candy <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv")
  3. Read in the OECD PISA data (the file student_sub.rds is available from the course web site).
    1. Tabulate the countries (CNT)
    2. Extract the values for Australia (AUS) and Shanghai (QCN)
    3. Compute the average and standard deviation of the reading scores (PV1READ), for each country

Got a question?

It is always good to try to solve your problem yourself first. Most likely the error is a simple one, like a missing “)” or “,”. For deeper questions about packages, analyses and functions, turning your Rmd into a document, or simply the error message being generated, you can often search online for an answer. You will frequently be directed to the Q&A site http://stackoverflow.com.

Stack Overflow is a great place to get answers to tougher questions about R and data analysis. Always check first whether someone has asked your question before, because the answer might already be available. If not, make a reproducible example of your problem, following the guidelines here, and ask away. Remember that the people who kindly answer questions on Stack Overflow have day jobs too, and provide this community support as a kindness to all of us.
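
As a rough sketch of what a reproducible example can look like, the chunk below defines a tiny data set inline, shows the smallest code that triggers the behaviour in question, and reports the R session details. The data frame `df` and the "problem" (a mean that comes back as `NA`) are made up for illustration.

```r
# A tiny made-up data set defined inline, so a reader needs no external files
df <- data.frame(x = c(1, 2, NA), y = c("a", "b", "c"))

# dput() prints R code that recreates the object; paste that output into your question
dput(df)

# The smallest piece of code that shows the behaviour you are asking about
result <- mean(df$x)
print(result)   # NA, because x contains a missing value

# Report the R version and attached packages so others can match your setup
print(sessionInfo())
```

Keeping the example this small makes it quick for someone else to run, and often reveals the cause of the problem before you need to post at all.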