This web page will serve as the syllabus for the course. Please read it carefully. You should become familiar with these policies. To do so, you will likely need to return to the syllabus several times throughout the semester. After the start of the semester, this document may continue to be updated. Any such changes will be announced.

## Course Name and Number

• Main: STAT 432 - Basics of Statistical Learning
• Cross-list: ASRM 451 - Basics of Statistical Learning

For simplicity, the course staff will exclusively refer to the course as STAT 432.

## Location and Time

This Fall 2020 version of the course is online.

• Location: Wherever you are!
• Time: Mostly whenever you’d like!

## Course Staff

Please refer to the course staff by their given names. For example, your instructor is named David. If you refer to the staff as “Professor” or “TA,” we might refer to you as “student,” which seems odd.

Teaching Assistants are PhD students from the Department of Statistics. Course Assistants are students who previously completed STAT 432. Course Associates are students who have completed at least one semester as a Course Assistant for STAT 432.

## Course Content

### Course Description

STAT 432 provides a broad overview of machine learning, through the eyes of a statistician. As a first course in machine learning, core ideas are stressed, and specific details are de-emphasized. After completing the course, students should be able to train and evaluate statistical models. While we will not discuss an exhaustive list of methods, given the framework developed throughout the course, students should feel comfortable exploring new methods and models on their own. Previous experience with R programming is necessary for success in the course as students will be tested on their ability to use the methods discussed through the use of a statistical computing environment.

### Topics

Tentative subjects include:

• Basics: Supervised and Unsupervised Learning, Parametric vs Non-Parametric Methods, Bias-Variance Trade-Off, Cross-Validation, Model Selection and Evaluation
• Regression: Linear Regression, Trees, KNN, Penalized Regression
• Classification: Logistic Regression, Trees, KNN, LDA, QDA, Naive Bayes
• Modern Methods: Regularization (Ridge, Lasso, Elastic Net), Ensemble Learning (Bagging, Boosting, Random Forests)
• Unsupervised: PCA, K-Means Clustering, Hierarchical Clustering, Mixture Models, EM Algorithm

### Learning Objectives

After this course, students are expected to be able to …

• identify supervised (regression and classification) and unsupervised (clustering) learning problems.
• understand some fundamental theory behind statistical learning methods.
• implement learning methods using a statistical computing environment.
• formulate practical, real-world, problems as statistical learning problems.
• evaluate effectiveness of learning methods when used as a tool for data analysis.

Note: These objectives are similar to the objectives for the Society of Actuaries Exam PA: Predictive Analytics. (See details in their linked syllabus.) While STAT 432 was not specifically designed to prepare students for the SOA Exam PA, the coverage may be sufficient to sit for the exam, although some additional exam-specific study may be required.

### Textbooks

The main text for the course will be BSL. Within BSL, readings from ISL may be assigned. If BSL and ISL provide conflicting guidance, we will defer to BSL in this course. When reading ISL, you do not need to read the sections dedicated to R. We will follow the R conventions only from BSL.

### Prerequisites

A course which covers linear regression that uses R, such as STAT 420 or STAT 425. Basic knowledge of probability and linear algebra is also assumed. A working knowledge of the material from the following three texts would also be sufficient.

## Course Communication

We will use several forms of communication for this course. The website will be the one-stop-shop for all course information. Compass will be used to send announcements which will also be sent via email.

If you would like to communicate with the course staff, our preferred methods of communication, in order, are:

1. Office Hours
2. Piazza
3. Email

### Office Hours

For Fall 2020, all office hours will be held online via Zoom.

| Session | Day | Time |
|---|---|---|
| Zoom with Matthew | Wednesday | 4:00 PM - 5:00 PM |
| Zoom with Jonas | Wednesday | 5:00 PM - 7:00 PM |
| Zoom with David | Wednesday | 7:00 PM - 8:00 PM |
| Zoom with Jingyu | Wednesday | 8:00 PM - 10:00 PM |
| Zoom with Jasmine | Thursday | 4:00 PM - 5:00 PM |
| Zoom with David | Thursday | 7:00 PM - 8:00 PM |
| Zoom with Tianyi | Thursday | 8:00 PM - 10:00 PM |

The office hour schedule is always subject to change. As such, the dates and times will be posted each week along with the course materials.

Office hours are by far our preferred forum for discussing individual, specific questions. In office hours, our response time will be literally instant. Also, since we are both present in the same physical location (or together on Zoom), follow-up is both expected and easy. Using textual forms of communication such as Piazza or email will have a slow response rate and a much lower communication bandwidth. In other words, please come to office hours!

If you would like to schedule a private meeting outside of regular office hours, please send an email suggesting two possible times, on two different days. (A total of four suggested times.) We have a preference for time-slots directly adjacent to current office hours. Please also indicate a brief agenda for the meeting. Requests to schedule a meeting at a time less than 24 hours in the future are unlikely to be granted.

### Piazza

This course will use Piazza for some course communications.

The course staff will attempt to check Piazza at least once a day, so you can often expect a response within 24 hours, except on weekends. If you need a quicker response, you should consider office hours as an alternative.

The course staff would strongly prefer the use of Piazza to GroupMe or similar services not officially supported by the course. The course staff feels that a GroupMe may exclude members of the course, whereas all are welcome on Piazza.

Private posts have been disabled. Any private matters should be discussed over email where your identity is known and private. Some anonymous posting is disabled. (You may post anonymously to your classmates, but not the course staff.)

Additional Piazza policy can be found in a pinned post on Piazza.

### Email Policy

Due to the large size of this course, we follow a strict email policy. Instead of email, consider Piazza! Any quick, non-private communication should take place there.

If you’d like to email the instructor or course staff, consider the following:

• Is your question about part of an assignment? First and foremost: You should ask it in office hours. After that, consider Piazza. As a last resort, use email, but there is a good chance you will be re-directed to Piazza.

If you choose to send an email, you must adhere to the following three rules. If you do not, your email will be considered less important than emails that follow the rules, and the response time will be slower.

• All email must originate from an @illinois.edu email address or appear as sent on behalf of an @illinois.edu address.
  • Depending on the situation, failure to follow this rule may make a response impossible.
• Your subject line must begin with exactly the following: [STAT 432]
  • While ASRM 451 is a valid cross-listed course, please use STAT 432 for all communication purposes.
• After the above, put a single space, followed by a useful but short description of your message.

    ## good
    [STAT 432] Grade feedback question

    ## bad
    ## improper format
    ## non-descriptive subject
    [stat432] hi

    ## bad
    ## improper format
    [STAT432] Grade feedback question

    ## bad
    ## improper format
    ## subject too long
    ## information found in syllabus or website
    [STAT 432]when is the first CBTF exam and what is covered on the exam?
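
If you want to check a subject line before sending, the format rules above can be sketched as a small Python check. This is only an illustration, not a tool the course staff uses: the name `is_valid_subject` and the regular expression are assumptions. It tests only the prefix and the single space that follows; it cannot judge whether a subject is short or descriptive.

```python
import re

# Hypothetical helper (not course tooling): the subject must begin with
# exactly "[STAT 432]", then a single space, then a non-space character.
SUBJECT_PATTERN = re.compile(r"^\[STAT 432\] \S.*$")


def is_valid_subject(subject: str) -> bool:
    """Return True if the subject line matches the required prefix format."""
    return SUBJECT_PATTERN.match(subject) is not None


# The four example subject lines from the syllabus.
examples = [
    "[STAT 432] Grade feedback question",
    "[stat432] hi",
    "[STAT432] Grade feedback question",
    "[STAT 432]when is the first CBTF exam and what is covered on the exam?",
]

for subject in examples:
    print(is_valid_subject(subject), subject)
```

Only the first example passes. The other three fail on the prefix or on the missing space after it.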

If your email is sent between 9:00 AM Monday and 11:59 PM Thursday, and you follow the above directions, we will try our best to respond within 24 hours. Questions about an assessment sent the same day the assessment is due will likely not receive a response before the assessment is due. Plan accordingly.

### Course Staff Emails

| Role | Name | Email |
|---|---|---|
| Instructor | David Dalpiaz | |
| Teaching Assistant | Tianyi Qu | |
| Course Associate | Jingyu Li | |
| Course Assistant | Matthew Lezak | |
| Course Assistant | Jonas Reger | |
| Course Assistant | Soham Saha | |
| Course Assistant | Jasmine Yi | |

### Code Discussion

If your question is technical in nature, there are several steps you can take to ensure a speedy response on Piazza or in email.

First and foremost, you should ask Google before you ask the course staff. Take the error message you obtained and search it with Google. The ability to solve problems this way is an extremely valuable skill, possibly one of the most important you should learn (but are not taught) during your academic career. Make a legitimate effort to solve the problem on your own. You won’t always be able to, and if you can’t, post on Piazza. (Or better yet, stop by office hours.)

If you need to ask the course staff, include the following in your Piazza post or email:

• All code that is required to re-create the error.
• Staff should be able to run your code, without any modification, and obtain the same error or output.
• The exact error message received.

Do not use screenshots of code and error messages to communicate about them. Copy-paste them so that others can copy-paste them as well.

In this course, for everything except exams, we greatly prefer over-sharing to under-sharing code. We would rather everyone learn from others’ “mistakes” than have everyone experience the same issues over and over again.

## Assessments

With the exception of exams, all course assignments are due at 11:59 PM, Central (Champaign) time, on the listed due date.

### PrairieLearn Quizzes

Throughout the semester, quizzes will be administered through the PrairieLearn system. (9 for undergraduates, 10 for graduate students.) These will be low-stakes, unlimited attempt quizzes. That is, there is no penalty for submitting incorrect answers, and your score can only go up, never down. These quizzes will serve as practice for exams. No quizzes will be dropped. Instead, there will be opportunity to earn buffer points with each quiz. Buffer points will allow you to obtain over 100% for a particular assignment, but your percentage on quizzes overall cannot exceed 100%.

The buffer point and late submission details can be seen in the details of each quiz on PrairieLearn. As an example, consider Quiz 01:

• Released By: Monday, August 24
• 105% Credit: Friday, August 28, 11:59 PM
• 100% Credit: Friday, September 4, 11:59 PM
• 85% Credit: Friday, September 11, 11:59 PM

To obtain the 105% credit, you must achieve a score of 100% before the “due” date for 105% credit. (When we refer to “due” dates, we will generally mean the date to obtain 105% credit.)
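
As a rough illustration of the Quiz 01 tiers listed above, the Python sketch below looks up the credit percentage available at a given submission time. It is not how PrairieLearn itself computes credit. The function name `available_credit` is hypothetical, times are assumed to be naive Central-time values in 2020, and each deadline is treated as the end of the 11:59 PM minute. The syllabus lists no tier after the last deadline, so the sketch returns `None` there rather than guessing.

```python
from datetime import datetime

# Quiz 01 credit tiers as listed in the syllabus: (deadline, credit percentage).
# Assumption: naive datetimes stand for Central (Champaign) time in 2020.
QUIZ_01_TIERS = [
    (datetime(2020, 8, 28, 23, 59, 59), 105),
    (datetime(2020, 9, 4, 23, 59, 59), 100),
    (datetime(2020, 9, 11, 23, 59, 59), 85),
]


def available_credit(submitted_at, tiers=QUIZ_01_TIERS):
    """Return the credit percentage available at submission time.

    Returns None after the last listed deadline, because the syllabus
    does not state what happens then.
    """
    for deadline, credit in tiers:
        if submitted_at <= deadline:
            return credit
    return None


print(available_credit(datetime(2020, 8, 27, 12, 0)))   # 105
print(available_credit(datetime(2020, 9, 1, 9, 30)))    # 100
print(available_credit(datetime(2020, 9, 10, 20, 0)))   # 85
```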

#### PrairieLearn

Quizzes and exams will both use the PrairieLearn system. Use the link below to sign up and add STAT 432.

### Exams

There will be one midterm exam, proctored using CBTF Online. Details about the exam can be found on the Exams page.

### Data Analyses

There will be four data analyses (DA) throughout the semester. Specific policies and directions can be found on the Analyses page.

Except for the exam, all deadlines are at 11:59 PM, Champaign time, on the listed day.

| Assessment | Deadline |
|---|---|
| Quiz 01 | Friday, August 28 |
| Quiz 02 | Friday, September 4 |
| Quiz 03 | Friday, September 11 |
| Quiz 04 | Friday, September 18 |
| Quiz 05 | Friday, September 25 |
| Exam | Monday, October 5 |
| Quiz 06 | Friday, October 9 |
| Quiz 07 | Friday, October 16 |
| Quiz 08 | Friday, October 23 |
| Quiz 09 | Friday, October 30 |
| Analysis 01 | Wednesday, November 11 |
| Analysis 02 | Wednesday, November 18 |
| Analysis 03 | Wednesday, December 2 |
| Analysis 04 | Wednesday, December 9 |

## Course Technology

### Statistical Computing

R and RStudio are required software for this course. You will need access to a computer where you have the ability to install and update this software.

• R is a freely available language and environment for statistical computing and graphics.
• RStudio is a free and open-source integrated development environment (IDE) for R.

It is your responsibility to make sure you are using the most recent version of both R and RStudio. Failure to use the most recent version of R will result in an inability to complete the quizzes.

### Learning Management

Compass will be used to distribute grades and for assignment submissions.

### Assessment Weights

| Assessment | Percentage |
|---|---|
| Quiz | 50 |
| Exam | 25 |
| Analysis | 25 |

The quiz sub-score will be the average of the 9 quizzes for undergraduates. (It will be the average of 10 quizzes for graduate students.) If your quiz sub-score is above 100 as a result of buffer points, it will be recorded as 100. Similarly, the sub-score for the analyses will be the average of the individual analyses.

| | A | B | C | D |
|---|---|---|---|---|
| Plus | 99 | 87 | 77 | 67 |
| Neutral | 93 | 83 | 73 | 63 |
| Minus | 90 | 80 | 70 | 60 |
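
To see how the weights, the quiz cap, and the cutoffs above combine, here is a minimal Python sketch. It is an illustration only, not the official grade calculation. The function names are hypothetical, and two details are assumptions because the syllabus does not state them: a score exactly at a cutoff earns that letter, and anything below 60 is shown as "F". No rounding is applied.

```python
# Assessment weights from the syllabus, as fractions of the overall score.
WEIGHTS = {"quiz": 0.50, "exam": 0.25, "analysis": 0.25}

# Grade cutoffs from the syllabus, highest first.
CUTOFFS = [
    (99, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def overall_score(quizzes, exam, analyses):
    """Weighted overall percentage, with the quiz sub-score capped at 100."""
    quiz_sub = min(sum(quizzes) / len(quizzes), 100)
    analysis_sub = sum(analyses) / len(analyses)
    return (WEIGHTS["quiz"] * quiz_sub
            + WEIGHTS["exam"] * exam
            + WEIGHTS["analysis"] * analysis_sub)


def letter_grade(score):
    """Map a percentage to a letter, assuming cutoffs are inclusive."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"  # assumption: the syllabus lists no grade below D-


# Made-up example: every quiz earned 105% (capped to 100), the exam
# score is 80, and the four analyses average 90.
score = overall_score([105] * 9, 80, [90, 85, 95, 90])
print(score, letter_grade(score))  # 92.5 A-
```

In this made-up example, the buffer points cannot lift the quiz sub-score above 100, so the overall score is 50 + 20 + 22.5 = 92.5.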

The instructor reserves the right to lower, but not raise, grade cutoffs. However, this policy should not create an expectation that this will happen. Asking for a change in cutoffs will make any change in cutoffs less likely.

Grading in the course is not competitive. There is nothing (other than some statistical realities) that would prevent the entire class from receiving a grade of A.

All grade disputes must be discussed with the course instructor. Teaching Assistants and Course Assistants do not have authority to modify grades.

The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in this course. Any violations will be dealt with in a swift, fair, and strict manner. In short, do not cheat; it is not worth the risk. You are more likely to get caught than you believe. If you think you may be operating in a gray area, you most likely are.

Policies about specific assessment types will be released with directions for those assessments. Two heuristics to keep in mind:

• Do not share files with other students. Do not copy-paste code from any source other than the course notes and website.
• Use spoken language to exchange ideas, not code.

Under no circumstances should course materials be provided to Course Hero, Chegg, or any similar for-profit website. The course staff will seek the harshest possible academic integrity penalty for any students who do so.

### Disability Accommodations

To obtain disability-related academic adjustments or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 217-333-4603, e-mail disability@illinois.edu or go to the DRES website.

To ensure appropriate accommodation is provided in a timely manner, please provide your Letter of Accommodation during the first week of class. Letters received after a relevant assessment has been administered will likely cause logistical issues that could result in an inability to accommodate.

### The Extended Syllabus

For some thoughts on teaching philosophy, some explanation of policies, and some general tips for success, please see The Extended Syllabus.

### Changes

The instructor reserves the right to make any changes he considers academically advisable. Such changes, if any, will be announced. Please note that it is your responsibility to keep track of the course proceedings.
