STAT 432 is a difficult course. However, perhaps because of the historical grade distribution, I believe some students enter the course thinking that it is an “easy” course. This document attempts to explain this gap between perception and reality. The style is stolen from the popular “Ten Simple Rules” articles published in PLOS journals. A relevant example is Ten Simple Rules for Effective Statistical Practice.
Perhaps it is odd to begin a list of rules by exclaiming that there are no rules.
Many students who enroll in STAT 432 have an extensive mathematics background, where everything follows a set of rules. However, statistics is not mathematics. While statistics certainly has rules, in applied statistics an analyst must make decisions that can only be guided by heuristics. Students often ask questions such as “What method should I use in this situation?” hoping for a specific answer. While it would be easy to simply provide an “answer” in the form of some specific model or procedure, the reality is that the answer will almost always be “it depends,” and the analyst will have to make a somewhat subjective decision based on an extremely long set of heuristics. (The other answer to “Which method should I use in this situation?” will be “Who knows? Try a few and evaluate which works best.” Evaluating methods will be a big focus in the course. We care more about the ability to evaluate methods than about understanding the inner workings of each method.)
In other words, while we could simply write some sort of flow-chart that tells you what to do in any situation encountered in the course, we reject this authoritarian approach. We prefer to present some heuristics, some reasoning behind them, and allow you to think for yourself. Skepticism is encouraged. You are allowed to form your own opinions about the course material.
Applied to the data analyses done in STAT 432: There is no single correct answer. There are only good arguments and bad arguments.
Please read the syllabus. Familiarity with the syllabus will make your experience in the course much smoother. I suggest returning to it several times throughout the semester, perhaps shortly before each exam.
A trap many students fall into is believing that everything they have previously learned is relevant in every future course. This is not the case. Just because a method was taught in STAT 420 or STAT 425 does not mean that it is relevant in STAT 432. The most common example is Variance Inflation Factors (VIFs). Students seem to love dragging this concept into STAT 432. While it is certainly possible to appeal to VIFs in STAT 432, they seem to be misapplied more often than not. This is because VIFs are more relevant for inference than for prediction, and STAT 432 cares much more about prediction than STAT 420 and STAT 425 do.
Rule 4 is somewhat related to Rule 3. The full rule would read: “All statements are true, given the correct conditions.” Rule 3 is relevant here because students will often search for information on the internet. They’ll arrive at some prescription such as “Method X is good at Task Y.” In reality, this statement would always be more accurately stated as “Method X is good at Task Y under Condition Z.”
In other words, context is extremely important.
STAT 432 covers a lot of content, sometimes only at a surface level. When we only scratch the surface, students can find the lack of detail unsatisfying. This is understandable, but realize that STAT 432 is a first course in machine learning. We don’t believe it is possible to learn all of machine learning in a single course. STAT 432 is about understanding what machine learning is and what you can do with it. Our goal is not to teach you every single detail. (That is impossible.) Instead, we would like to provide a high-level overview that will serve as a foundation for future learning, both through self-study and in future courses.
You will struggle, and that is a good thing. If everything in the course were “easy,” very little learning would take place. However, we are not advocating struggle for the sake of struggle. We want to support your “struggle” with the material. The course staff is not the enemy. The material is the “enemy.” We are here to help you. Do not hesitate to ask the course staff questions! Come to office hours! Post on Piazza!
Please keep the KISS Principle in mind. (The name is somewhat unfortunate. No, we are not calling you stupid.) Complexity does not imply value.
Within the context of STAT 432:
Warning: the following link contains foul language: RTFM. RTFM is a common phrase in coding culture. While the phrase is extremely insensitive, it is perhaps the most relevant advice for STAT 432.
In short, if you experience a coding issue:
R involves running a function.)
This should always be your first line of defense any time you encounter an issue. However, we do not expect you to be able to solve all your problems this way! That’s why office hours exist! We simply would like you to get into this habit. Having gone through this step, you are more likely to solve the problem yourself. Additionally, you will be better prepared to discuss the issue with the course staff if you are unable to solve it yourself.
It sounds cliché, but it is true. Do not hesitate to ask the course staff questions! Come to office hours! Post on Piazza!
Do note that while there really are no stupid questions, there are some annoying questions. For example:
The second bullet requires some explaining. The direct answer to that question is technically “Yes.” However, please note that the course staff is not a debugging machine. If you simply supply us with a bunch of code and ask us what is wrong, you’ll be met with a bit of frustration. We expect you to at least pinpoint where the issue in your code lies, within reason. (We will give some advice on how best to do this as the semester progresses.) In other words, try to ask your code question in a way that demonstrates that you have already thought about solving your issue. (See Rule 8.) We will always work with you to resolve your issue, but we ask that we not be your first attempt at a resolution.
Corollary: There are no “quick” questions. Students often like to preface a question with the phrase “Just one quick question.” If you’re the one asking the question, how do you know how long it will take to answer? (I suppose if you know it’s in the syllabus, then the answer will be quick, but …) More often than not, students ask excellent questions but then expect a short, succinct answer. When you ask a good question, this is often not possible. The point is, please come to office hours, where we can have an in-depth conversation that is not time-limited and that includes additional input from you!
Students overvalue lecture and undervalue homework. STAT 432 will probably contain less “lecture” than you expect, and far too much “work” in the form of quizzes and analyses.
Watching a lecture is passive; taking a quiz is active. Reading is passive; performing an analysis is active. In my opinion, students enjoy (or, more specifically, don’t dislike) lecture because it requires zero input from them. On the other hand, quizzes are frustrating, but that is a good thing! That frustration means that there is something to learn!
Stated practically and with relevance to STAT 432:
In summary, if you bring an open mind and a bit of effort, we believe that you will succeed in STAT 432. We don’t believe that the course is easy, but we hope that it is ultimately rewarding.