Datasets Needed For Assignments
Please read all of the following and then scroll down to see all assignments, materials and resources, and due dates.
How this Bootcamp Works:
This is a fully remote, asynchronous, online, and self-directed review bootcamp.
This is not a class. This Bootcamp is self-directed and will be facilitated by a team of TAs.
The goal of this Bootcamp is for YOU to practice, to review, and to improve your R and Python programming skills. Only you know what you need to get better at :). You will need to immediately use R and Python in ANLY 501. Be sure you can!
For this Bootcamp, YOU must write, comment, error-check, and debug your own code. Please do not expect or ask the TAs to do this for you. The goal of the Bootcamp is for you to improve your programming skills.
Finding solutions, creating algorithms that problem-solve, finding and fixing syntax, debugging your own code, and making sure your code both runs and solves the problem are all part of programming.
If you get stuck coding - which is NORMAL - find your errors. Getting stuck happens to all coders all the time. The key is to get yourself unstuck. You can do it! Persistence is part of coding.
Finally, coding is a process. Build your code in parts. Test the parts. Think about the methods (algorithms) and how you want them to work. No one can do this for you. Struggling and frustration are part of the coding experience. I find that coffee helps :)
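As a small illustration of building code in parts and testing the parts, here is a minimal Python sketch. The function names (`clean_values`, `mean_of`) and the sample data are made up for this example and are not part of any assignment; treat it as one possible way to work, not the required way.

```python
def clean_values(raw):
    """Part 1: convert strings to floats, skipping anything that is not a number."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item))
        except ValueError:
            # Not a number (for example "oops" or an empty string): skip it.
            continue
    return cleaned


def mean_of(values):
    """Part 2: return the average of a non-empty list of numbers."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


# Test each part on its own before combining them.
assert clean_values(["1", "2.5", "oops", ""]) == [1.0, 2.5]
assert mean_of([1.0, 2.0, 3.0]) == 2.0

# Only after both parts pass: put them together.
sample = ["10", "20", "n/a", "30"]  # hypothetical sample data
result = mean_of(clean_values(sample))
print(result)
```

If one of the `assert` lines fails, you know which part to debug before looking at the combined result.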
The assignments below will help you to review and practice with R and Python. However, if you think you need more review or practice beyond this Bootcamp, then when you complete this Bootcamp - continue to do more. The Web is full of examples and tutorials - and free books.
Each assignment has a suggested due date. Assignments are pass or redo - there is no "fail" option.
In this bootcamp, you might see that some of the PowerPoints or Guides say things like "Week 3 or Week 5" or whatever. Do not worry about that. Just follow the Modules and Links in order, with the goal of completing the Bootcamp by the final due date. I use these Guides for many things and so their names are not important :)
This review bootcamp was coded by hand and by me. It is 100% for you. It is free, flexible, and has the goal of preparing you for your classes this Fall. Only YOU know where you need to improve. Only you can determine if you need more practice and if so in which areas.
At the end of the table below, you will find more links to online courses AND FREE BOOKS that can give you further practice in R and in Python. You decide if you need the practice. If you do, keep going and look into further online courses such as the ones listed. This is not required. Again, at the graduate level, YOU are the expert. If you need more practice, then practice more.
Please remember to submit your assignments via email to your assigned TA only. Even if an assignment suggests otherwise, PLEASE DO NOT CC Dr. Gates.
MODULE |
TOPICS |
READINGS |
ASSIGNMENTS |
DUE DATES |
|
Module 1: R. May 15 - June 15. All assignments in Mod 1 are due no later than June 15 by 11:59pm ET. Lateness will be noted by TAs and reported to the Director. |
|||||
Module 1 Part 1 |
1. Installing R and RStudio and setting up the RStudio IDE and coding environment. Complete "Hello World".
2. Installing R libraries.
3. Basic coding operations in R: decisions, loops, functions (user-defined functions, parameters, return structures, and scoping).
4. Working with files (creating, reading, and writing) in R.
5. Reading data into R (csv, Excel, txt): dataframes.
6. Commonly used R data structures: lists, vectors, matrices, dataframes, logicals, strings, and factors.
7. Basic math and stats in R: F-test, z-test, t-test, ANOVA, IQR, p-values.
8. Basic plotting and graphing in R: bar charts, histograms, boxplots, scatterplots. |
Required: Mod 1 Part 1 Learning Guide from Dr. Gates
List of References, Resources, and Books
Functions in R
Tutorials in R
Learning R
R For Beginners
Probability and Statistics in R
Learning R Step by Step Guide to Data Analysis (good book)
NOTE: The following describes and offers a link to a HUGE collection of my R code.
TOPICS IN MY REPOSITORY: LINK TO Gates Repository of R code. |
Please remember to submit your assignment via email to your assigned TA only. Even if an assignment suggests otherwise please do not cc Dr. Gates and please email only your TA. This will avoid unnecessary emails, duplicates, or any confusion. So again - ONLY EMAIL YOUR TA :) Do not cc or email Dr. G or any other TAs with submissions. Submit only to your assigned TA. Thank you! DrG
Write the following program using R.
Module 1 Part 1 Assignment
Dataset for the Module 1 Part 1 Assignment |
Best submitted by mid June. The due date is purposefully flexible and the final Module due date for all assignments in the Module is noted above. |
|
Module 1 Part 2 |
1. Working with data and dataframes in R. Reshaping data in R: row-wise and column-wise joining, row manipulations, column manipulations, sorting, merging, subsetting, binning.
2. The apply() family.
3. Cleaning and preparing data in R: regular expressions, updating data, removing data, dealing with missing or NA data.
4. Packages in R - these will be part of various examples and will not be exhaustive.
5. Data partitioning in R: kmeans, PCA.
6. Visualization in R: plot, ggplot, qplot, leaflet. |
Module 1 Part 2 R PowerPoint Guide from Dr. Gates Module 1 Part 2 PowerPoint Guide R ggplot Reference:
|
Submit your assignment by email to your assigned TA only. Please do not cc or email Dr. Gates or any other TAs with submissions.
Write the following program using R. To get started, review all of the tutorials and links. Next, review and practice with the following:
Example R Code for Processing Text Data
Data Corpus for Assignment 2
Then, complete the following Week 2 Assignment:
Week 2 Assignment: Text Processing in R
The following is an additional OPTIONAL challenge Assignment for those who have time and want to learn more R. If you choose to submit this assignment - TITLE IT: Week 2 Assignment - Optional One.
Mod 1 Part 2 OPTIONAL Assignment R
Datasets for Mod 1 Part 2 OPTIONAL Assignment R
*Very Important* The shape files website was moved. Rather than ask you to do a search :) I am giving you a direct link to the shape files. Place these in the same folder as your .R code.
SHAPE FILES
Class - here is another 2017 version of the shapefiles - either will work and you can use either.
SHAPE FILES |
The Module 1 Due Date is noted above. | |
Module 2: Python3/Anaconda. June 16 - July 16. All assignments in Module 2 are due no later than July 16 by 11:59pm ET. |
|||||
Module 2 Part 1 |
1. Installing and setting up Python3/Anaconda IDE. "Hello World".
2. Installing Python modules using the command line and conda.
3. Basic coding operations in Python: decisions, loops, functions (parameters, return values, scoping).
4. Using files in Python (csv, txt).
5. Using pandas dataframes in Python and operations on data in Python: managing data, and operations for pandas dataframes and series, including adding, removing, and altering columns and rows, and basic descriptive analysis and summarization.
6. Commonly used Python data structures: lists, dictionaries, numpy arrays, sets.
7. Basic math and stats in Python and numpy.
8. Basic plotting in Python with matplotlib.pyplot.
9. Intro to scikit-learn and methods such as clustering. |
The Mod 2 Part 1 PowerPoint Guide Gates
Note: There are a number of further references and resources in the PowerPoint Guide.
Code that shows pandas data cleaning examples
HUGE Resource for Python Tutorials from W3 Schools online. If you cannot access this (because you are overseas) that is OK. It is not required - just a resource. |
Submit your assignment by email to your assigned TA only. Please do not cc or email Dr. Gates or any other TAs with submissions.
Write the following program using Python3.
Mod 2 Part 1 Assignment Python3
Dataset for Mod 2 Part 1 Assignment |
All Module due dates are noted. | |
Module 2 Part 2. All assignments in Module 2 are due by the date noted above. |
1. Web Scraping
- HTML review
- urllib
- requests
- beautifulsoup
2. Using APIs
- JSON
- GET/POST
- urllib and requests
3. Twitter Mining and regular expressions (you can find examples of this in the Twitter code I am sharing)
- tweepy
- Word Cloud visualization |
Resources and Guides
Module 2 Part 2 PowerPoint Guide Gates
All Twitter Code as well as Regular Expressions and Word Cloud Code
AirNow API urllib and requests Python 3 Code Example with JSON
Tutorial on Data Wrangling and pandas and JSON in Python |
Submit your assignment by email to your assigned TA only. Please do not cc or email Dr. Gates or any other TAs with submissions.
Write the following program using Python3.
Mod 2 Part 2 Assignment Python |
||
Module 3: Command Line Methods and Finishing Up. July 17 - Aug 1 (last day of the bootcamp). This Module is only for Data Science and Analytics students and is optional for DSPP and HIDS students. |
|||||
1. Command line methods - Windows/DOS, Mac, Linux/Unix
2. ssh/PuTTY/Telnet
3. What is Cygwin?
4. Accessing environment variables
5. Regular expressions and grep
6. Completing/submitting any late assignments from previous weeks. Late assignments MUST be submitted no later than Aug 1. |
Mod 3 PowerPoint Guide Gates
Note that resources and references are contained in the PowerPoint Guide. |
Submit your assignment by email to your assigned TA only. Please do not cc or email Dr. Gates or any other TAs with submissions.
Mod 3 Assignment |
|||
The following is a list of online courses and FREE books. Sometimes links break. If a link is broken - ignore it.
Link to many books on many topics: BOOKS LINK
Extra Online Courses for R:
https://online-learning.harvard.edu/subject/r
https://www.edx.org/learn/r-programming
https://www.coursera.org/learn/r-programming
Extra Online Courses in Python:
https://www.coursera.org/specializations/python-3-programming
https://www.edx.org/learn/python
https://online-learning.harvard.edu/subject/python
Extra Online Courses for Intro to Data Science:
https://www.coursera.org/browse/data-science
https://www.edx.org/course/subject/data-science
https://online-learning.harvard.edu/subject/data-science
FREE BOOKS in R and Python:
R:
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
https://www.cs.upc.edu/~robert/teaching/estadistica/TheRBook.pdf
http://www.bagualu.net/wordpress/wp-content/uploads/2015/10/R_Cookbook.pdf
https://r4ds.had.co.nz/
Python:
http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf
https://greenteapress.com/thinkpython/thinkpython.pdf
https://www.davekuhlman.org/python_book_01.pdf
https://www.brianheinold.net/python/A_Practical_Introduction_to_Python_Programming_Heinold.pdf
https://opensource.com/article/18/9/python-programming-book-list |