Scroll through to the bottom of this page to see all details and requirements.


How to Submit Assignments.

Datasets Needed For Assignments

Class Outline: All requirements, readings, assignments, guides, due dates.

Please read all of the following and then scroll down to see all assignments, materials and resources, and due dates.

How this Bootcamp Works:

This is a fully remote, asynchronous, online, and self-directed review bootcamp.

This is not a class. This Bootcamp is self-directed and will be facilitated by a team of TAs.

The goal of this Bootcamp is for YOU to practice, to review, and to improve your R and Python programming skills. Only you know what you need to get better at :). You will need to immediately use R and Python in ANLY 501. Be sure you can!



Writing, Debugging, and Error-Checking Your Own Code:

For this Bootcamp, YOU must write, comment, error-check, and debug your own code. Please do not expect or ask the TAs to do this for you. The goal of the Bootcamp is for you to improve your programming skills.
Finding solutions, creating algorithms that problem-solve, finding and fixing syntax, debugging your own code, and making sure your code both runs and solves the problem are all part of programming.



If you get stuck coding - which is NORMAL - find your errors. Getting stuck happens to all coders all the time. The key is to get yourself unstuck. You can do it! Persistence is part of coding.



Finally, coding is a process. Build your code in parts. Test the parts. Think about the methods (algorithms) and how you want them to work. No one can do this for you. Struggling and frustration are part of the coding experience. I find that coffee helps :)
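As a small illustration of building code in parts and testing the parts, the sketch below writes one small function, checks it on tiny inputs with known answers, and only then builds the next piece on top of it. The function names and sample values are made up for this example. It shows one possible workflow, not a required one.

```python
# Part 1: write one small piece and test it before moving on.
def clean_value(text):
    """Strip whitespace and convert a string to a float; return None if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None

# Test part 1 on tiny inputs where you already know the answer.
assert clean_value(" 3.5 ") == 3.5
assert clean_value("abc") is None

# Part 2: build on the tested piece.
def mean_of_valid(values):
    """Average the entries that clean_value can convert; return None if there are none."""
    numbers = [clean_value(v) for v in values]
    numbers = [n for n in numbers if n is not None]
    if not numbers:
        return None
    return sum(numbers) / len(numbers)

# Test part 2, including the "nothing usable" case.
assert mean_of_valid(["1", "2", "oops", " 3 "]) == 2.0
assert mean_of_valid(["oops"]) is None
print("All parts passed their checks.")
```

If one of the checks fails, you know which part to debug before any later code depends on it.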



Assignments List: READ all the words on this page. Not carefully reading everything can result in missing details.

The assignments below will help you to review and practice with R and Python. However, if you think you need more review or practice beyond this Bootcamp, then keep practicing after you complete it. The Web is full of examples, tutorials, and free books.

Each assignment has a suggested due date. Assignments are pass or redo - there is no "fail" option.

Important Notice:

In this bootcamp, you might see that some of the PowerPoints or Guides say things like "Week 3" or "Week 5". Do not worry about that. Just follow the Modules and Links in order, with the goal of completing the Bootcamp by the final due date. I use these Guides for many things, so their names are not important :)

Notice 2: For Data Science and Analytics Students

This review bootcamp was coded by hand and by me. It is 100% for you. It is free, flexible, and has the goal of preparing you for your classes this Fall. Only YOU know where you need to improve. Only you can determine if you need more practice and if so in which areas.

Do what needs to be done to become GOOD at R and at Python. When you start this Fall, all classes will *assume* that you know R and Python very well. Make sure this is true.

At the end of the table below, you will find more links to online courses AND FREE BOOKS that can give you further practice in R and in Python. You decide if you need the practice. If you do, keep going and look into further online courses such as the ones listed. This is not required. Again, at the graduate level, YOU are the expert. If you need more practice, then practice more.

Please remember to submit your assignments via email to your assigned TA only, even if an assignment suggests otherwise.

PLEASE DO NOT CC Dr. Gates

Please email only your TA. This will avoid unnecessary emails, duplicates, and confusion. So again - ONLY EMAIL YOUR TA :) Do not cc or email Dr. G or any other TAs with submissions. Submit only to your assigned TA. Thank you! DrG

MODULE
TOPICS
READINGS
ASSIGNMENTS
DUE DATES
Module 1: R
May 15 - June 15

All assignments in Mod 1 are due no later than June 15 by 11:59pm ET. Lateness will be noted by TAs and reported to the Director.

Module 1 Part 1

1. Installing R and RStudio and setting up the RStudio IDE and coding environment. Complete "Hello World".
2. Installing R libraries.
3. Basic coding operations in R: decisions, loops, functions (user-defined functions, parameters, return structures, and scoping).
4. Working with files (creating, reading, and writing) in R.
5. Reading data into R (csv, Excel, txt): dataframes.
6. Commonly used R data structures: lists, vectors, matrices, dataframes, logicals, strings, and factors.
7. Basic math and stats in R: F-test, z-test, t-test, ANOVA, IQR, p-values.
8. Basic plotting and graphing in R: bar charts, histograms, boxplots, scatterplots.
Required: Mod 1 Part 1 Learning Guide from Dr. Gates

List of References, Resources, and Books

Functions in R
Tutorials in R
Learning R
R For Beginners
Probability and Statistics in R
Learning R Step by Step Guide to Data Analysis (good book)

NOTE: The following describes and offers a link to a HUGE collection of my R code.

TOPICS IN MY REPOSITORY:

1. R for data wrangling: cleaning, prep, pre-processing, normalization
2. Association Rule Mining with R
3. Decision Trees with R
4. Clustering in R: EM, kmeans, hclust, and distance measures such as cosine similarity.
5. Naive Bayes and prediction/classification.
6. Support Vector Machines in R and prediction.
7. k-Nearest Neighbor (kNN) and Random Forest
8. Text Mining Methods
9. Learning how to use R Markdown

LINK TO Gates Repository of R code.


LINK TO R Markdown Example for Data Mining: Chi^2, kNN, Clustering (3 types), Decision Trees, Naive Bayes, SVM


LINK TO HOW TO for R Markdown



Write the following program using R.
Module 1 Part 1 Assignment

Dataset for the Module 1 Part 1 Assignment
Best submitted by mid-June. The due date is purposefully flexible; the final Module due date for all assignments in the Module is noted above.
Module 1 Part 2

1. Working with data and dataframes in R:
- Reshaping data in R: row-wise and column-wise joining, row manipulations, column manipulations, sorting, merging, subsetting, binning.
2. The apply() Family
3. Cleaning and preparing data in R: regular expressions, updating data, removing data, dealing with missing or NA data
4. Packages in R - these will be part of various examples and will not be exhaustive.
5. Data partitioning in R: kmeans, PCA
6. Visualization in R: plot, ggplot, qplot, leaflet

Module 1 Part 2 R PowerPoint Guide from Dr. Gates
Module 1 Part 2 PowerPoint Guide R

ggplot Reference:
Library ggplot and Visualization




Several References for R:
Cleaning Data in R:
More on Dataframes in R:
Some Useful Packages in R
The "apply" family of functions in R
Leaflet in R cran
Leaflet github
Leaflet Bloggers R




Write the following program using R.

To get started, review all of the tutorials and links. Next, review and practice with the following:

Example R Code for Processing Text Data

Data Corpus for Assignment 2

Then, complete the following Week 2 Assignment:

Week 2 Assignment: Text Processing in R




The following is an additional OPTIONAL challenge Assignment for those who have time and want to learn more R. If you choose to submit this assignment - TITLE IT: Week 2 Assignment - Optional One.

Mod 1 Part 2 OPTIONAL Assignment R

Datasets for Mod 1 Part 2 OPTIONAL Assignment R

*Very Important* The shape files website was moved. Rather than ask you to do a search :) I am giving you a direct link to the shape files. Place these in the same folder as your .R code.

SHAPE FILES

Class - here is another (2017) version of the shapefiles. Either version will work.

SHAPE FILES
The Module 1 Due Date is noted above.
Module 2: Python3/Anaconda
June 16 - July 16

All assignments in Module 2 are due no later than July 16 by 11:59pm ET.

Module 2 Part 1

1. Installing and setting up Python3/Anaconda IDE. "Hello World".
2. Installing Python modules using the command line and conda.
3. Basic coding operations in Python: decisions, loops, functions (parameters, return values, scoping).
4. Using files in Python (csv, txt)
5. Using pandas dataframes and series in Python: managing data; adding, removing, and altering columns and rows; and basic descriptive analysis and summarization.
6. Commonly used Python data structures: lists, dictionaries, numpy arrays, sets.
7. Basic math and stats in Python and numpy.
8. Basic plotting in Python with matplotlib.pyplot
9. Intro to scikit-learn and methods such as clustering.
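As a quick self-check on several of the topics above (functions, loops, decisions, reading csv data, dictionaries and lists, and basic stats), here is a minimal sketch. It uses only the Python standard library so that it runs without installing anything. For the Bootcamp itself you would typically do the same work with pandas and numpy. The data, column names, and function names are made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical csv content; in practice you would read a real file with open("file.csv").
CSV_TEXT = """name,score
Ana,90
Ben,75
Cy,
Dee,85
"""

def read_scores(csv_text):
    """Return a dict of name -> score (float), skipping rows with a missing score."""
    scores = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row["score"]:  # decision: an empty string means the score is missing
            scores[row["name"]] = float(row["score"])
    return scores

def summarize(scores):
    """Compute a few basic descriptive statistics for a dict of scores."""
    values = list(scores.values())
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max_name": max(scores, key=scores.get),  # name with the highest score
    }

scores = read_scores(CSV_TEXT)
summary = summarize(scores)
print(summary)
```

If you can read, modify, and extend a short program like this without help, you are in good shape for the assignment. If not, that tells you which topics to review first.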

The Mod 2 Part 1 PowerPoint Guide Gates

Note: There are a number of further references and resources in the PowerPoint Guide


Code that shows pandas data cleaning examples

HUGE Resource for Python Tutorials from W3Schools online.
If you cannot access this (because you are overseas) that is OK. It is not required - just a resource.



Write the following program using Python3.

Mod 2 Part 1 Assignment Python3


Dataset for Mod 2 Part 1 Assignment

All Module due dates are noted.
Module 2 Part 2

All assignments in Module 2 are due by the date noted above.


1. Web Scraping
- HTML review
- urllib
- requests
- beautifulsoup

2. Using APIs
- JSON
- GET/POST
- urllib and requests

3. Twitter Mining
- regular expressions (you can find examples of these in the Twitter code I am sharing)
- tweepy
- Word Cloud visualization
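To make the JSON and regular expression topics above concrete, here is a minimal offline sketch. It parses a JSON string shaped like a small API response, uses regular expressions to pull out hashtags and to strip URLs and mentions, and counts the remaining words (the kind of counts a word cloud is built from). The JSON text, its field names, and the function names are hypothetical and do not reflect any real API's schema. No network call is made, so urllib, requests, and tweepy are not shown here.

```python
import json
import re
from collections import Counter

# Hypothetical response text; a real API would return something with its own fields.
RESPONSE_TEXT = """
[
  {"id": 1, "text": "Learning #Python with @some_user https://example.com/a"},
  {"id": 2, "text": "More #python and #rstats practice today!"}
]
"""

def extract_hashtags(text):
    """Return the hashtags in text, lowercased and without the leading '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]

def clean_text(text):
    """Remove URLs, mentions, and hashtags, then return the lowercase words."""
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"[@#]\w+", "", text)       # drop mentions and hashtags
    return re.findall(r"[a-z]+", text.lower())

records = json.loads(RESPONSE_TEXT)  # JSON array -> list of dicts

hashtag_counts = Counter()
word_counts = Counter()
for record in records:
    hashtag_counts.update(extract_hashtags(record["text"]))
    word_counts.update(clean_text(record["text"]))

print(hashtag_counts.most_common())
print(word_counts.most_common(3))
```

Testing the regular expressions on a fixed string like this first makes it easier to debug them before pointing the same code at live data.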

Resources and Guides

Module 2 Part 2 PowerPoint Guide Gates

All Twitter Code as well as Regular Expressions and Word Cloud Code

AirNow API urllib and requests Python 3 Code Example with JSON

Tutorial on Data Wrangling and pandas and JSON in Python



Write the following program using Python3.


Mod 2 Part 2 Assignment Python

Module 3: Command Line Methods and Finishing Up
July 17 - Aug 1 (last day of the bootcamp)

This Module is only for Data Science and Analytics students and is optional for DSPP and HIDS students.


1. Command line methods - Windows/DOS, Mac, Linux/Unix
2. ssh/PuTTY/Telnet
3. What is Cygwin?
4. Accessing environment variables
5. Regular expressions and grep

6. Completing/Submitting any late assignments from previous weeks. Late assignments MUST be submitted no later than Aug 1.
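Two of the topics above, accessing environment variables and filtering lines with regular expressions the way grep does, can also be tried from inside Python. The sketch below is only an illustration under that assumption and is not a replacement for practicing at the command line itself. The helper names and sample lines are made up.

```python
import os
import re

def get_env(name, default="(not set)"):
    """Look up an environment variable, returning a default if it is missing."""
    return os.environ.get(name, default)

def grep(pattern, lines, ignore_case=False):
    """Return the lines that contain a match for the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in lines if regex.search(line)]

# HOME is usually set on Mac/Linux and USERPROFILE on Windows; either may be missing.
print("Home directory:", get_env("HOME", get_env("USERPROFILE")))

sample_lines = [
    "error: file not found",
    "ok: 12 rows read",
    "ERROR: bad column name",
    "ok: 7 rows written",
]

# Lines starting with "error" in any letter case.
print(grep(r"^error", sample_lines, ignore_case=True))
# Lines that mention a row count.
print(grep(r"\d+ rows", sample_lines))
```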

Mod 3 PowerPoint Guide Gates

Note that resources and references are contained in the PowerPoint Guide



Mod 3 Assignment

The following is a list of online courses and FREE books. Sometimes links break.

If a link is broken – ignore it.


Link to many books on many topics:

BOOKS LINK

Extra Online Courses for R

https://online-learning.harvard.edu/subject/r

https://www.edx.org/learn/r-programming

https://www.coursera.org/learn/r-programming

Extra Online Courses in Python

https://www.coursera.org/specializations/python-3-programming

https://www.edx.org/learn/python

https://online-learning.harvard.edu/subject/python

Extra Online Courses for Intro to Data Science

https://www.coursera.org/browse/data-science

https://www.edx.org/course/subject/data-science

https://online-learning.harvard.edu/subject/data-science

FREE BOOKS in R and Python:

R

https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

https://www.cs.upc.edu/~robert/teaching/estadistica/TheRBook.pdf

http://www.bagualu.net/wordpress/wp-content/uploads/2015/10/R_Cookbook.pdf

https://r4ds.had.co.nz/

Python

http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf

https://greenteapress.com/thinkpython/thinkpython.pdf

https://www.davekuhlman.org/python_book_01.pdf

https://www.brianheinold.net/python/A_Practical_Introduction_to_Python_Programming_Heinold.pdf

https://opensource.com/article/18/9/python-programming-book-list