Data “Wrangling” is often seen as the preliminary step of any data project. However, the more you work with different datasets, the more you will find yourself “wrangling” all the way through.

To get the most out of the Data Wrangling section, we recommend reviewing “Getting Ready” to make sure you have GitHub, Tableau, and Python/Jupyter installed and ready to go.

You will see the following learning objectives repeated as you explore the Exploratory Analysis, Transformation, Understanding, and Documentation sections (p.s. for any MET folks reviewing this course, all links to these sections are live!). You can ask questions in the unit’s Support and Collaboration Forum. You can also review your understanding at any point, as many times as you like, by taking the Data Wrangling Moodle Quiz.

  • LO1 – Students should be able to assess a novel dataset for errors and/or anomalies using Tableau and develop a plan to address the found issues
  • LO2 – Students should be able to create an exploratory Tableau Workbook or Jupyter Notebook for a new dataset that will help with their understanding of the data
  • LO3 – Students should demonstrate the ability to adapt a given Jupyter Notebook to transform a novel dataset
  • LO4 – Students should be able to write concise documentation of data cleaning steps (and/or next steps)
  • LO5 – Students should be able to identify appropriate tools for ‘data wrangling’ and explain the purpose of the tools

In this unit there are four sections. Each section has a list of suggested resources, a template Jupyter Notebook and/or Tableau workbook to demonstrate the skills you will be learning, and a list of tasks to complete.

Each section below follows a similar format:

  • 🛠 Tools: the tools that will be featured, and whose setup you may want to review
  • 📚 Resources: resources for learning the terminology and getting started
  • ☑️ Tasks: your learning activities for practice and/or submission. Most sections will include a task to review and rewrite a template Jupyter Notebook or Tableau Workbook
  • Optional Submissions: you have the option to submit any of these for automated grading through GitHub Classroom, and to indicate one submission for personalized feedback

Important

  • Tasks can be submitted for auto-grading through the Moodle Course using GitHub Classroom.
  • You can choose to submit one complete Tableau Workbook or Jupyter Notebook from a section for evaluation and personal feedback. All other submitted items will be auto-graded for correctness but will not include detailed feedback.
  • The deadline for submitting work for feedback is the end of the month.

Exploratory Analysis (select for learning activities)

When you first gain access to a new dataset, you will often want to begin by exploring the data. So far, we have introduced two tools that can be helpful with this: Jupyter and Tableau. There are benefits and drawbacks to each, and you may find yourself using both rather than picking one or the other.
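As a rough illustration of what a first look at a dataset might involve in a Jupyter Notebook, the sketch below counts the rows, lists the columns, and tallies blank cells per column. The sample data and the `summarize` helper are hypothetical and are not part of the course templates. The sketch uses only the Python standard library; in practice you might prefer a dedicated data library.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real file (not from the course materials).
SAMPLE_CSV = """city,temperature_c,recorded_on
Vancouver,12.5,2021-03-01
Kelowna,,2021-03-01
vancouver,13.1,2021-03-02
Victoria,n/a,2021-03-02
"""

def summarize(csv_text):
    """Return the row count, column names, and number of blank cells per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    blanks = Counter()
    for row in rows:
        for name in columns:
            # Treat missing or whitespace-only values as blank.
            if (row[name] or "").strip() == "":
                blanks[name] += 1
    return {"rows": len(rows), "columns": columns, "blank_cells": dict(blanks)}

summary = summarize(SAMPLE_CSV)
print(summary)
```

A summary like this only catches truly empty cells. Placeholder values such as "n/a" and inconsistent capitalization such as "vancouver" still need a closer look, which is where a visual tool like Tableau can complement a notebook.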

Transformations (select for learning activities)

Once you have a handle on your data and are starting to understand the changes that need to be made, you can start transforming the data.
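To make the idea of a transformation concrete, here is a minimal sketch of one possible cleaning step: tidying a text column and converting a numeric column, with unusable values recorded as missing. The column names, the sample records, and the `clean_row` helper are all assumptions made for illustration; the transformations your own data needs will depend on what you found while exploring it.

```python
def clean_row(row):
    """Return a cleaned copy of one record (a dict of strings)."""
    # Remove stray whitespace and standardize capitalization.
    city = row["city"].strip().title()
    raw_temp = row["temperature_c"].strip()
    try:
        temperature = float(raw_temp)
    except ValueError:
        # Blank cells and placeholders such as "n/a" become missing values.
        temperature = None
    return {"city": city, "temperature_c": temperature}

# Hypothetical records, as they might look straight after loading.
raw_rows = [
    {"city": " vancouver", "temperature_c": "13.1"},
    {"city": "Kelowna", "temperature_c": ""},
    {"city": "Victoria", "temperature_c": "n/a"},
]

cleaned = [clean_row(r) for r in raw_rows]
for record in cleaned:
    print(record)
```

Keeping each step in a small function like this makes it easier to rerun the same transformation on a new dataset and to describe what was done when you document your work.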

Understanding (select for learning activities)

Understanding refers to your understanding, as the analyst, of both the data and the project you are pursuing. You want to understand:

  • the data you have collected
  • the problem you are trying to solve
  • your process so far, and what you have learned
  • the client’s needs (stated and unstated)

Documentation (select for learning activities)

Documentation is really an extension of understanding: a tool that will help your future self and others understand what has been done. In the early stages of a data project, documentation can also help you keep track of important information, next steps, or new ideas.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
