Appendix B — Group project assignment
To maximize how much you learn and how much you will retain, you as a group will take what you learn in the course and apply it to create a reproducible project within a server environment. This project …
We will have created a project folder to work in on the server as well as assigning you and your team mate(s) to the project. Within your project, you will also be given a specific set of outputs to create (a figure, a table, and a basic report), based on the data we provide to you. The outputs will not require using all of the data given.
During the last session of the course you will work on this assignment. In the last ~20 minutes of this session, the lead instructor will go into your projects and re-generate your report to check that it is reproducible.
B.1 Specific tasks
You will be collaborating as a team using … to manage your group assignment. We will set up the project with … for you so you can quickly start collaborating together on the project.
Your specific tasks are:
- Review the outputs we want you to create as well as the specific data needed for creating them. Keep these in mind for later tasks.
- Identify what resources you have available in the server environment, such as the number of cores and amount of memory. Use this information to guide your coding.
- Look into the … folder that contains the raw data you will use for this project. Identify which file storage format the data is saved in and convert it into a more efficient method if necessary. Depending on which format it is, save the converted format into the … folder. Make use of parallel processing to speed up this (using
{furrr}
). - Read in a subset of the data that only contains the columns and rows you need for your outputs. Randomly keep a slice of this data to use for prototyping code later.
- Working backwards, write out a set of (empty) functions that provide the sequences of steps that will create a specific output. Begin filling in these functions, making sure they work before adding them to the
{targets}
pipeline. Write the functions and{targets}
pipeline so that it will run things in parallel. - Incorporate the outputs into a report, include that in the
{targets}
pipeline. - Comment out the line of code that randomly keeps a slice of the data and then run the
{targets}
pipeline withtargets::tar_make()
. - The project will be complete if you can regenerate all the outputs using only the
{targets}
pipeline.
B.2 Quick “checklist” for a good project
B.3 Expectations for the project
What we expect you to do for the group project:
- Use parallel processing.
- Use a
{targets}
pipeline. - Use functional programming (including creating your own functions).
- Use an efficient file storage format.
What we don’t expect:
- No complicated analyses.
- No complicated figures or tables.
- No processing that isn’t specifically related to creating the assigned output.
Essentially, the group project is a way to reinforce what you learned during the course, but in a more relaxed and collaborative setting.