Chapter 69 Creating a Data Analytics Portfolio
When searching for a job in data analytics, it can be tricky to (a) demonstrate what you know how to do or have done previously and (b) distinguish yourself from other job candidates. Creating a portfolio of your work can be one way to set yourself apart on the job market and can, in some instances, help compensate for fewer years of relevant work experience. In this document, I will provide some suggestions on how to set up a portfolio, with an emphasis on applying the R programming language.
Identify a content area or context that excites you.
In human resource (HR) analytics, for example, there are a variety of different content areas and contexts. When identifying a content area or context that excites you, it’s important to be as specific as you can. For example, are you interested in employee sensing (e.g., engagement surveys), separation and retention, recruitment and selection, training and development, reward systems, or workforce planning? In your portfolio, you can present multiple projects that highlight different content areas and contexts, but I recommend focusing your first portfolio project on an area (a) that you are really passionate about and (b) in which you have some prior work experience (even if not in the application of data analytics/science to that area).
Identify an area of data analytics that excites you.
Begin by reflecting on what type of work you intrinsically enjoy at the moment and what work you would like to continue doing in the near future. For example, you might enjoy one of the following more than others: question formulation (e.g., problem definition), data acquisition (e.g., measure development), data management (e.g., data wrangling), data analysis (e.g., model building), or storytelling (e.g., data visualization or dashboard creation). Ideally, you will want to create a portfolio that highlights your strengths in each of the areas you wish to emphasize. With that being said, it’s also good to show at least some level of proficiency in non-emphasized areas as well.
Figure out what knowledge, skills, and abilities your ideal employer wants – or what it needs but doesn’t yet realize it needs.
At the very least, you’ll want to figure out what knowledge, skills, or abilities the employer expects to see in ideal job candidates. You might be able to glean some of these expectations from the job posting itself or from the organization’s website. In addition, you can find examples of expected knowledge, skills, and abilities in published white papers, peer-reviewed publications, or the LinkedIn profiles of other individuals who hold similar jobs at that organization. Even better, try to obtain direct insider information from those who currently work at that organization. A portfolio also provides you with an opportunity to showcase tools, applications, knowledge, skills, or abilities that the organization likely needs to attain strategic objectives but does not yet realize or recognize. This might be especially relevant in situations in which you suspect that the organization’s data analytics capabilities are less mature or when you believe you might be overqualified.
Decide whether you want your portfolio to teach, showcase, or do both.
Both teaching and showcasing can be useful ways to illustrate your knowledge, skills, and abilities. If you go the teaching route, your portfolio project will likely take the form of a tutorial. A well-thought-out tutorial can be a good method for showing that you understand the concepts and technical applications well enough to teach another person how to do the same. If you go the showcasing route, your portfolio project will focus less on teaching and more on highlighting what you are capable of doing. If your portfolio consists of multiple projects, you might find it worthwhile to include at least one teaching project and at least one showcasing project.
Find an appropriate dataset.
There are many different places in which you can find toy datasets or public datasets that are free to use. You’ll want to be absolutely certain, however, that the dataset you’ve chosen is free to use and publicly available. That is, you do not want to use proprietary or private data for your portfolio. Examples of repositories for datasets include:
- My GitHub repository called R Tutorial Data Files, which contains datasets that I’ve simulated using R;
- This GitHub repository called Awesome Public Datasets;
- Kaggle;
- This Stanford University website has links to a variety of public datasets.
If you can’t find an appropriate dataset, you can always simulate one using R, although doing so is more involved and complicated. Rich Landers’ (University of Minnesota) website includes this tool, which can be used to simulate simple datasets.
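As a minimal sketch of what simulating a dataset in R might look like, the example below generates a small, fictional employee dataset using only base R. Every variable name, sample size, and parameter value here is a hypothetical choice made for illustration; none of it comes from a real organization or from the repositories listed above.

```r
# Illustrative simulation of a fictional employee dataset (all values are assumptions)
set.seed(2024)  # fix the random seed so the simulation is reproducible

n <- 200  # number of simulated employees

sim_data <- data.frame(
  employee_id = sprintf("EE%04d", seq_len(n)),
  department  = sample(c("Sales", "Operations", "Engineering"), n, replace = TRUE),
  tenure_yrs  = round(runif(n, min = 0, max = 15), 1),
  stringsAsFactors = FALSE
)

# Engagement (1-5 scale) depends weakly on tenure, plus random noise
engagement_raw <- 3 + 0.05 * sim_data$tenure_yrs + rnorm(n, mean = 0, sd = 0.6)
sim_data$engagement <- pmin(pmax(round(engagement_raw, 2), 1), 5)  # clamp to the 1-5 range

# Probability of turnover decreases as engagement increases (logistic relationship)
p_turnover <- plogis(1.5 - 0.8 * sim_data$engagement)
sim_data$turnover <- rbinom(n, size = 1, prob = p_turnover)  # 1 = left, 0 = stayed

# Inspect the structure of the simulated data
str(sim_data)
```

Because the relationships among the variables are built in by design, a simulated dataset like this also lets you check whether your later analyses recover the pattern you specified.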
Create an immersive environment for the intended audience.
It can be tempting to just manage, analyze, and visualize a bunch of data without providing any context or backstory. Because context matters, I recommend creating an immersive environment that orients the intended audience to the context, including variable definitions. This can be done via writing, audio, or video. By establishing an immersive environment, the problem you’re attempting to solve or the question you’re attempting to answer will be more meaningful – and ideally will illustrate a clear purpose (e.g., helping an organization to attain a strategic objective).
Articulate a clear problem (question) that can be solved (answered) with the available data.
It’s important to make sure that your portfolio project is problem- or question-focused. That is, use your portfolio to show how you can go from a problem definition (or question formulation) to a solution (or answer). In doing so, you can demonstrate your ability to conduct meaningful and purposeful data analytics. For a refresher on problem definition and question formulation, check out this chapter.
Write (and annotate) your code with an emphasis on clarity.
Your code provides a behind-the-scenes glimpse at your decision-making processes, so make sure it’s clear and understandable. For positions that expect candidates to have less advanced programming skills, you can focus on writing code that is understandable to a broad audience and that illustrates that you understand foundational concepts, operations, and techniques. It might not be the most efficient or elegant code, but it should be clear and free of errors. For positions that expect candidates to have more advanced programming skills, you’ll want to focus on writing code that is stable, reproducible, efficient, and elegant. Regarding efficiency and elegance, you’ll want to consider how long it takes your code to run and how this might be more consequential at scale, and ideally, you’ll want to write less code when possible. Be sure to include clear annotations that help explain your many decisions.
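To make the idea of clear, annotated code concrete, here is one possible sketch in base R. The data frame, its column names, and the turnover values are all hypothetical and exist only to illustrate descriptive naming and comments that explain decisions.

```r
# Hypothetical data: one row per employee
employees <- data.frame(
  department = c("Sales", "Sales", "Sales", "Engineering", "Engineering", "Operations"),
  left_org   = c(1, 0, 1, 0, 0, 1),  # 1 = left the organization, 0 = stayed
  stringsAsFactors = FALSE
)

# Harder to follow: a terse object name and no explanation of intent
# x <- aggregate(left_org ~ department, employees, mean)

# Easier to follow: descriptive names plus comments that explain each decision
# Because left_org is coded 0/1, its mean within a department is the turnover rate
turnover_by_dept <- aggregate(
  left_org ~ department,
  data = employees,
  FUN  = mean
)

# Rename the summary column so that its meaning is obvious in later output
names(turnover_by_dept)[names(turnover_by_dept) == "left_org"] <- "turnover_rate"

# Sort from highest to lowest turnover so the most pressing departments appear first
turnover_by_dept <- turnover_by_dept[order(-turnover_by_dept$turnover_rate), ]

print(turnover_by_dept)
```

Both versions compute the same summary; the second simply makes the reasoning visible to a reader who was not present when the code was written.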
If your portfolio project includes data analysis or visualization, make sure that you’ve chosen an appropriate analysis or visualization given the problem/question and available data.
You can run all sorts of analyses and attain results – even when the analyses are not appropriate or meaningful given the problem/question and/or available data. If you are performing statistical analysis of the data, you’ll want to make sure that the statistical assumptions for a particular analysis have been reasonably satisfied; better yet, demonstrate in your portfolio how you tested relevant statistical assumptions. All else being equal, it’s best to choose the simplest and most easily interpretable analysis. For example, if you’re interested in comparing the means for two independent samples, then there are statistically equivalent ways to analyze the data: independent-samples t-test, one-way analysis of variance, simple linear regression, and structural equation modeling. In this example, the independent-samples t-test will likely be the simplest analysis to run and communicate given the goal of comparing the means for two independent samples. I like to think of it using this metaphor: If your objective is to get some almond milk from your corner grocery store, you could walk (independent-samples t-test) or you could drive a Ferrari (structural equation modeling); both will get you there, but one is less resource intensive.
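The equivalence described above can be checked directly. The sketch below uses simulated (hypothetical) scores for two groups, runs two common assumption checks, and then compares the p-values from an independent-samples t-test (assuming equal variances), a one-way analysis of variance, and a simple linear regression. The group labels, sample sizes, and means are assumptions made purely for illustration, and structural equation modeling is left out because it requires an additional package.

```r
# Simulate hypothetical scores for two independent groups
set.seed(123)
dat <- data.frame(
  group = factor(rep(c("control", "training"), each = 30)),
  score = c(rnorm(30, mean = 50, sd = 10), rnorm(30, mean = 55, sd = 10))
)

# Assumption checks: normality within each group and equality of variances
shapiro_p <- tapply(dat$score, dat$group, function(x) shapiro.test(x)$p.value)
variance_check <- var.test(score ~ group, data = dat)

# Three ways of testing the same mean difference
t_res   <- t.test(score ~ group, data = dat, var.equal = TRUE)  # pooled-variance t-test
aov_res <- summary(aov(score ~ group, data = dat))              # one-way ANOVA
lm_res  <- summary(lm(score ~ group, data = dat))               # simple linear regression

# Collect the p-values for the group effect from each analysis
p_values <- c(
  t_test     = t_res$p.value,
  anova      = aov_res[[1]][["Pr(>F)"]][1],
  regression = lm_res$coefficients["grouptraining", "Pr(>|t|)"]
)

print(shapiro_p)
print(variance_check$p.value)
print(p_values)
```

With two groups and equal variances assumed, the three p-values match and the ANOVA F statistic equals the squared t statistic, which is why the simplest of the three is usually the easiest to communicate.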
Focus on good storytelling.
Make sure your portfolio project tells an accurate yet compelling story. While writing sophisticated code or running advanced analyses may impress some, at the end of the day, your portfolio project should tell a good (and hopefully memorable) story. For a review of classic storytelling principles, check out this chapter.
Solicit friendly feedback prior to sharing your portfolio with an employer.
Everyone makes errors, and it’s better to have a friend or colleague catch those errors prior to sharing the portfolio with an employer. Friends or colleagues can also provide feedback on how intuitive, comprehensible, or appropriate your portfolio is. So who should you ask for feedback? Ideally, you should seek feedback from individuals who have greater expertise in the area than you do, to make sure you’ve done everything correctly or appropriately. It can also be helpful to seek feedback from people who you expect will have a similar level of expertise as the intended audience, as this latter group may help you create a portfolio that is not overly complex for that audience.
Update your portfolio regularly.
You’ll want your portfolio to feel fresh and contemporary, which means that you’ll need to update your portfolio with some regularity (e.g., once or twice a year). In addition, be sure to check periodically whether some of the packages/functions you’re using have been updated, which could affect how your code works. One way to work around this is to use a dependency management package like packrat (or renv, its successor). Although using a dependency management package will help to ensure that your code works properly over time, it doesn’t guard against stale-looking code that references deprecated functions.
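As a lightweight complement to a dependency management package, you might record which package versions your project was last run with. The sketch below uses only base R. The vector of package names is a hypothetical placeholder, and the commented packrat calls at the end are shown only as a rough outline of a typical workflow, assuming packrat is installed.

```r
# Hypothetical list of packages the portfolio project relies on; replace with your own
pkgs <- c("stats", "utils")

# Build a small log of the installed version of each package
version_log <- data.frame(
  package = pkgs,
  version = vapply(
    pkgs,
    function(p) as.character(packageVersion(p)),
    character(1),
    USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)

# Record the R version and the date of the check alongside the package versions
version_log$r_version  <- R.version.string
version_log$checked_on <- as.character(Sys.Date())

print(version_log)

# Rough outline of a packrat workflow (not run here; assumes packrat is installed):
# packrat::init()      # set up a private package library for the project
# packrat::snapshot()  # record the package versions currently in use
# packrat::restore()   # reinstall the recorded versions later or on another machine
```

Comparing a saved copy of this log with a fresh one each time you revisit the portfolio is one simple way to notice that a package has changed since your code last ran.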