Chapter 3 Data Acquisition

Link to conceptual video: https://youtu.be/osxe5Za_-74

Data acquisition refers to the process of collecting, retrieving, gathering, and sourcing data that can be used to solve problems, answer questions, and test hypotheses that were identified during the Question Formulation phase of the HR Analytics Project Life Cycle. Various tools can be used for data acquisition, such as employee surveys, (performance) rating forms, surveillance and monitoring, database queries, and scraping or crawling. In some instances, the required data may already reside in an HR information system (HRIS) or enterprise resource planning (ERP) platform, and such data are often referred to as archival. From ethical, legal, and practical perspectives, my general advice is to acquire data with a purpose. That is, if we don’t have a compelling and well-thought-out rationale to collect certain data – especially data about people – then we should probably resist collecting such data.

The Data Acquisition phase of the Human Resource Analytics Project Life Cycle (HRAPLC) involves gathering the data necessary for solving the problem or answering the question from the Question Formulation phase.

3.1 Employee Surveys

When it comes to acquiring data about employee attitudes, behaviors, and feelings, the employee survey is perhaps one of the most common (if not the most common) tools. If you’re unfamiliar with employee surveys, simply put, they consist of some number of items (e.g., questions) to which employees are asked to respond. Survey items can be open-ended (e.g., “Please describe our onboarding experience.”) or close-ended with fixed response options (e.g., “I am satisfied with my job.” [1 = Strongly Disagree, 5 = Strongly Agree]), and they can vary in length, ranging from shorter yet more frequent pulse surveys to longer yet less frequent annual engagement surveys. Further, surveys can be used to deploy multi-item measures of multi-faceted and nuanced concepts (i.e., constructs) such as engagement and organizational citizenship behaviors.
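
To make the idea of a multi-item measure more concrete, here is a minimal Python sketch of how responses to several close-ended items might be combined into a single scale score. The employee IDs, item names (eng_1 to eng_3), and responses are hypothetical, and the assumption that eng_3 is negatively worded (and therefore reverse-scored) is mine for illustration. Averaging items is one common scoring approach, not necessarily the one a given survey vendor or organization uses.

```python
from statistics import mean

# Hypothetical responses from three employees to a three-item engagement
# measure, each rated 1 (Strongly Disagree) to 5 (Strongly Agree).
responses = {
    "emp_001": {"eng_1": 4, "eng_2": 5, "eng_3": 2},
    "emp_002": {"eng_1": 2, "eng_2": 3, "eng_3": 4},
    "emp_003": {"eng_1": 5, "eng_2": 4, "eng_3": 1},
}

# Assumption for illustration: eng_3 is negatively worded, so it is
# reverse-scored before averaging (1 becomes 5, 2 becomes 4, and so on).
REVERSE_SCORED = {"eng_3"}
SCALE_MIN, SCALE_MAX = 1, 5


def scale_score(item_responses):
    """Average the items after reverse-scoring negatively worded ones."""
    adjusted = [
        (SCALE_MIN + SCALE_MAX - value) if item in REVERSE_SCORED else value
        for item, value in item_responses.items()
    ]
    return mean(adjusted)


# One scale score per employee.
scores = {emp: scale_score(items) for emp, items in responses.items()}

for emp, score in scores.items():
    print(f"{emp}: {score:.2f}")
```

Keeping the reverse-scored items in a single named set makes the scoring rule easy to audit later, which matters when the resulting scores feed into downstream analyses.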

The quality of the data acquired using an employee survey depends largely on the quality of the survey content (e.g., quality of item writing), the appropriateness of the survey for the target population, and respondents’ motivation (or lack thereof) for taking the survey. To learn more about writing high-quality items, avoiding common pitfalls, and other design and administration considerations, I recommend checking out the Google re:Work guide for developing and administering employee surveys, as it distills many best practices into a user-friendly and efficient format. Below, I list some potential advantages and disadvantages of using employee surveys for data acquisition.

Advantages:

  • If designed well, they can be efficient and effective tools for acquiring self- or observer-report data on employee personality, attitudes, individual differences, and behaviors as well as perceptions of work, working environment, work-family interface, supervisor behavior, coworker behavior, and client behavior.
  • They tend to be relatively affordable to administer and a variety of platforms exist today to facilitate this process (e.g., Qualtrics, SurveyMonkey).
  • Employees are typically familiar with the concept of a survey and can exert more control over the information that is collected.

Disadvantages:

  • Some may argue that the data acquired may be more subjective in nature than the data acquired by other tools, as respondents may succumb to perceived social desirability pressures and/or fake or distort their responses.
  • They can be time-consuming and resource-intensive to respond to and to develop.
  • If surveyed too frequently, employees may experience “survey fatigue.”

3.2 Rating Forms

Rating forms often share some of the same characteristics as employee surveys (e.g., multiple close-ended items) but tend to be more focused on measuring work-related behaviors and job performance. Examples of common types of rating forms include the behavioral observation scale and the behaviorally anchored rating scale. Given the breadth of the performance domain for most jobs, when targeting performance, rating forms tend to consist of multiple items or dimensions. For example, the performance domain for the prototypical customer service representative job will involve interacting with customers but will likely also involve administrative tasks, such as documenting customer complaints. Below, I offer some advantages and disadvantages of using rating forms for data acquisition.

Advantages:

  • If designed well, they allow raters to produce data efficiently.
  • They can offer a standardized and consistent format that ultimately results in “cleaner” and more structured data than using no rating form at all to collect the same data.

Disadvantages:

  • Achieving sufficiently high reliability across raters can be challenging, even when they are using the same rating form.
  • Some types of rating forms, like the behaviorally anchored rating scale, can be very time-consuming and resource-intensive to develop.
  • Ratings may be influenced by office politics and idiosyncratic rater motivations (e.g., “I scored this person lower than they deserved to send a message”).
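
As a rough illustration of what agreement across raters can look like in practice, the following Python sketch computes Cohen's kappa for two raters who scored the same eight employees on the same 1-5 rating form. The ratings are hypothetical. Cohen's kappa is only one of several agreement statistics and is used here as an example; it treats the rating categories as unordered, so other indices may be more appropriate for ordered or continuous ratings.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of targets.")
    n = len(ratings_a)

    # Proportion of targets on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, based on how often each rater
    # used each rating category.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        # Both raters used one and the same category for every target.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight employees by two supervisors.
rater_1 = [3, 4, 2, 5, 3, 4, 1, 3]
rater_2 = [3, 4, 3, 5, 2, 4, 1, 3]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the raters agree on six of eight employees, yet the chance-corrected value is noticeably lower than the raw 75% agreement, which is one reason raw agreement alone can overstate reliability.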

3.3 Surveillance & Monitoring

Surveillance and monitoring offer a more discreet and less obtrusive approach to data acquisition. Examples include tracking system login information (e.g., dates, times), recording video or audio of employees, examining email correspondence, and deploying sensors and other wearable technologies (e.g., sociometric badges). Below, I offer some advantages and disadvantages of using surveillance and monitoring to acquire data.

Advantages:

  • They tend to be nonintrusive and operate “behind the scenes,” which can lead to the acquisition of more realistic and authentic data on employee behavior.
  • Technological advances continue to expand surveillance and monitoring capabilities, such as those designed to measure geolocation, tone of voice, interactions, heart rate, sleep quantity and quality, and exposure to noxious chemicals.

Disadvantages:

  • Without clear and transparent communication regarding the use of such tools, employees may perceive a violation of trust.
  • The technologies behind many surveillance and monitoring tools can produce truly big data (e.g., high velocity, massive amounts, unstructured), which can make wrangling and managing the data challenging and time-consuming.
  • Employees may have ethical and privacy concerns about these tools, including about how the data are going to be used by the organization and how they are going to be protected.

3.4 Database Queries

When acquiring data that already reside in an information system or enterprise resource planning platform, a database query is often the tool of choice. A database query refers to an action in which a request is made to access, acquire, restructure, and/or manipulate data housed in a database. Structured query language (SQL) is an example of a language that is commonly used to access data from a relational database. In many instances, the data retrieved from a database via a query meet the definition of archival data. Below, I offer some potential advantages and disadvantages of using database queries to acquire data.

Advantages:

  • They can be an efficient way to gather archival data already residing in an information system or enterprise resource planning platform.
  • They provide an opportunity to leverage data that an organization acquired previously.

Disadvantages:

  • Just because data reside in a database does not necessarily mean they are of high quality or are trustworthy.
  • Unless carefully documented, important characteristics and definitions regarding the data residing in a database may be challenging to locate, which means even when queried, the definition and purpose of certain fields (i.e., variables) may remain unclear.
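
To illustrate what a database query can look like, the following Python sketch builds a small in-memory SQLite database and uses SQL to retrieve headcount and separations by department. The table name (employees), its fields, and all records are hypothetical stand-ins for what might reside in an HRIS; a real system will have its own schema, access controls, and SQL dialect.

```python
import sqlite3

# Build a throwaway in-memory database with a hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id     TEXT PRIMARY KEY,
        department      TEXT,
        hire_date       TEXT,
        separation_date TEXT  -- NULL means the employee is still active
    )
    """
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("E001", "Sales", "2019-03-04", None),
        ("E002", "Sales", "2020-07-13", "2022-01-28"),
        ("E003", "Engineering", "2018-11-05", None),
        ("E004", "Engineering", "2021-02-15", None),
        ("E005", "Sales", "2017-06-19", "2021-09-30"),
    ],
)

# The query itself: count employees and separations within each department.
query = """
    SELECT department,
           COUNT(*) AS headcount,
           SUM(separation_date IS NOT NULL) AS separations
    FROM employees
    GROUP BY department
    ORDER BY department
"""
rows = conn.execute(query).fetchall()
conn.close()

for department, headcount, separations in rows:
    print(department, headcount, separations)
```

Note that the comment on separation_date plays the role of field documentation: without it, someone querying the table would have to guess what a missing value means.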

3.5 Scraping

Scraping refers to the process of extracting data from websites, documents, and other sources of information. In many cases, we use scraping to gather data that were not originally intended to be used in the way that we plan to use them. For example, to predict changes in the stock market, an analyst might scrape tweets from Twitter over some period of time, use text analysis to code their sentiment, and then correlate tweet sentiment with market performance indicators. Scraping might also be applied to emails, internal company chat applications, and even electronic documents like applicant resumes. Below, I suggest some potential advantages and disadvantages of using scraping as a data-acquisition tool.

Advantages:

  • New scraping tools and R packages have made it easier than ever to scrape data.
  • Scraping tools can offer new insights based on previously difficult-to-reach or difficult-to-acquire text data that are rich with contextual information.

Disadvantages:

  • Scraping data that aren’t public or that weren’t intended to be used in the manner we plan to use them can raise ethical and privacy concerns.
  • Once scraped, the data often need to be structured into a format suitable for subsequent analysis, which can be a labor-intensive and time-consuming process.
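
As a small illustration of both scraping and the structuring step that follows it, the Python sketch below extracts job titles and locations from a snippet of HTML and arranges them into one record per posting. The HTML, its class names (posting, title, location), and the postings themselves are hypothetical. To keep the example self-contained, the HTML is an inline string rather than a page fetched from a live website, and the parser comes from Python's standard library rather than from the R packages mentioned above.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a scraped job-postings page.
SAMPLE_HTML = """
<ul>
  <li class="posting"><span class="title">HR Analyst</span><span class="location">Portland</span></li>
  <li class="posting"><span class="title">Recruiter</span><span class="location">Seattle</span></li>
</ul>
"""


class PostingParser(HTMLParser):
    """Collect one dictionary per posting with its title and location."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # Field currently being read, if any.

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "posting":
            self.records.append({})  # Start a new record.
        elif tag == "span" and css_class in {"title", "location"} and self.records:
            self._field = css_class

    def handle_data(self, data):
        # Store text only while inside a span we care about.
        if self._field is not None:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = PostingParser()
parser.feed(SAMPLE_HTML)
parser.close()
records = parser.records

for record in records:
    print(record)
```

Even in this tidy example, most of the code is devoted to imposing structure on the extracted text, and real pages are usually far messier than this snippet.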

3.6 Summary

In this chapter, we reviewed the Data Acquisition phase of the HR Analytics Project Life Cycle, which included overviews of common data-acquisition tools or techniques like employee surveys, rating forms, surveillance and monitoring, database queries, and scraping.