HR Data Collection
Internal data
Employee tenure
Employee compensation
Employee training records
Performance appraisal data
Reporting structure
Details on high-value, high-potential employees
Details on any disciplinary action taken against an employee
The main challenge here is that this data is often scattered across
disconnected systems, so on its own it may not serve as a reliable measure.
This is where a data scientist can play a meaningful role: they can organize
the scattered data into buckets of relevant data points, which can then be
fed into the analytics tool.
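As a rough illustration of that consolidation step, the sketch below joins records from three separate sources into one profile per employee and reports which fields are still missing. The field names (employee_id, tenure_years, salary, course), the sample records, and the helper functions are all hypothetical and chosen only for illustration; a real HR system would have its own schema.

```python
from collections import defaultdict

# Hypothetical extracts from three disconnected internal systems.
tenure_rows = [
    {"employee_id": "E01", "tenure_years": 4.5},
    {"employee_id": "E02", "tenure_years": 1.0},
]
compensation_rows = [
    {"employee_id": "E01", "salary": 72000},
    {"employee_id": "E03", "salary": 58000},
]
training_rows = [
    {"employee_id": "E01", "course": "Safety"},
    {"employee_id": "E01", "course": "Leadership"},
    {"employee_id": "E02", "course": "Safety"},
]


def consolidate(tenure, compensation, training):
    """Merge the three sources into one profile per employee ID."""
    profiles = defaultdict(
        lambda: {"tenure_years": None, "salary": None, "courses": []}
    )
    for row in tenure:
        profiles[row["employee_id"]]["tenure_years"] = row["tenure_years"]
    for row in compensation:
        profiles[row["employee_id"]]["salary"] = row["salary"]
    for row in training:
        profiles[row["employee_id"]]["courses"].append(row["course"])
    return dict(profiles)


def missing_fields(profiles):
    """Return, per employee, the scalar fields that no source supplied."""
    gaps = {}
    for employee_id, profile in profiles.items():
        missing = [
            field for field in ("tenure_years", "salary")
            if profile[field] is None
        ]
        if missing:
            gaps[employee_id] = missing
    return gaps


profiles = consolidate(tenure_rows, compensation_rows, training_rows)
gaps = missing_fields(profiles)
```

Reporting the gaps explicitly, rather than silently dropping incomplete employees, keeps the disconnect between systems visible before the data reaches the analytics tool.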
External data
Data Sources
Employee surveys
Attendance records
Employee reviews
Salary and promotion history
Employee work history
Demographic data
Personality data
Recruitment process
Employee databases
For data collection to be effective, the data collected must be valid,
reliable, and free of bias. Only data with these characteristics makes the
process useful and holds up to scrutiny during data analysis. Three key
terms that refer to accuracy in data collection are reliability, validity,
and margin of error.
There are two primary types of error: sampling error and non-sampling error.
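To make the margin of error concrete, here is a minimal sketch for a proportion measured in an employee survey. It assumes a simple random sample and the usual normal approximation, which the text above does not specify; the function name and the survey figures are hypothetical. It quantifies sampling error only, and non-sampling error (for example, badly worded questions or non-response) is not captured by this formula.

```python
import math
from statistics import NormalDist


def margin_of_error(proportion, sample_size, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation).

    Assumes a simple random sample; this is an illustrative helper,
    not part of any particular HR analytics tool.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return z * standard_error


# Hypothetical survey: 400 respondents, 50% report being satisfied.
moe_400 = margin_of_error(0.5, 400)
# Quadrupling the sample size halves the margin of error.
moe_1600 = margin_of_error(0.5, 1600)
```

With 400 respondents the margin comes to roughly 4.9 percentage points at 95% confidence, so a reported 50% satisfaction rate would be read as somewhere between about 45% and 55%.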