Data Analyst Assessment
1- Data Quality:
- Can you spot data quality issues in the dataset? If yes summarize them
Yes. The customer gender field should be unified into two values (M = Male, F = Female). I used
the single-letter abbreviations M and F for easier and faster data entry.
(Incompleteness) = missing phone numbers, national IDs, and DaysUntilDelivery values.
(Inaccuracy) = junk values in national IDs, e.g. 1.08815E+12, and in phone numbers, e.g.
###############.
There are also typos or inconsistent spellings in region names, e.g. Tabook, Riyad, Medina, Mecca, Baha.
(Non-standard) = every phone number should have the same length (12 digits) and every national ID
should have the same length (10 digits).
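The checks listed above could be automated along the following lines. This is only a minimal sketch: the field names, the sample rows, and the canonical region spellings are assumptions made for illustration, not the actual headers or values of the assessment sheet.

```python
import re

# Hypothetical sample rows; field names are assumed, not taken from the sheet.
rows = [
    {"gender": "Male", "phone": "966501234567", "national_id": "1088151234", "region": "Riyad"},
    {"gender": "F", "phone": "###############", "national_id": "1.08815E+12", "region": "Tabuk"},
    {"gender": "female", "phone": "", "national_id": "", "region": "Tabook"},
]

# Map every accepted spelling to the single-letter code.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}
# Assumed canonical spellings; the real list should be agreed with the data owner.
REGION_MAP = {"Riyad": "Riyadh", "Tabook": "Tabuk"}


def check_row(row):
    """Return a cleaned copy of the row and a list of issues found in it."""
    issues = []
    cleaned = dict(row)

    gender = GENDER_MAP.get(row["gender"].strip().lower())
    if gender is None:
        issues.append("gender: unrecognized value")
    else:
        cleaned["gender"] = gender

    # Replace known misspellings; leave unknown regions untouched.
    cleaned["region"] = REGION_MAP.get(row["region"], row["region"])

    # Phone numbers must be 12 digits and national IDs 10 digits.
    for field, length in (("phone", 12), ("national_id", 10)):
        value = row[field].strip()
        if not value:
            issues.append(f"{field}: missing")
        elif not re.fullmatch(rf"\d{{{length}}}", value):
            issues.append(f"{field}: expected {length} digits")
    return cleaned, issues


results = [check_row(r) for r in rows]
for cleaned, issues in results:
    print(cleaned["gender"], cleaned["region"], issues)
```

Rows with a non-empty issue list would be sent back for correction rather than silently fixed, since a junk phone number or national ID cannot be repaired from the value itself.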
- What can be done to fix these issues? please apply on the sheet, otherwise the document
- What do you think are possible ways, methods, or technique to ensure data quality is
1. Data Governance: Establish policies and roles for data quality management.
4. Data Cleansing: Correct errors and standardize data using automated tools.
7. Data Quality Audits: Regularly audit data quality, using both automated and manual checks.
8. Data Training and Awareness: Train staff on data quality importance and best practices.
9. Master Data Management (MDM): Maintain a single, accurate version of master data.
10. Automated Data Quality Tools: Use specialized tools for automated checks and monitoring.
11. Data Integration Best Practices: Follow best practices for consistent data integration.
12. User Input and Feedback: Encourage user reporting of data quality issues for continuous
improvement.
To ensure data quality across an organization, key methods include implementing a robust data
governance framework, conducting data profiling, enforcing validation rules, and employing
automated tools for data cleansing. Maintaining comprehensive documentation, conducting
regular audits, and monitoring key performance indicators are crucial. Fostering a data-centric
culture through training, implementing master data management, and utilizing advanced tools
also play essential roles in ensuring data quality.
Missing identity data, such as phone numbers and national IDs, cannot be filled in by statistical
guessing with the mean, median, or mode, because each value is unique to one customer.
For the other missing data, we can use the mean, median, or mode.
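As an illustration of that distinction, the sketch below fills a missing numeric field with the median and only flags missing identity fields for follow-up. The records and field names are hypothetical, and choosing the median over the mean or mode is an assumption made for the example.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"national_id": "1088151234", "days_until_delivery": 2},
    {"national_id": None, "days_until_delivery": 5},
    {"national_id": "1077723456", "days_until_delivery": None},
    {"national_id": "1066634567", "days_until_delivery": 3},
]

# Compute the fill value from the known numeric values only.
known = [r["days_until_delivery"] for r in records if r["days_until_delivery"] is not None]
fill_value = median(known)

for r in records:
    # Numeric field: impute, and record that the value was imputed.
    r["days_imputed"] = r["days_until_delivery"] is None
    if r["days_imputed"]:
        r["days_until_delivery"] = fill_value
    # Identity field: never guessed, only flagged for follow-up.
    r["needs_id_followup"] = r["national_id"] is None

for r in records:
    print(r)
```

Keeping an explicit "imputed" flag lets later analysis be run both with and without the filled-in values.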
2- Descriptive Statistics:
- Calculate the Mean, Median and Mode of monthly last mile shipments, and explain what
Mean (Average):
Definition: The mean, or average, is a measure of central tendency that represents the sum of a set of
values divided by the total number of values in the dataset.
Median:
Definition: The median is the middle value of a dataset when it is ordered in ascending or descending
numerical order. If there is an even number of values, the median is the average of the two middle
values.
Mode:
Definition: The mode is the value that appears most frequently in a dataset. A dataset may have no
mode, one mode (unimodal), or multiple modes (multimodal) if two or more values share the
highest frequency.
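The three measures can be computed with Python's standard library, as in the sketch below. The monthly figures are made up for illustration and are not the assessment's data.

```python
from statistics import mean, median, multimode

# Hypothetical monthly shipment counts (not the assessment's figures).
shipments = {"Jan": 120, "Feb": 80, "Mar": 110, "Apr": 120, "May": 70, "Jun": 100}

values = list(shipments.values())
avg = mean(values)         # sum of the values divided by their count
mid = median(values)       # average of the two middle values for an even count
modes = multimode(values)  # list of the most frequent value(s)

# Split the months around the mean; a month equal to the mean is in neither list.
below = [m for m, v in shipments.items() if v < avg]
above = [m for m, v in shipments.items() if v > avg]

print(avg, mid, modes)
print("below:", below)
print("above:", above)
```

`multimode` returns a list, so it covers the unimodal and multimodal cases in the same way.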
- What months are below the mean and what months are above the mean and tell us what
Months below the mean (without missing data): Feb, May, Oct, Nov, Dec
Months above the mean (without missing data): Jan, Mar, Apr, Jun, Jul, Sep
Months below the mean (with missing data): Feb, Mar, May, Aug, Oct, Nov, Dec
Months above the mean (with missing data): Jan, Mar, Apr, Jun, Jul, Sep
- What are the highest deliveries months and what are the lowest?
Without missing data, the highest-delivery months are Jan, Jul, and Sep, and the lowest are Feb, May, and Oct.
With missing data, the highest-delivery months are Apr, Jun, and Jul, and the lowest are Feb, May, and Oct.
- What are the proportions between Male and Female in terms of gender in the customer
population.
Female = 200
Male = 300
Males make up 60% of the customer population and females 40%, a male-to-female ratio of 1.5 to 1.
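The shares and the ratio follow directly from the two counts above, as this short sketch shows:

```python
# Counts as reported in the answer above.
counts = {"Female": 200, "Male": 300}

total = sum(counts.values())
shares = {gender: n / total for gender, n in counts.items()}  # fraction of all customers
ratio = counts["Male"] / counts["Female"]                     # males per female

for gender, share in shares.items():
    print(f"{gender}: {share:.0%}")
print(f"Male-to-female ratio: {ratio}")
```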
- Can you define the interquartile range and calculate it for days until delivery?
- Can you specify outlier deliveries in days until delivery and what method you will use?
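One common way to approach both questions is the 1.5 × IQR rule: the interquartile range is Q3 minus Q1, and any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is treated as an outlier. The sketch below assumes this method and uses made-up delivery times. Quartile conventions differ between tools, so a spreadsheet may give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Hypothetical days-until-delivery values (not the assessment's data).
days = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
# (Python's default "exclusive" method).
q1, q2, q3 = quantiles(days, n=4)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [d for d in days if d < lower_fence or d > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```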
- In terms of regions, status and services kindly provide an analysis into that relation and