Script BA
"Hi everyone! I'm Minh Khiem, a dataset about students in Group 2 of the
class BA 04. An inexperienced student just collected me, and, well, let's just
say I have a few... issues. I have to find someone to help me!"
Khiem:
"Absolutely, Thu Hoang! Can you help me understand these dimensions
better? I want to be as reliable and accurate as you are."
After listening to Thu Hoang's advice, Minh Khiem worked on improving
himself, glowed up, and became a high-quality dataset.
Khiem:
Now that I understand what it takes to be a quality dataset, I'm going to
share it with everyone! For a bottle of wine, good quality means it meets the
requirements of flavor, color, and scent. For a dataset like me, it means
meeting seven key dimensions: Accuracy, Completeness, Consistency,
Validity, Timeliness, Uniqueness, and Integrity.
Let's start with accuracy. Accuracy refers to how close the data values are to
their real-world values. For example, when you go to a restaurant, you
expect the food to look like the pictures on the menu, right? If not,
you'd feel tricked and disappointed. The same goes for data: an accurate
dataset lets people make informed decisions they can trust.
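As a rough sketch of how accuracy could be measured, the snippet below compares recorded values against a trusted reference and reports the share that match. The names, ages, and the `accuracy` helper are all hypothetical, made up for illustration; in practice the hard part is having a reliable reference at all.

```python
# Hypothetical data: ages as recorded in the dataset vs. a trusted source.
recorded = {"An": 20, "Binh": 19, "Chi": 21}
verified = {"An": 20, "Binh": 21, "Chi": 21}


def accuracy(recorded_values, reference_values):
    """Return the share of reference entries whose recorded value matches exactly."""
    if not reference_values:
        return 1.0  # nothing to check against, so nothing is wrong
    matches = sum(
        1
        for key, true_value in reference_values.items()
        if recorded_values.get(key) == true_value
    )
    return matches / len(reference_values)


rate = accuracy(recorded, verified)
print(f"Accuracy: {rate:.0%}")
```

Exact matching is the simplest choice; numeric fields might instead allow a small tolerance.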
Next up, completeness! This refers to how much of the required data is
actually present. A complete dataset provides a comprehensive view: it gives
analysts a full picture of the data gathered and good information to work with.
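One simple way to put a number on completeness is to count how many required cells are actually filled in. The sketch below assumes missing values are stored as `None`; the records, the `REQUIRED_FIELDS` list, and the `completeness` helper are hypothetical examples, not part of any real dataset.

```python
# Hypothetical student records; None marks a missing value.
students = [
    {"id": 1, "name": "An", "email": "an@example.com", "score": 8.5},
    {"id": 2, "name": "Binh", "email": None, "score": 7.0},
    {"id": 3, "name": None, "email": "chi@example.com", "score": None},
]
REQUIRED_FIELDS = ["id", "name", "email", "score"]


def completeness(records, fields):
    """Return the fraction of required cells that are filled in."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    filled = sum(
        1 for record in records for field in fields if record.get(field) is not None
    )
    return filled / total


ratio = completeness(students, REQUIRED_FIELDS)
print(f"Completeness: {ratio:.0%}")
```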
How about consistency? This dimension is all about uniformity across
multiple data sources. Consistent data has the same formatting and values
represented the same way everywhere. Imagine if your university showed
your mid-term scores as numbers, but your final overall score as a long
paragraph with no numbers. You'd be so confused! Consistent data avoids
that kind of confusion.
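The score example can be turned into a small check: scan every source and flag any score that is not stored as a plain number. This is only a sketch under the assumption that "consistent" here means "same numeric representation everywhere"; the two dictionaries and the `inconsistent_scores` helper are hypothetical.

```python
# Hypothetical score records coming from two different sources.
midterm = {"An": 8.5, "Binh": 7.0}
final = {"An": 9.0, "Binh": "Binh did quite well overall"}


def inconsistent_scores(*sources):
    """Return (student, value) pairs whose score is not a plain number."""
    problems = []
    for source in sources:
        for student, value in source.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append((student, value))
    return problems


issues = inconsistent_scores(midterm, final)
print(issues)
```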
Let's continue with validity. Validity refers to how well the format and
structure of the data follow the standards of the system and the context in
which it is used. For example, a valid email address must contain an "@"
symbol and a period (".") to be formatted correctly. This element is vital, as
valid data will fit business rules and expected formats. This means you
don't have to spend additional time configuring the data to be able to use it,
saving valuable resources.
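The email rule above can be sketched as a pattern check. The regular expression below is a deliberately simplified assumption (something before the "@", then something containing a period after it); real email validation is considerably more involved, and the `is_valid_email` helper is hypothetical.

```python
import re

# Simplified rule: text, an "@", then text containing at least one period.
# This is an illustrative approximation, not a full email standard.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(value):
    """Return True if value is a string matching the simplified email rule."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


for candidate in ["khiem@example.com", "khiem.example.com", "khiem@example"]:
    print(candidate, is_valid_email(candidate))
```

Checks like this are usually run when data is entered, so invalid values never reach the dataset in the first place.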
Next, we have integrity. Integrity is about whether data stays valid and
consistent across several related datasets. If integrity is maintained
throughout all data entities, analysts can confidently make decisions based
on credible, trustworthy data. Imagine trying to analyze data from multiple
sources, only to find inconsistencies and contradictions. It would be a nightmare!
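One common way to check integrity across datasets is to confirm that every record in one table points to a record that really exists in another. The sketch below assumes a scores table that refers to students by id; the tables, field names, and the `orphaned_rows` helper are hypothetical, and this covers only one reading of integrity (often called referential integrity).

```python
# Hypothetical related datasets: scores refer to students by id.
students = [{"id": 1, "name": "An"}, {"id": 2, "name": "Binh"}]
scores = [
    {"student_id": 1, "score": 8.5},
    {"student_id": 3, "score": 6.0},  # no student with id 3 exists
]


def orphaned_rows(child_rows, parent_rows, child_key, parent_key):
    """Return child rows whose key has no matching row in the parent dataset."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[child_key] not in parent_ids]


orphans = orphaned_rows(scores, students, "student_id", "id")
print(orphans)
```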