IAT-II FDS-Answer key
2. What is Causation?
Causation refers to the relationship between cause and effect, where one event or variable is responsible for the occurrence of another event or a change in the value of another variable. In other words, if a change in one variable leads to a change in another variable, there is said to be a causal relationship between them.
3. List the categories of NumPy’s basic array manipulation.
The main categories of basic array manipulation in NumPy include:
• Changing Array Shape
• Joining Arrays
• Splitting Arrays
• Adding/Removing Elements
• Indexing and Slicing
4. Write a code snippet to explain concatenation of arrays using concatenate(), hstack() and vstack().
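A minimal illustrative snippet (the array values here are arbitrary):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# concatenate() joins arrays along an existing axis (axis 0 by default)
c = np.concatenate([a, b])                    # [1 2 3 4 5 6]

grid = np.array([[1, 2],
                 [3, 4]])

# vstack() stacks arrays vertically (row-wise)
v = np.vstack([grid, np.array([5, 6])])       # shape (3, 2)

# hstack() stacks arrays horizontally (column-wise)
h = np.hstack([grid, np.array([[9], [9]])])   # shape (2, 3)

print(c)
print(v)
print(h)
```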
5. What is the essential difference between NumPy array and Pandas array indexing?
• Indexing Syntax
• Index Types
• Handling Missing Values
• Alignment
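A short sketch illustrating each of these points (the data here is invented):

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# NumPy indexing is purely positional
print(arr[0])          # 10

# Pandas supports label-based (.loc) and positional (.iloc) indexing
print(ser.loc['a'])    # 10
print(ser.iloc[0])     # 10

# Pandas aligns on index labels during arithmetic; NumPy aligns by position.
# Labels present on only one side yield NaN (missing-value handling).
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([10, 20], index=['b', 'c'])
print(s1 + s2)         # only 'b' aligns; 'a' and 'c' become NaN
```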
6. Enumerate the attributes of a NumPy array.
(i) shape (ii) ndim (iii) size (iv) dtype (v) itemsize (vi) nbytes
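These attributes can be demonstrated with a small example array:

```python
import numpy as np

x = np.zeros((3, 4), dtype=np.int64)

print(x.shape)     # (3, 4) -- size of each dimension
print(x.ndim)      # 2      -- number of dimensions
print(x.size)      # 12     -- total number of elements
print(x.dtype)     # int64  -- element data type
print(x.itemsize)  # 8      -- bytes per element
print(x.nbytes)    # 96     -- total bytes (size * itemsize)
```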
7. Write a code snippet for a basic errorbar with a single Matplotlib function call.
Sample Snippet + Output
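The snippet itself is not reproduced above; a minimal sketch (the data here is synthetic) might be:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)
dy = 0.8                                # constant measurement error
y = np.sin(x) + dy * np.random.randn(50)

# A single errorbar() call draws the points together with their error bars
plt.errorbar(x, y, yerr=dy, fmt='.k')
plt.savefig('errorbar.png')
```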
8. Write the code snippet and complete the output for data = np.random.randn(1000) using plt.hist.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
plt.hist(data, bins=30, edgecolor='black', alpha=0.7)
plt.show()
The output is a histogram of 1000 normally distributed values: roughly bell-shaped and centered near 0.
9. Define Kernel Density Estimation (KDE).
Kernel Density Estimation (KDE) is a non-parametric method for estimating the probability density function (PDF) of a random variable. It provides a smooth, continuous estimate of the underlying distribution of the data, which can be particularly useful for visualizing and analyzing the shape of the distribution.
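As an illustrative sketch using SciPy's gaussian_kde (the sample data here is synthetic):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)

# gaussian_kde builds a smooth PDF estimate from the samples
# (bandwidth chosen automatically by Scott's rule)
kde = gaussian_kde(data)

grid = np.linspace(-4, 4, 200)
density = kde(grid)                 # estimated PDF evaluated on the grid

# A density estimate should integrate to approximately 1
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"estimated area under the KDE: {area:.2f}")
```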
10. List the ways to customize Matplotlib.
• Changing Plot Style
• Setting Plot Colors and Markers
• Adjusting Plot Size and Aspect Ratio
• Adding Labels and Titles
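A short sketch touching each of these customization points (the style name and plot values are arbitrary choices):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

plt.style.use('ggplot')                         # changing plot style

fig, ax = plt.subplots(figsize=(6, 4))          # adjusting size / aspect ratio

x = np.linspace(0, 2 * np.pi, 100)
ax.plot(x, np.sin(x), color='red',              # setting colors and markers
        linestyle='--', marker='o', markevery=10)

ax.set_xlabel('x')                              # adding labels and titles
ax.set_ylabel('sin(x)')
ax.set_title('A customized plot')
fig.savefig('custom.png')
```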
PART - B
Q.No. Question
11(A) Assume that an r of –.80 describes the strong negative relationship between years of heavy smoking (X) and life expectancy (Y). Assume, furthermore, that the distributions of heavy smoking and life expectancy each have the following means and sums of squares:
(i) Determine the least squares regression equation for predicting life expectancy from years of heavy smoking.
(ii) Determine the standard error of estimate, Sy|x, assuming that the correlation of –.80 was based on n = 50 pairs of observations.
(iii) Supply a rough interpretation of Sy|x.
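The particular means and sums of squares are not reproduced in this key, so only the general formulas are sketched here (standard least-squares notation, not taken from the key itself). The slope, intercept, and regression equation are

```latex
b = r\sqrt{\frac{SS_Y}{SS_X}}, \qquad a = \bar{Y} - b\bar{X}, \qquad Y' = a + bX
```

and the standard error of estimate is

```latex
s_{Y|X} = \sqrt{\frac{SS_Y\,(1 - r^2)}{n - 2}}
```

Roughly interpreted, Sy|x estimates the typical amount by which observed life expectancies miss the values predicted from years of heavy smoking; because r = –.80, these prediction errors are considerably smaller than they would be if predictions were based on the mean of Y alone.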
(OR)
13(B) Explain in detail about pivot tables with a suitable example program.
A pivot table is a data processing tool used in spreadsheet programs and data analysis tools to summarize and aggregate data based on specific criteria. It allows you to reshape and transform data, making it easier to analyze and draw insights. Pivot tables are commonly used in spreadsheet software like Microsoft Excel and in data analysis libraries like Pandas in Python.
Sample Snippet + Output
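The snippet is not reproduced above; a minimal sketch using pandas (the sales data here is invented) might be:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    'Region':  ['East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B'],
    'Sales':   [100, 150, 200, 120]
})

# pivot_table summarizes Sales with Region as rows and Product as columns
table = pd.pivot_table(df, values='Sales', index='Region',
                       columns='Product', aggfunc='sum')
print(table)
```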
14(A) Paraphrase Visualization with Seaborn using an example program.
Distribution Plots
• distplot (Sample Snippet + Code)
• jointplot (Sample Snippet + Code)
• pairplot (Sample Snippet + Code)
• rugplot (Sample Snippet + Code)
• kdeplot (Sample Snippet + Code)
Categorical Data Plots
• factorplot (Sample Snippet + Code)
• boxplot (Sample Snippet + Code)
• violinplot (Sample Snippet + Code)
• stripplot (Sample Snippet + Code)
• swarmplot (Sample Snippet + Code)
• barplot (Sample Snippet + Code)
• countplot (Sample Snippet + Code)
14(B) Elaborate visualization with errors using an example program.
Visualizing data with errors is crucial for conveying the uncertainty or variability associated with measurements. Seaborn, along with Matplotlib, provides tools for creating visually informative plots with error bars. Let's explore an example using a bar plot with error bars.
Example: Bar Plot with Error Bars
Suppose we have data representing the average scores of students in different subjects along with their standard deviations:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Creating a sample DataFrame
data = {
'Subject': ['Math', 'Science', 'English', 'History'],
'AverageScore': [80, 75, 85, 78],
'StdDev': [5, 3, 6, 4]
}
df = pd.DataFrame(data)
Now, let's create a bar plot with error bars to visualize the average scores and their uncertainties:
# Bar plot
sns.barplot(x='Subject', y='AverageScore', data=df, palette='muted')
# Overlay the error bars from the precomputed standard deviations
plt.errorbar(x=range(len(df)), y=df['AverageScore'], yerr=df['StdDev'],
             fmt='none', c='black', capsize=4)
# Adding labels and title
plt.xlabel('Subject')
plt.ylabel('Average Score')
plt.title('Average Scores in Different Subjects with Error Bars')
# Display the plot
plt.show()
In this example:
sns.barplot draws the bars, with x as the subject names and y as the average scores.
plt.errorbar overlays the error bars, with yerr=df['StdDev'] supplying the standard deviations; fmt='none' draws only the error bars (no markers), and capsize=4 controls the size of the caps at the ends of the error bars.
palette='muted' sets the color palette for the plot.
This visualization provides a clear representation of the average scores in different subjects, while the
error bars indicate the uncertainty associated with each average due to the standard deviations.
Viewers can quickly assess both the central tendency and the variability of the data.
Including error bars in visualizations is essential for accurately communicating the precision or
confidence intervals of measurements. Seaborn simplifies the process of creating such informative
plots, allowing for effective data communication and interpretation.
15(A) Explain 3D plotting with an example program.
The most basic three-dimensional plot is a line or scatter plot created from sets of (x, y, z) triples. In analogy with the more common two-dimensional plots discussed earlier, we can create these using the ax.plot3D and ax.scatter3D functions. The call signature for these is nearly identical to that of their two-dimensional counterparts.
Analogous to the contour plots we explored in “Density and Contour Plots” on page 241, mplot3d contains tools to create three-dimensional relief plots using the same inputs. Like two-dimensional ax.contour plots, ax.contour3D requires all the input data to be in the form of two-dimensional regular grids, with the Z data evaluated at each point. Here we'll show a three-dimensional contour diagram of a three-dimensional sinusoidal function.
Wireframe:
Two other types of three-dimensional plots that work on gridded data are wireframes and surface plots. These take a grid of values and project it onto the specified three-dimensional surface, and can make the resulting three-dimensional forms quite easy to visualize.
A surface plot is like a wireframe plot, but each face of the wireframe is a filled polygon. Adding a colormap to the filled polygons can aid perception of the topology of the surface being visualized.
Surface Triangulations
For some applications, the evenly sampled grids required by the preceding routines are overly
restrictive and inconvenient. In these situations, the triangulation-based plots can be very useful.
A plain scatter of such irregularly sampled points leaves a lot to be desired. The function that will help us in this case is ax.plot_trisurf, which creates a surface by first finding a set of triangles formed between adjacent points. The result is certainly not as clean as when it is plotted with a grid, but the flexibility of such a triangulation allows for some really interesting three-dimensional plots.
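A sketch combining the line plot and 3D contour described above (the sinusoidal function and grid bounds are illustrative choices):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Line plot from (x, y, z) triples: a helix
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# 3D contour of a sinusoidal function evaluated on a regular grid
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
ax.contour3D(X, Y, Z, 50, cmap='binary')
fig.savefig('plot3d.png')
```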
15(B) A contour plot can be created with the plt.contour function. It takes three arguments: a grid of x values, a grid of y values, and a grid of z values. The x and y values represent positions on the plot, and the z values will be represented by the contour levels. Perhaps the most straightforward way to prepare such data is to use the np.meshgrid function, which builds two-dimensional grids from one-dimensional arrays:
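A sketch of this setup (the function f here is an illustrative choice):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)   # builds 2D grids from the 1D arrays
Z = f(X, Y)

# Single-color line contour (black lines)
plt.contour(X, Y, Z, colors='black')
plt.savefig('contour.png')
```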
Notice that by default when a single color is used, negative values are represented by dashed lines, and positive values by solid lines. Alternatively, you can color-code the lines by specifying a colormap with the cmap argument. Here, we'll also specify that we want more lines to be drawn: 20 equally spaced intervals within the data range.
Here we chose the RdGy (short for Red-Gray) colormap, which is a good choice for centered data.
Matplotlib has a wide range of colormaps available, which you can easily browse in IPython by doing a tab completion on the plt.cm module:
plt.cm.<TAB>
Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can change this by switching to a filled contour plot using the plt.contourf() function (notice the f at the end), which uses largely the same syntax as plt.contour(). Additionally, we'll add a plt.colorbar() command, which automatically creates an additional axis with labeled color information for the plot.
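Continuing the sketch above (same illustrative function f), the filled version with a colorbar might look like:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Filled contour plot: 20 levels, RdGy colormap for centered data
plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar()             # labeled color scale on an extra axis
plt.savefig('contourf.png')
```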
The colorbar makes it clear that the black regions are “peaks,” while the red regions are “valleys.” One
potential issue with this plot is that it is a bit “splotchy.” That is, the color steps are discrete rather than
continuous, which is not always what is desired. You could remedy this by setting the number of
contours to a very high number, but this results in a rather inefficient plot: Matplotlib must render a
new polygon for each step in the level. A better way to handle this is to use the plt.imshow() function,
which interprets a two-dimensional grid of data as an image.
plt.imshow() will automatically adjust the axis aspect ratio to match the input data; you can change
this by setting, for example, plt.axis(aspect='image') to make x and y units match.
Finally, it can sometimes be useful to combine contour plots and image plots. For example, to create the effect shown in Figure 4-34, we'll use a partially transparent background image (with transparency set via the alpha parameter) and over-plot contours with labels on the contours themselves (using the plt.clabel() function).
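A sketch of this combination (again using the same illustrative function f; the figure itself is not reproduced here):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# A few labeled contour lines over-plotted on the image
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)

# Partially transparent background image of the same data
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
           cmap='RdGy', alpha=0.5)
plt.colorbar()
plt.savefig('contour_image.png')
```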
Prepared by HOD