# [ Data Preprocessing with NumPy ] {CheatSheet}

Basics and Array Creation:

● Create NumPy Array: np.array([1, 2, 3])
● Array Shape: array.shape
● Array Dimensions: array.ndim
● Array Size: array.size
● Reshape Array: array.reshape((rows, cols))
● Concatenate Arrays Vertically: np.vstack((array1, array2))
● Concatenate Arrays Horizontally: np.hstack((array1, array2))
● Transpose Array: array.T
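
A quick sketch of how these calls change an array's shape, using a made-up six-element array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
print(a.shape, a.ndim, a.size)   # (6,) 1 6

m = a.reshape((2, 3))            # 2 rows, 3 columns
print(m.shape, m.ndim)           # (2, 3) 2

v = np.vstack((m, m))            # stack row-wise: rows are added
h = np.hstack((m, m))            # stack column-wise: columns are added
print(v.shape, h.shape)          # (4, 3) (2, 6)

print(m.T.shape)                 # transpose swaps the axes: (3, 2)
```

reshape only succeeds when the new shape has the same total size as the original (here 2 x 3 = 6).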

Indexing and Slicing:

● Indexing: array[0]
● Slicing: array[1:4]
● Boolean Indexing: array[array > 5]
● Fancy Indexing: array[[1, 3, 5]]

Missing Data:

● Replace NaN with Zero: np.nan_to_num(array) (also replaces ±inf with very large finite numbers)
● Remove NaN Values: array = array[~np.isnan(array)]
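
A small sketch showing what each approach does to NaN and to infinities (the data are made up):

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.inf])

# nan_to_num: NaN becomes 0.0, +inf becomes the largest finite float64
filled = np.nan_to_num(data)

# The boolean mask drops NaN only; inf survives
cleaned = data[~np.isnan(data)]
```

If infinities must go as well, np.isfinite(data) is the mask that removes both NaN and ±inf.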

Mathematical Operations:

● Element-wise Addition: array1 + array2
● Element-wise Multiplication: array1 * array2
● Matrix Multiplication: np.dot(matrix1, matrix2)
● Element-wise Square Root: np.sqrt(array)
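
The difference between * and matrix multiplication is easy to miss. A sketch with two small made-up matrices; @ is Python's matrix-multiplication operator and matches np.dot for 2D arrays:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B      # multiplies matching positions
matmul = A @ B           # row-by-column products, same as np.dot(A, B) here

print(elementwise.tolist())   # [[5, 12], [21, 32]]
print(matmul.tolist())        # [[19, 22], [43, 50]]
```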

Statistical Operations:

● Mean: np.mean(array)
● Median: np.median(array)
● Standard Deviation: np.std(array)

By: Waleed Mousa
● Variance: np.var(array)
● Minimum Value: np.min(array)
● Maximum Value: np.max(array)
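
One detail worth knowing: np.std and np.var divide by n by default (population statistics). A sketch on made-up numbers showing ddof=1 for the sample version:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.mean(x), np.median(x))   # 5.0 4.5
print(np.std(x))                  # 2.0, divides by n
print(np.std(x, ddof=1))          # divides by n - 1 (sample standard deviation)
print(np.var(x), np.min(x), np.max(x))
```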

Data Cleaning:

● Remove Duplicates: np.unique(array) (returns the sorted unique values of the flattened array)
● Replace Values: np.where(array == 0, 1, array)
● Clip Values: np.clip(array, min_val, max_val)

Filtering and Sorting:

● Filter by Condition: array[array > threshold]
● Sort Array: np.sort(array)
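● Sort Along an Axis (in place): array.sort(axis=0) (sorts each column independently)
● Sort Rows by a Column: array[array[:, col].argsort()]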
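
A small sketch contrasting sorting whole rows by one column (via argsort) with array.sort(axis=0), which sorts every column independently and therefore breaks rows apart. The table is made-up data:

```python
import numpy as np

table = np.array([[3, 10],
                  [1, 30],
                  [2, 20]])

# argsort on column 0 gives the row order; each row is kept intact
by_first_col = table[table[:, 0].argsort()]
print(by_first_col.tolist())   # [[1, 30], [2, 20], [3, 10]]

# sort(axis=0) sorts each column on its own, so original rows are lost
cols_sorted = table.copy()
cols_sorted.sort(axis=0)
print(cols_sorted.tolist())    # [[1, 10], [2, 20], [3, 30]]
```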

Random Sampling:

● Random Permutation: np.random.permutation(array)
● Random Sampling with Replacement: np.random.choice(array, size=n,
replace=True)
● Shuffle Array: np.random.shuffle(array)

Vectorization:

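● Apply a Python Function Element-wise: np.vectorize(function)(array) (a convenience wrapper that still loops in Python; built-in ufuncs such as np.sqrt or np.maximum are the truly vectorized route)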
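
np.vectorize calls the Python function once per element, so it is about convenience rather than speed. This sketch (clip_negative is a hypothetical example function) checks that it gives the same result as the genuinely vectorized ufunc np.maximum:

```python
import numpy as np

def clip_negative(x):
    # Plain Python function with a branch; handles one scalar at a time
    return 0.0 if x < 0 else x

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Element-wise application through a Python-level loop
looped = np.vectorize(clip_negative)(values)

# Equivalent ufunc that runs in compiled code
vectorized = np.maximum(values, 0.0)

print(np.array_equal(looped, vectorized))   # True
```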

File I/O:

● Read CSV: np.genfromtxt('data.csv', delimiter=',')
● Write CSV: np.savetxt('output.csv', array, delimiter=',')

Linear Algebra:

● Dot Product: np.dot(array1, array2)
● Matrix Inversion: np.linalg.inv(matrix)
● Eigenvalues and Eigenvectors: eigenvalues, eigenvectors =
np.linalg.eig(matrix)
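
A sketch that checks both results on a small made-up matrix: the inverse should give the identity, and each eigenvector column should satisfy M v = lambda v:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])

M_inv = np.linalg.inv(M)
print(np.allclose(M @ M_inv, np.eye(2)))   # True

eigenvalues, eigenvectors = np.linalg.eig(M)
# Eigenvectors are the COLUMNS of the returned matrix
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(M @ v, eigenvalues[i] * v))   # True
```

To solve M x = b, np.linalg.solve(M, b) is generally preferable to multiplying by the inverse: it is faster and numerically more stable.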

Broadcasting:

● Broadcasting: array += 5
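
Broadcasting also works between arrays whose shapes differ, as long as each dimension matches or is 1. A sketch with made-up values:

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

grid = col + row                    # both stretched to shape (3, 4)
print(grid.shape)                   # (3, 4)
print(grid[2].tolist())             # [21, 22, 23, 24]

arr = np.zeros(3, dtype=float)
arr += 5                            # the scalar is broadcast to every element
```

In-place operators keep the array's dtype, so adding a float in place to an integer array (for example int_array += 0.5) raises a casting error.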

Data Transformation:

● Log Transformation: np.log(array)
● Exponential Transformation: np.exp(array)
● Box-Cox Transformation: scipy.stats.boxcox(array)

Scaling and Normalization:

● Min-Max Scaling: (array - array.min()) / (array.max() - array.min())
● Standardization: (array - np.mean(array)) / np.std(array)
● Z-Score Transformation: scipy.stats.zscore(array)
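
For a 2D feature matrix, the same formulas are usually applied per column by passing axis=0. A sketch on made-up data:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# axis=0 computes one min / max / mean / std per column (feature)
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax.tolist())   # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

A constant column makes both denominators zero, so guard against it before scaling.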

Handling Categorical Data:

● One-Hot Encoding: np.eye(num_classes)[array]
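
Row i of the identity matrix is the one-hot vector for class i, so indexing np.eye with the label array encodes every label at once. A sketch with made-up integer labels:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])        # integer class ids in [0, num_classes)
num_classes = 3

one_hot = np.eye(num_classes)[labels]
print(one_hot.tolist())
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

decoded = one_hot.argmax(axis=1)       # back to class ids
```

This only works for non-negative integer labels; string categories can first be mapped to integers with np.unique(categories, return_inverse=True).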

Reshaping and Flattening:

● Flatten Array: array.flatten() (always returns a copy)
● Ravel Array: np.ravel(array) (returns a view when possible)
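
The copy-versus-view difference matters when the original array is modified later. A sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_copy = a.flatten()   # independent copy
flat_view = np.ravel(a)   # a view here, because a is contiguous

a[0, 0] = 99
print(flat_copy[0], flat_view[0])   # 0 99
print(np.shares_memory(a, flat_view), np.shares_memory(a, flat_copy))   # True False
```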

Interpolation:

● Linear Interpolation: np.interp(x, xp, yp)

Polynomial Fitting:

● Polynomial Fitting: np.polyfit(x, y, degree)

Time Series Operations:

● Time Lag Transformation: np.roll(array, shift=n) (values wrap around; set the first n entries to NaN for a true lag)
● Moving Average: np.convolve(array, np.ones(window)/window,
mode='valid')
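
np.roll moves the last values to the front, so it is not a true lag by itself. A minimal sketch of a lag that blanks the first n positions with NaN, plus the convolution-based moving average (the series is made-up data):

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = 1

rolled = np.roll(series, shift=n)   # [5., 1., 2., 3., 4.]: the last value wraps to the front
lagged = rolled.copy()
lagged[:n] = np.nan                 # no earlier observation exists for the first n steps

window = 3
moving_avg = np.convolve(series, np.ones(window) / window, mode='valid')
# mode='valid' returns len(series) - window + 1 values, roughly [2., 3., 4.]
```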

Image Processing:

● Image Resizing: scipy.ndimage.zoom(image, zoom=(2, 2, 1))
● Image Rotation: scipy.ndimage.rotate(image, angle=45,
reshape=False)

Handling Strings:

● String Operations on Array: np.char.add(array1, array2)

Set Operations:

● Set Union: np.union1d(array1, array2)
● Set Intersection: np.intersect1d(array1, array2)
● Set Difference: np.setdiff1d(array1, array2)

Handling Dates:

● Convert to DateTime: np.datetime64('2022-01-01')
● Date Arithmetic: np.datetime64('2022-01-01') + np.timedelta64(5,
'D')

Handling Complex Numbers:

● Create Complex Numbers: complex(real, imag) for scalars, real + 1j * imag for arrays (np.complex was removed in NumPy 1.24)
● Complex Conjugate: np.conjugate(complex_array)

Handling Inf and NaN:

● Replace Inf with NaN: array[np.isinf(array)] = np.nan
● Replace NaN with Mean: array[np.isnan(array)] = np.nanmean(array)
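
A minimal sketch chaining the two steps on a float array (integer arrays cannot hold NaN); the data are made up:

```python
import numpy as np

array = np.array([1.0, np.inf, 3.0, np.nan, -np.inf, 5.0])

# Step 1: treat +inf and -inf as missing
array[np.isinf(array)] = np.nan

# Step 2: fill every NaN with the mean of the remaining values (1, 3, 5 -> 3.0)
array[np.isnan(array)] = np.nanmean(array)
print(array.tolist())   # [1.0, 3.0, 3.0, 3.0, 3.0, 5.0]
```

The order matters: np.nanmean skips NaN but not inf, so an infinity left in the array would make the fill value infinite or NaN.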

Distance Metrics:

● Euclidean Distance: np.linalg.norm(vector1 - vector2)
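● Cosine Similarity: sklearn.metrics.pairwise.cosine_similarity(array1, array2) (expects 2D arrays; for two 1D vectors: np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2)))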
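
For two plain 1D vectors, cosine similarity needs nothing beyond NumPy. A sketch; cosine_similarity_1d is a hypothetical helper name, and the vectors are made up:

```python
import numpy as np

def cosine_similarity_1d(vector1, vector2):
    # cos(theta) = (a . b) / (|a| |b|); undefined if either vector is all zeros
    return np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

print(np.linalg.norm(a - b))        # Euclidean distance, sqrt(2)
print(cosine_similarity_1d(a, b))   # 1 / (sqrt(2) * sqrt(2)), about 0.5
```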

Statistical Testing:

● T-Test for Independent Samples: t_stat, p_value = scipy.stats.ttest_ind(sample1, sample2)
● ANOVA Test: f_stat, p_value = scipy.stats.f_oneway(group1, group2,
group3)

Outlier Detection:

● Z-Score Outliers: z_scores = scipy.stats.zscore(array); outliers = array[np.abs(z_scores) > 3]
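
A NumPy-only sketch of the same idea, assuming the common |z| > 3 rule of thumb (zscore_outliers is a hypothetical helper name; the data are made up):

```python
import numpy as np

def zscore_outliers(array, threshold=3.0):
    # Same z-scores as scipy.stats.zscore with its default ddof=0
    z_scores = (array - array.mean()) / array.std()
    return array[np.abs(z_scores) > threshold]

data = np.array([10.0] * 20 + [11.0] * 20 + [50.0])
print(zscore_outliers(data))   # [50.]
```

With population z-scores, no point in a sample of n values can exceed sqrt(n - 1), so with 10 or fewer values the threshold of 3 can never be crossed.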

Handling Logarithmic Data:

● Log Transformation for Skewed Data: log_array = np.log1p(skewed_array)

Handling Exponential Data:

● Exponential Transformation (inverse of a log transform): exp_array = np.exp(original_array) (np.expm1 inverts np.log1p)

Handling Power Law Data:

● Power Transformation (Box-Cox, strictly positive data only): power_transformed_array, lambda_value = scipy.stats.boxcox(array)
● Yeo-Johnson Transformation (also accepts zero and negative values): yeo_johnson_transformed_array, lambda_value = scipy.stats.yeojohnson(array)

Principal Component Analysis (PCA):

● PCA: pca = sklearn.decomposition.PCA(n_components=2); transformed_data = pca.fit_transform(data)

Singular Value Decomposition (SVD):

● SVD: U, S, Vt = np.linalg.svd(matrix)
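
S comes back as a 1D array of singular values, largest first, so rebuilding the matrix needs np.diag. A sketch on a made-up 2x3 matrix, including a rank-k approximation:

```python
import numpy as np

matrix = np.array([[3.0, 1.0, 1.0],
                   [-1.0, 3.0, 1.0]])

# full_matrices=False gives shapes (2, 2), (2,), (2, 3) that multiply back directly
U, S, Vt = np.linalg.svd(matrix, full_matrices=False)
reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(reconstructed, matrix))   # True

# Rank-k approximation: keep only the k largest singular values
k = 1
approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
```

With the default full_matrices=True, Vt is square (3x3 here) and U @ np.diag(S) @ Vt no longer lines up without slicing.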

Handling Outliers:

● Winsorizing Outliers: winsorized_array = scipy.stats.mstats.winsorize(original_array, limits=[0.05, 0.05])

Time Window Operations:

● Rolling Window Mean: rolling_mean = pd.Series(array).rolling(window=3).mean()

Interpolation:

● Linear Interpolation: interpolated_values = np.interp(x, xp, yp)

Handling JSON Data:

● Convert NumPy Array to JSON: json_data = json.dumps(array.tolist())
● Convert JSON to NumPy Array: numpy_array = np.array(json.loads(json_data))

Handling CSV Data:

● Read CSV into NumPy Array: data = np.genfromtxt('data.csv', delimiter=',')
● CSV File Reading with Pandas: data = pd.read_csv('data.csv').values

Handling Excel Data:

● Read Excel into NumPy Array: data = pd.read_excel('data.xlsx', header=None).values

Handling Text Data:

● Convert Text to NumPy Array: text_array = np.array(list(text))
● Tokenization with CountVectorizer: vectorizer =
sklearn.feature_extraction.text.CountVectorizer(); tokenized_matrix
= vectorizer.fit_transform(text_data)
● TF-IDF Transformation: tfidf_transformer = sklearn.feature_extraction.text.TfidfTransformer(); tfidf_matrix = tfidf_transformer.fit_transform(tokenized_matrix)

Handling Time Series Data:

● Time Series Rolling Mean: rolling_mean = pd.Series(array).rolling(window=3).mean()
● Time Series Differencing: differenced_series = np.diff(time_series,
n=1)

Handling Multidimensional Arrays:

● Reshape to 3D Array: reshaped_array = original_array.reshape((num_samples, num_rows, num_cols))

Handling Spatial Data:

● Distance between Two Points in 2D Space: distance = np.linalg.norm(point1 - point2)
● Calculate Haversine Distance: haversine_distance = haversine(lon1, lat1, lon2, lat2) (haversine is a user-defined helper, not part of NumPy)
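
haversine is not a NumPy or SciPy function. Here is a minimal NumPy sketch of such a helper, keeping the (lon1, lat1, lon2, lat2) argument order used above and assuming inputs in decimal degrees and a mean Earth radius of 6371 km:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; other conventions exist

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; accepts scalars or NumPy arrays (element-wise)."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# One degree of longitude along the equator is about 111.2 km
print(haversine(0.0, 0.0, 1.0, 0.0))
```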

Data Binning:

● Binning Numerical Data: binned_data = np.digitize(array, bins)
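
With the default right=False, np.digitize returns i such that bins[i-1] <= x < bins[i], so values equal to an edge land in the upper bin. A sketch with made-up ages and hypothetical bin labels:

```python
import numpy as np

ages = np.array([3, 17, 18, 45, 64, 65, 90])
bins = np.array([18, 65])            # edges: below 18, 18 to 64, 65 and above

bin_index = np.digitize(ages, bins)
print(bin_index.tolist())            # [0, 0, 1, 1, 1, 2, 2]

labels = np.array(['minor', 'adult', 'senior'])
print(labels[bin_index].tolist())
```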

Handling Imbalanced Data:

● Under-sampling with Random Choice: undersampled_data = np.concatenate([data[np.random.choice(np.flatnonzero(data_label == label), size=min_class_samples, replace=False)] for label in unique_labels])
● Over-sampling with Repetition: oversampled_data = np.concatenate([data[data_label == minority_label]] * (max_class_samples // min_class_samples)) (minority_label is the under-represented class)
● Synthetic Over-sampling with SMOTE (imbalanced-learn): oversampled_data, oversampled_labels = imblearn.over_sampling.SMOTE().fit_resample(data, labels)
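
np.random.choice only accepts 1D input, so for a 2D feature matrix it is simpler to sample row indices and keep features and labels aligned. A runnable sketch on made-up toy data, using np.random.default_rng for a reproducible seed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

data = np.arange(20).reshape(10, 2)                  # 10 samples, 2 features
data_label = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

unique_labels, counts = np.unique(data_label, return_counts=True)
min_class_samples = counts.min()

# Draw row indices per class without replacement, then index features and labels together
keep = np.concatenate([
    rng.choice(np.flatnonzero(data_label == label), size=min_class_samples, replace=False)
    for label in unique_labels
])
undersampled_data = data[keep]
undersampled_labels = data_label[keep]
print(np.unique(undersampled_labels, return_counts=True))
```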

Handling Image Data:

● Flatten 2D Image: flat_image = image.flatten()
● Reshape 1D Image to 2D: reshaped_image =
flat_image.reshape((height, width))
● Convert Image to Grayscale: grayscale_image = np.dot(image[...,
:3], [0.2989, 0.5870, 0.1140])
● Resize Image: resized_image = skimage.transform.resize(image,
(new_height, new_width), mode='constant')
● Image Rotation with Scipy: rotated_image =
scipy.ndimage.rotate(image, angle=45, reshape=False)
● Image Histogram Equalization: equalized_image =
skimage.exposure.equalize_hist(image)
● Image Gaussian Blurring: blurred_image =
skimage.filters.gaussian(image, sigma=2)
● Image Edge Detection: edges = skimage.feature.canny(image, sigma=1)
● Image Superpixel Segmentation with SLIC (a localized k-means clustering): segments = skimage.segmentation.slic(image, n_segments=100)
● Image Feature Extraction with Histogram of Oriented Gradients
(HOG): features, hog_image = skimage.feature.hog(image,
visualize=True)
● Image Cropping: cropped_image = original_image[y1:y2, x1:x2]
● Image Histogram: hist, bins = np.histogram(image.flatten(),
bins=256, range=[0,256])
● Image Thresholding: thresholded_image =
cv2.threshold(grayscale_image, threshold_value, 255,
cv2.THRESH_BINARY)[1]
● Image Morphological Operations: kernel = np.ones((5,5),np.uint8);
morph_image = cv2.morphologyEx(thresh_image, cv2.MORPH_OPEN,
kernel)
● Image Contour Detection: contours, hierarchy =
cv2.findContours(thresh_image, cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
● Image Color Spaces Conversion: hsv_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV) (OpenCV loads images in BGR order; use cv2.COLOR_RGB2HSV for RGB input)
● Image Filtering with OpenCV: filtered_image =
cv2.bilateralFilter(image, d=9, sigmaColor=75, sigmaSpace=75)
● Image Edge Detection with OpenCV: edges = cv2.Canny(image,
low_threshold, high_threshold)
● Image Feature Extraction with OpenCV: sift = cv2.SIFT_create();
keypoints, descriptors = sift.detectAndCompute(gray_image, None)
● Image Template Matching with OpenCV: result =
cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
● Image Superpixel Segmentation with OpenCV (requires opencv-contrib-python): slic = cv2.ximgproc.createSuperpixelSLIC(image, algorithm=cv2.ximgproc.SLIC, region_size=10); slic.iterate(10); segments = slic.getLabels()
● Image Corner Detection with OpenCV: corners =
cv2.goodFeaturesToTrack(image, maxCorners=25, qualityLevel=0.01,
minDistance=10)
● Image Affine Transformation with OpenCV: rows, cols =
image.shape[:2]; M = cv2.getRotationMatrix2D((cols/2, rows/2),
angle, scale); rotated_image = cv2.warpAffine(image, M, (cols,
rows))
● Image Perspective Transformation with OpenCV: pts1 =
np.float32([[56,65],[368,52],[28,387],[389,390]]); pts2 =
np.float32([[0,0],[300,0],[0,300],[300,300]]); M =
cv2.getPerspectiveTransform(pts1,pts2); transformed_image =
cv2.warpPerspective(image,M,(300,300))
● Image Color Histogram with OpenCV: hist = cv2.calcHist([image],
[0, 1, 2], None, [256, 256, 256], [0, 256, 0, 256, 0, 256])
● Image Color Quantization with K-Means Clustering: image_reshaped = image.reshape((-1, 3)); kmeans = sklearn.cluster.KMeans(n_clusters=k).fit(image_reshaped); quantized_image = kmeans.cluster_centers_.astype(np.uint8)[kmeans.labels_].reshape(image.shape)

Advanced Operations with NumPy:

● Handling Sparse Data: sparse_matrix = scipy.sparse.csr_matrix(array)
● Matrix Factorization with NMF (non-negative data only): nmf = sklearn.decomposition.NMF(n_components=2); W = nmf.fit_transform(data); H = nmf.components_
● Sparse Matrix Operations: result = sparse_matrix1 @ sparse_matrix2

Handling HDF5 Data:

● Read HDF5 File into NumPy Array: data = pd.read_hdf('data.h5').values

Handling XML Data:

● XML Parsing with BeautifulSoup: soup = BeautifulSoup(xml_data,
'xml'); values = [float(tag.text) for tag in
soup.find_all('value')]

Handling SQLite Data:

● Read SQLite Database into NumPy Array: connection = sqlite3.connect('database.db'); query = 'SELECT * FROM my_table'; data = pd.read_sql(query, connection).values (my_table is a placeholder; TABLE itself is a reserved SQL keyword)

Handling Pickle Data:

● Read Pickle File into NumPy Array: with open('data.pkl', 'rb') as f: data = pickle.load(f)

Handling Avro Data:

● Read Avro File into NumPy Array: import fastavro; with open('data.avro', 'rb') as f: records = list(fastavro.reader(f)); data = pd.DataFrame(records).values (fastavro.reader yields one dict per record)

Handling Parquet Data:

● Read Parquet File into NumPy Array: import pyarrow.parquet as pq; table = pq.read_table('data.parquet'); data = table.to_pandas().values

Handling Feather Data:

● Read Feather File into NumPy Array: import pyarrow.feather as feather; table = feather.read_table('data.feather'); data = table.to_pandas().values

Handling Video Data:

● Read Video Frames into NumPy Array:
  import cv2
  video_capture = cv2.VideoCapture('video.mp4')
  frames = []
  success, frame = video_capture.read()
  while success:
      frames.append(frame)
      success, frame = video_capture.read()
  video_capture.release()
  video_array = np.array(frames)

Handling Audio Data:

● Read Audio File into NumPy Array: import librosa; audio_data,
sampling_rate = librosa.load('audio.wav', sr=None)

Handling NumPy Datetime:

● NumPy Datetime Operations: date1 = np.datetime64('2022-01-01'); date2 = np.datetime64('2022-01-05'); days_difference = date2 - date1

Handling Complex Numbers:

● Complex Numbers Operations: complex_result = complex_array1 + complex_array2

Handling Units:

● Convert Units with Pint: import pint; ureg = pint.UnitRegistry(); quantity = 5 * ureg.meter; converted_quantity = quantity.to(ureg.foot)

Handling Heterogeneous Data:

● Structured Arrays: structured_array = np.array([(1, 'John', 25), (2, 'Alice', 30)], dtype=[('id', int), ('name', 'U10'), ('age', int)])
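
Fields of a structured array are read by name and behave like ordinary arrays, including boolean masks. A sketch reusing the two records above:

```python
import numpy as np

structured_array = np.array([(1, 'John', 25), (2, 'Alice', 30)],
                            dtype=[('id', int), ('name', 'U10'), ('age', int)])

print(structured_array['name'].tolist())   # ['John', 'Alice']
print(structured_array['age'].mean())      # 27.5

# Filter whole records with a condition on one field
older = structured_array[structured_array['age'] > 26]
print(older['name'].tolist())              # ['Alice']
```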

Handling Point Cloud Data:

● PointCloud Operations with Open3D: import open3d; point_cloud = open3d.io.read_point_cloud('point_cloud.ply'); downsampled_cloud = point_cloud.voxel_down_sample(voxel_size=0.05)
