
“Consumer Complaint Analysis”

Unravelling Patterns in CFPB Complaints with Python

Project Overview: Analysis of Consumer Complaint Data

Introduction
This project involves the analysis of a comprehensive consumer complaint dataset.
The dataset includes detailed information on complaints received from consumers
about various products and services. Each record in the dataset represents a single
consumer complaint, capturing a wide range of attributes including the nature of the
complaint, the product and sub-product involved, company responses, and more. The
dataset is structured with 94,678 records and 18 attributes, with varying degrees of
completeness for each attribute.

Dataset Description
The dataset contains the following columns:

1. Date received: The date on which the complaint was received.
2. Product: The category of the product that the complaint is about.
3. Sub-product: A more specific category under the main product category.
4. Issue: The specific issue the consumer is complaining about.
5. Sub-issue: A more detailed classification of the issue.
6. Consumer complaint narrative: The text provided by the consumer describing
their complaint.
7. Company public response: The public response from the company regarding
the complaint.
8. Company: The name of the company the complaint is against.
9. State: The U.S. state where the consumer is located.
10. ZIP code: The ZIP code of the consumer's location.
11. Tags: Tags indicating specific attributes or special cases related to the
complaint.
12. Consumer consent provided?: Indicates whether the consumer provided
consent to publish their complaint.
13. Submitted via: The channel through which the complaint was submitted.
14. Date sent to company: The date on which the complaint was sent to the
company.
15. Company response to consumer: The response from the company to the
consumer.
16. Timely response?: Indicates whether the company provided a timely
response.
17. Consumer disputed?: This field is entirely null, with no values provided.
18. Complaint ID: A unique identifier for each complaint.
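
As a starting point, loading and inspecting a file with these columns could be sketched as follows. This is only an illustration: the report does not name the source file, so the inline sample rows are made up, and `SAMPLE_CSV` and `load_complaints` are hypothetical names. Only a few of the 18 columns are included to keep the sample short.

```python
import io

import pandas as pd

# Made-up sample rows using a subset of the columns described above.
# The real dataset and its file name are not given in this report.
SAMPLE_CSV = """Date received,Product,State,Timely response?,Consumer disputed?,Complaint ID
2023-01-05,Credit reporting,CA,Yes,,1001
2023-01-06,Debt collection,TX,No,,1002
2023-01-07,Credit reporting,,Yes,,1003
"""


def load_complaints(source):
    """Read the complaint CSV, keeping the identifier column as text."""
    return pd.read_csv(source, dtype={"Complaint ID": str})


df = load_complaints(io.StringIO(SAMPLE_CSV))

# Basic structure: number of records and attributes.
n_rows, n_cols = df.shape

# Columns in which every value is missing.
all_null = [col for col in df.columns if df[col].isna().all()]

print(n_rows, n_cols)
print(all_null)
```

On the full dataset the same two checks would be expected to report the record and attribute counts quoted in the introduction and to flag the entirely null column.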

Overview of raw data:

Objectives

The primary objectives of this analysis are to:

1. Identify trends and patterns in consumer complaints across different
products and issues.
2. Assess the responsiveness of companies in addressing consumer
complaints.
3. Examine the geographical distribution of complaints to understand regional
issues and trends.
4. Analyse the narratives of consumer complaints to identify common themes
and sentiments.
5. Evaluate the effectiveness of company responses and their impact on
consumer satisfaction.

Methodology
The analysis will involve several steps:

1. Data Cleaning and Preprocessing: Handling missing values, converting data
types, and ensuring data consistency.
2. Exploratory Data Analysis (EDA): Visualising data distributions, identifying
trends, and summarising key statistics.
3. Text Analysis: Using natural language processing (NLP) techniques to analyse
the complaint narratives.
4. Geospatial Analysis: Mapping complaints to understand regional patterns.
5. Response Analysis: Evaluating company responses and their timeliness.
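
The first step, data cleaning and preprocessing, could be sketched roughly as below. This is an assumption about what such a step might contain, not the procedure actually used in this project: the helper name `clean_complaints` and the sample rows are hypothetical, and the choices made here (dropping exact duplicates, dropping columns with no values, coercing unparseable dates to missing) are only one reasonable option.

```python
import pandas as pd


def clean_complaints(df):
    """Illustrative cleaning pass over a complaints DataFrame."""
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()
    # Drop columns that contain no values at all.
    out = out.dropna(axis=1, how="all")
    # Convert date columns; values that cannot be parsed become NaT.
    for col in ("Date received", "Date sent to company"):
        if col in out.columns:
            out[col] = pd.to_datetime(out[col], errors="coerce")
    return out.reset_index(drop=True)


# Made-up rows: one exact duplicate, one bad date, one missing date,
# and one column that is entirely empty.
raw = pd.DataFrame({
    "Date received": ["2023-01-05", "2023-01-05", "not a date", None],
    "Date sent to company": ["2023-01-06", "2023-01-06", "2023-01-09", "2023-01-10"],
    "Product": ["Credit reporting", "Credit reporting", "Debt collection", "Mortgage"],
    "Consumer disputed?": [None, None, None, None],
})

cleaned = clean_complaints(raw)
print(cleaned.dtypes)
print(len(raw), "->", len(cleaned), "rows")
```

Coercing bad dates to missing values keeps the rows available for the other analyses; whether those rows should instead be dropped depends on the question being asked.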

Overview of raw data after exploring the whole dataset:

Column name: Date received

Column name: Product

Column name: Sub-product

Column name: Issue

Column name: Sub-issue

Column name: Consumer complaint narrative

Column name: Company public response

Column name: Company

Column name: State

Column name: ZIP code

Column name: Tags

Column name: Consumer consent provided?

Column name: Submitted via

Column name: Date sent to company

Column name: Company response to consumer

Column name: Timely response?

Column name: Complaint ID

Visualisation of Null Values Present in the Data:


Alerts

Dataset has 4150 (4.4%) duplicate rows [Duplicates]
Product is highly imbalanced (53.9%) [Imbalance]
Company public response is highly imbalanced (87.4%) [Imbalance]
Consumer consent provided? is highly imbalanced (> 99.9%) [Imbalance]
Submitted via is highly imbalanced (> 99.9%) [Imbalance]
Timely response? is highly imbalanced (94.7%) [Imbalance]
Date received has 4732 (5.0%) missing values [Missing]
Product has 4739 (5.0%) missing values [Missing]
Sub-product has 4724 (5.0%) missing values [Missing]
Issue has 4717 (5.0%) missing values [Missing]
Sub-issue has 7244 (7.7%) missing values [Missing]
Consumer complaint narrative has 4752 (5.0%) missing values [Missing]
Company public response has 37639 (39.8%) missing values [Missing]
Company has 4745 (5.0%) missing values [Missing]
State has 5226 (5.5%) missing values [Missing]
ZIP code has 4763 (5.0%) missing values [Missing]
Tags has 86950 (91.8%) missing values [Missing]
Consumer consent provided? has 4736 (5.0%) missing values [Missing]
Submitted via has 4728 (5.0%) missing values [Missing]
Date sent to company has 4743 (5.0%) missing values [Missing]
Company response to consumer has 4749 (5.0%) missing values [Missing]
Timely response? has 4744 (5.0%) missing values [Missing]
Consumer disputed? has 94679 (100.0%) missing values [Missing]
Complaint ID has 4718 (5.0%) missing values [Missing]
Consumer disputed? is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
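
The duplicate and missing-value figures in these alerts can be cross-checked directly with pandas. The sketch below shows one way to do so on a made-up three-column sample; `missing_summary` is a hypothetical helper, and it does not attempt to reproduce the imbalance scores, whose exact definition is not given in this report.

```python
import pandas as pd


def missing_summary(df):
    """Count and percentage of missing values per column, largest first."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Made-up sample: rows 0 and 1 are exact duplicates.
sample = pd.DataFrame({
    "Product": ["Credit reporting", "Credit reporting", None, "Mortgage"],
    "Tags": [None, None, None, "Servicemember"],
    "Consumer disputed?": [None, None, None, None],
})

# Rows that repeat an earlier row exactly.
duplicate_rows = int(sample.duplicated().sum())
summary = missing_summary(sample)

print("duplicate rows:", duplicate_rows)
print(summary)
```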
Distribution of Consumer consent provided?:

Consumer consent provided?    Count
Consent provided              94678
Closed with explanation           1

Distribution of Submitted via:

Submitted via    Count
Web              94678
Yes                  1

Distribution of Timely response?:

Timely response?    Count
Yes                 94134
No                    545
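
Distributions like these come from a per-column value count, and the single out-of-place values ("Closed with explanation" under Consumer consent provided? and "Yes" under Submitted via) can be isolated for inspection. The sketch below assumes, purely for illustration, that the dominant category in each column is the only expected one; the `EXPECTED` mapping, the `unexpected_rows` helper, and the sample rows are hypothetical. One possible explanation for such values is a record whose fields were shifted into the wrong columns, but that is an assumption the source does not confirm.

```python
import pandas as pd

# Hypothetical allow-lists, inferred from the dominant categories above.
EXPECTED = {
    "Consumer consent provided?": {"Consent provided"},
    "Submitted via": {"Web"},
}


def unexpected_rows(df, expected):
    """Return rows where any listed column holds a non-missing, unexpected value."""
    mask = pd.Series(False, index=df.index)
    for col, allowed in expected.items():
        mask |= df[col].notna() & ~df[col].isin(allowed)
    return df[mask]


# Made-up sample with one out-of-place record (last row).
sample = pd.DataFrame({
    "Consumer consent provided?": ["Consent provided", "Consent provided", "Closed with explanation"],
    "Submitted via": ["Web", "Web", "Yes"],
})

counts = sample["Submitted via"].value_counts(dropna=False)
suspect = unexpected_rows(sample, EXPECTED)

print(counts)
print(suspect)
```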

     Submitted via    Company response to consumer       Count
1    Web              Closed with explanation            47927
2    Web              Closed with monetary relief         2238
3    Web              Closed with non-monetary relief    40848
4    Web              In progress                         3400
5    Web              Untimely response                    265
6    Yes              8368931                                1
7    Yes              Closed with explanation                0
8    Yes              Closed with monetary relief            0
9    Yes              Closed with non-monetary relief        0
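
A table of counts for each pair of submission channel and company response, including the zero-count combinations, can be produced with a cross-tabulation. The sketch below uses `pd.crosstab` on a made-up sample; it is one way to obtain such a table and not necessarily how the table above was generated.

```python
import pandas as pd

# Made-up sample mirroring the two columns of the table above.
sample = pd.DataFrame({
    "Submitted via": ["Web", "Web", "Web", "Yes"],
    "Company response to consumer": [
        "Closed with explanation",
        "Closed with explanation",
        "In progress",
        "8368931",
    ],
})

# Rows are submission channels, columns are company responses.
# Combinations that never occur appear as 0.
table = pd.crosstab(
    sample["Submitted via"],
    sample["Company response to consumer"],
)

print(table)
```

Reading the table by row makes an isolated value such as the numeric "response" stand out, since it is the only non-zero cell in its row.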
