Professional Documents
Culture Documents
Fin Ijprems1680687249
Fin Ijprems1680687249
Data Visualization
Data visualization is an important skill in applied statistics and machine learning. Statistics does indeed focus on
quantitative descriptions and estimations of data. Data visualization provides an important suite of tools for gaining a
qualitative understanding. This can be helpful when exploring and getting to know a dataset and can help with
identifying patterns, corrupt data, outliers, and much more. With a little domain knowledge, data visualizations can be
used to express and demonstrate key relationships in plots and charts that are more visceral and stakeholders than
measures of association or significance. Data visualization and exploratory data analysis are whole fields themselves
and it will recommend a deeper dive into some the books mentioned at the end. Sometimes data does not make sense
until it can look at in a visual form, such as with charts and plots. Being able to quickly visualize of data samples and
others is an important skill both in applied statistics and in applied machine learning. It will discover the many types
of plots that you will need to know when visualizing data in Python and how to use them to better understand your
own data.
How to chart time series data with line plots and categorical quantities with bar charts.
How to summarize data distributions with histograms and box plots.
KNN (K-Nearest neighbour)
K-Nearest neighbor is one of the simplest Machine Learning algorithms based on Supervised Learning technique.
K-NN algorithm assumes the similarity between the new case/data and available cases and put the new case into
the category that is most similar to the available categories.
K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means
when new data appears then it can be easily classified into a well suite category by using K- NN algorithm.
K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it
stores the dataset and at the time of classification, it performs an action on the dataset.
KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that data
into a category that is much similar to the new data.
Logistic Regression
Logistic regression is a supervised learning classification algorithm used to predict the probability of a target
variable. The nature of target or dependent variable is dichotomous, which means there would be only two
possible classes.
In simple words, the dependent variable is binary in nature having data coded as either 1 (stands for success/yes)
or 0 (stands for failure/no).
Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest ML
algorithms that can be used for various classification problems such as spam detection, Diabetes prediction,
cancer detection etc.
Output Screenshot
5. CONCLUSION
The analytical process started from data cleaning and processing, missing value, exploratory analysis and finally
model building and evaluation. The best accuracy on public test set of higher accuracy score algorithm will be find
out. The founded one is used in the application which can help to find the Real and fake jobs.
6. REFERENCES
[1] Priya Khandagale, Akshata Utekar, Anushka Dhonde, Prof. S. S. Karve. DOI
Link: https://doi.org/10.22214/ijraset.2022.41641.
[2] Enhanced RSA Algorithm using Fake Modulus and Fake Public Key Exponent Raghunandhan K R, Ganesh
Aithal, Surendra Shetty, Rakshith N.2018.
[3] Automatic Detection of Online Recruitment Frauds: Characteristics, Methods, and a Public Dataset. Sokratis
Vidros, Constantinos Kolias, Georgios Kambourakis and Leman Akoglu, 2017.
[4] Machine Learning and Job Posting Classification: A Comparative Study Ibrahim M. Nasser and Amjad H.
Alzaanin, 2020.
[5] Fake Job Recruitment Detection Using Machine Learning Approach, Samir Bandyopadhyay, Shawni
Dutta,2020.
[6] Predicting of Job Failure in Compute Cloud Based on Online Extreme Learning Machine: A Comparative
Study Chunhong Liu, Jingjing Han, Yanlei Shang, Chuanchang Liu, Bo Cheng, and Junliang Chen, 2017.
[7] Enhanced RSA Algorithm using Fake Modulus and Fake Public Key Exponent, Raghunandhan K R, Ganesh
Aithal, Surendra Shetty, Rakshith N 2018.
[8] Automatic Detection of Online Recruitment Frauds: Characteristics, Methods, and a Public
Dataset.SokratisVidros, Constantinos Kolias, Georgios Kambouraki and Leman Akoglu 2017.