(Assignment 1 & 2) Regular Expression
1. Given a list of URLs, write a regular expression to extract the domain name from
each URL. Provide examples and explain the logic behind your regex.
2. You are given a text document containing various dates in different formats (e.g.,
MM/DD/YYYY, DD-MM-YYYY, YYYY/MM/DD). Write a Python script that uses
regular expressions to extract and format all the dates in a consistent manner.
3. Suppose you have a large dataset of product descriptions. Write a regular expression
to find and extract all the prices mentioned in the descriptions, considering different
currency formats (e.g., $100, €50, ¥5000). Explain how your regex works.
4. Given a block of HTML code, write a regular expression to extract all the hyperlinks
(URLs) contained within the HTML <a> tags. Explain the steps and groups in your
regex pattern.
5. Implement a regular expression that can detect and correct common spelling
mistakes in a text document. Provide examples and explain the substitution logic
used in your regex.
6. Design a regular expression to extract street addresses from a text document,
considering variations in address formats (e.g., 123 Main St, Apt 4B vs. 456 Elm
Avenue). Discuss the challenges and strategies for handling different address
structures.
7. Create a regular expression that identifies and extracts hexadecimal color codes
(e.g., #FFAABB) from a CSS stylesheet. Explain the pattern you use to capture these
codes.
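As a starting point for questions 1, 3, and 7, the following is a minimal sketch and not a reference solution. It assumes Python's standard `re` module is acceptable for the matching itself, since question 2 asks for Python regular expressions. The pattern names and the helper `extract_domain` are hypothetical, and the patterns cover only the simple cases named in the questions.

```python
import re

# Question 1 (assumed pattern): optional scheme, optional "user@",
# optional "www.", then capture everything up to a port, path, query or fragment.
DOMAIN_RE = re.compile(
    r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?"  # scheme such as http:// or https://
    r"(?:[^@/\s]+@)?"                    # optional user info
    r"(?:www\.)?"                        # optional www. prefix (dropped)
    r"([^:/?#\s]+)"                      # group 1: the host name
)

# Question 3 (assumed pattern): a currency symbol, then digits with
# optional thousands separators and an optional decimal part.
PRICE_RE = re.compile(r"[$€¥]\s?\d+(?:,\d{3})*(?:\.\d+)?")

# Question 7 (assumed pattern): '#' followed by exactly 6 or exactly 3
# hex digits; \b rejects longer runs of word characters such as #12345G.
HEX_COLOR_RE = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def extract_domain(url):
    """Return the lower-cased host name of a URL, or None if nothing matches."""
    match = DOMAIN_RE.match(url.strip())
    return match.group(1).lower() if match else None


def extract_prices(text):
    """Return every price-like substring in the order it appears."""
    return PRICE_RE.findall(text)


def extract_hex_colors(css):
    """Return every 3- or 6-digit hexadecimal color code in the stylesheet."""
    return HEX_COLOR_RE.findall(css)
```

For example, `extract_domain("https://www.example.com/path?x=1")` yields `example.com`. Host names with internationalized characters, prices written with the symbol after the amount, and 4- or 8-digit color codes with alpha channels are deliberately left out and would need additional alternatives in each pattern.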
1. Develop an algorithm that can resolve context-related ambiguities when extracting
patterns, e.g., disambiguating between Apple as a fruit and Apple as a company.
2. Sentiment analysis and sentiment visualization using non-negative matrix
factorization.
3. Evaluate the performance of the algorithm you developed above in terms of accuracy,
precision, and recall.
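The three evaluation metrics can be computed from scratch without any library. This is a minimal sketch assuming a binary labelling task with one label treated as the positive class. The function names `confusion_counts` and `evaluate`, and the example labels, are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive):
    """Return accuracy, precision and recall for the chosen positive label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Fraction of true positives that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative labels only: the sense assigned to five mentions of "Apple".
gold = ["company", "fruit", "company", "fruit", "company"]
predicted = ["company", "company", "fruit", "fruit", "company"]
scores = evaluate(gold, predicted, positive="company")
```

With these illustrative labels, accuracy is 3/5, and precision and recall are both 2/3. Returning 0.0 when a denominator is zero is one convention among several; reporting the metric as undefined is an equally defensible choice.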
Submission Guidelines:
➢ Submit well-structured code with a detailed explanation and comments for each code
section.
➢ Adhere to the submission guidelines.
Note: For all of the above problems, write your own libraries rather than using built-in libraries.