
Assignment 1: Regular Expressions

1. Given a list of URLs, write a regular expression to extract the domain name from
each URL. Provide examples and explain the logic behind your regex.
2. You are given a text document containing various dates in different formats (e.g.,
MM/DD/YYYY, DD-MM-YYYY, YYYY/MM/DD). Write a Python script that uses
regular expressions to extract and format all the dates in a consistent manner.
3. Suppose you have a large dataset of product descriptions. Write a regular expression
to find and extract all the prices mentioned in the descriptions, considering different
currency formats (e.g., $100, €50, ¥5000). Explain how your regex works.
4. Given a block of HTML code, write a regular expression to extract all the hyperlinks
(URLs) contained within the HTML <a> tags. Explain the steps and groups in your
regex pattern.
5. Implement a regular expression that can detect and correct common spelling
mistakes in a text document. Provide examples and explain the substitution logic
used in your regex.
6. Design a regular expression to extract street addresses from a text document,
considering variations in address formats (e.g., 123 Main St, Apt 4B vs. 456 Elm
Avenue). Discuss the challenges and strategies for handling different address
structures.
7. Create a regular expression that identifies and extracts hexadecimal color codes
(e.g., #FFAABB) from a CSS stylesheet. Explain the pattern you use to capture these
codes.
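
As a starting point for items 1, 3, and 7, the following is a minimal sketch rather than a model answer. It assumes Python's standard `re` module is permitted for the pattern matching itself; the names `DOMAIN_RE`, `PRICE_RE`, `HEX_COLOR_RE`, and `extract_domain` are illustrative and not part of the assignment. The patterns cover only the formats shown in the examples above and would need extending for other inputs.

```python
import re

# Item 1 (assumed approach): skip an optional scheme and an optional "www.",
# then capture everything up to the first '/', ':', '?', '#', or whitespace.
# URLs carrying user info (user@host) are not handled by this sketch.
DOMAIN_RE = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?(?:www\.)?([^/:?#\s]+)")

# Item 3 (assumed approach): one of the three currency symbols from the
# examples, an optional space, digits with optional comma-separated
# thousands groups, and an optional decimal part.
PRICE_RE = re.compile(r"[$€¥]\s?\d+(?:,\d{3})*(?:\.\d+)?")

# Item 7 (assumed approach): '#' followed by exactly six or exactly three
# hex digits; the trailing \b rejects longer runs such as '#12345g'.
HEX_COLOR_RE = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def extract_domain(url):
    """Return the lower-cased host part of `url`, or None if nothing matches."""
    match = DOMAIN_RE.match(url.strip())
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    print(extract_domain("https://www.example.com/path?q=1"))
    print(PRICE_RE.findall("Now $100, was €50 or ¥5000 and $1,299.99"))
    print(HEX_COLOR_RE.findall("a { color: #FFAABB; background: #fff; }"))
```

Each pattern uses non-capturing groups `(?:...)` for optional parts, so `findall` returns whole matches, and the single capturing group in `DOMAIN_RE` isolates the domain.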

Assignment 2: Regular Expressions

1. Develop an algorithm that can resolve context-related ambiguities when extracting
patterns, e.g., disambiguating between Apple as a fruit and Apple as a company.
2. Perform sentiment analysis and sentiment visualization using non-negative matrix
factorization.
3. Evaluate the performance of the algorithm above in terms of accuracy, precision,
and recall.
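
For item 3, the three metrics can be computed by hand from predicted and true labels, without a built-in metrics library. The sketch below is one possible way to do it; the function name `evaluate`, the label strings, and the choice of "company" as the positive class are assumptions made for illustration only.

```python
def evaluate(y_true, y_pred, positive):
    """Compute accuracy, precision, and recall for one positive class.

    y_true and y_pred are equal-length sequences of labels; `positive`
    is the label treated as the positive class (an assumption here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty input or a class that never appears.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical labels for five mentions of "Apple".
    truth = ["company", "fruit", "company", "fruit", "company"]
    predicted = ["company", "company", "fruit", "fruit", "company"]
    print(evaluate(truth, predicted, positive="company"))
```

Precision is the share of predicted positives that are correct, tp / (tp + fp), and recall is the share of actual positives that were found, tp / (tp + fn); accuracy counts all correct predictions over all examples.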

Submission Guidelines:

➢ Submit well-structured code with a detailed explanation and comments for each code
section.
➢ Adhere to the submission guidelines.

Note: For all of the above problems, write your own libraries rather than using built-in libraries.
