Projects:
Language:
Python
Chatbot
Built a chatbot that took in questions from the user and generated responses based on information from Wikipedia
- Wrote a matching program that took in a sequence of text from the user and matched it to a form of a question
- Example: "When was Grace Hopper born?" would match to "When was ___ born?"
- Used the Wikipedia API to extract information from Wikipedia pages
- Used regular expressions to scan through the page text and find the answer to the input question
- Example: if the question was "When was Grace Hopper born?", the regular expression used to find the birth date would be: "r'(?:Born\D*)(?P\d{4}-\d{2}-\d{2})'"
- Questions implemented:
- When was ___ born?
- When was ___(name of company) founded?
- When was ___(name of state) added to the US?
- When did ___ die?
- What is the polar radius of ___(name of planet)?
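The two steps above (matching a question to a template, then pulling the answer out of page text with a regular expression) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the template list, the labels, the group names `subject` and `date`, and the sample page text are all assumptions made for the example, and the Wikipedia API call is replaced by a hard-coded string.

```python
import re

# Hypothetical question templates: each pairs a question form with a label.
QUESTION_PATTERNS = [
    (re.compile(r"^when was (?P<subject>.+?) born\?*$", re.IGNORECASE), "birth_date"),
    (re.compile(r"^when did (?P<subject>.+?) die\?*$", re.IGNORECASE), "death_date"),
]

# Hypothetical answer patterns; the group name "date" is an assumption.
ANSWER_PATTERNS = {
    "birth_date": re.compile(r"Born\D*(?P<date>\d{4}-\d{2}-\d{2})"),
    "death_date": re.compile(r"Died\D*(?P<date>\d{4}-\d{2}-\d{2})"),
}


def match_question(text):
    """Return (label, subject) for the first matching template, else None."""
    cleaned = text.strip()
    for pattern, label in QUESTION_PATTERNS:
        m = pattern.match(cleaned)
        if m:
            return label, m.group("subject")
    return None


def extract_answer(label, page_text):
    """Search the page text with the answer pattern for this question type."""
    m = ANSWER_PATTERNS[label].search(page_text)
    return m.group("date") if m else None


# Stand-in for text that would be fetched from Wikipedia.
sample_page = "Grace Hopper\nBorn Grace Brewster Murray (1906-12-09) December 9, 1906"

label, subject = match_question("When was Grace Hopper born?")
print(subject, extract_answer(label, sample_page))
```

Keeping the question templates and the answer patterns in separate tables means a new question type only needs one entry in each.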
Naive Bayes Sentiment Analysis Classifier
Built and trained a classifier that labeled movie reviews as either positive or negative
- Trained the system using movie reviews from the "rateitall" database
- Each movie review file in the database was titled with either a "pos" prefix to indicate that it was a positive movie review or a "neg" prefix to indicate that it was a negative movie review
- Used two dictionaries, one for positive movie reviews and one for negative movie reviews, with the keys being the words that were included in the reviews and the values being the frequencies of the words appearing in the reviews
- To calculate the frequencies of individual words appearing in either a positive or negative review, I parsed the text of each review and updated the frequency associated with each word in the appropriate dictionary
- Wrote a classify method that took in a string of text as input, classified it, and then returned its classification, positive or negative
- Calculated the conditional probabilities of each word in the review given that the document was of either the positive class or negative class
- Calculated the sum of the logs of all these conditional probabilities (working with logs helps avoid floating-point underflow)
- Added the log of the prior probability of the class to this sum (equivalent to multiplying the product of the probabilities by the prior)
- Followed these steps twice for each review: once with the conditional probabilities for the positive class and once with the conditional probabilities for the negative class
- If the positive conditional probability was larger than the negative conditional probability, then the review was classified as positive. Otherwise, it was classified as negative.
- To deal with the possibility of a word appearing in the training data for one class but not the other, I used add-one smoothing by adding 1 to the numerator of each probability
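The training and classification steps above can be sketched as follows. This is a hedged, minimal version under stated assumptions rather than the original implementation: the class name, the whitespace tokenizer, the `label_from_filename` helper, and the toy training sentences are all hypothetical. The smoothing shown is the standard add-one (Laplace) form, which adds 1 to each word count and the vocabulary size to the denominator; the original may have differed in the denominator.

```python
import math
from collections import Counter


def label_from_filename(name):
    """Assumed naming scheme: files start with 'pos' or 'neg'."""
    return "pos" if name.startswith("pos") else "neg"


class NaiveBayesSentiment:
    def __init__(self):
        # One word-frequency dictionary per class.
        self.word_counts = {"pos": Counter(), "neg": Counter()}
        self.doc_counts = {"pos": 0, "neg": 0}

    @staticmethod
    def tokenize(text):
        # Simplistic tokenizer (an assumption): lowercase, split on whitespace.
        return text.lower().split()

    def train(self, text, label):
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        # Assumes at least one training document exists for each class.
        vocab = set(self.word_counts["pos"]) | set(self.word_counts["neg"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Start from the log prior, then add the log likelihood of each word.
        score = math.log(self.doc_counts[label] / total_docs)
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        pos = self.log_score(text, "pos")
        neg = self.log_score(text, "neg")
        return "positive" if pos > neg else "negative"


clf = NaiveBayesSentiment()
clf.train("a great and moving film", label_from_filename("pos-001.txt"))
clf.train("great acting and a great story", label_from_filename("pos-002.txt"))
clf.train("a dull and boring film", label_from_filename("neg-001.txt"))
clf.train("boring plot and terrible acting", label_from_filename("neg-002.txt"))
print(clf.classify("great story"))
print(clf.classify("boring and dull"))
```

Summing logs instead of multiplying raw probabilities matters for long reviews, where the product of many small numbers would round to zero.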
Movie Chatbot
Created a movie chatbot that answered questions about movies
- Wrote a matching program that took in a sequence of text from the user and matched it to a form of a question
- Used the IMDb database to search for answers to questions
- Questions implemented:
- What movies were made in ___(year)?
- What movies were made before/after ___(year)?
- What movies were made between ___(year) and ___(year)?
- Who directed ___(name of movie)?
- Who acted in ___(name of movie)?
- What movies were directed by ___(name of director)?
- In what movies did ___(name of actor) appear?
- What years did ___(name of director) release movies?
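A few of the question forms above can be sketched as template matching followed by a lookup. This is only an illustration under assumptions: the `MOVIES` list is a tiny hand-written stand-in for the real movie data, the record fields are hypothetical, and treating "between" as inclusive of both years is a design choice made for the example.

```python
import re

# Hypothetical in-memory stand-in for the movie database.
MOVIES = [
    {"title": "Jaws", "year": 1975, "director": "Steven Spielberg"},
    {"title": "Alien", "year": 1979, "director": "Ridley Scott"},
    {"title": "Jurassic Park", "year": 1993, "director": "Steven Spielberg"},
]


def answer(question):
    """Match the question against known templates and look up the answer."""
    q = question.strip()

    m = re.match(r"what movies were made between (\d{4}) and (\d{4})\?*$", q, re.IGNORECASE)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        # Inclusive range (an assumption about the intended meaning).
        return [mv["title"] for mv in MOVIES if start <= mv["year"] <= end]

    m = re.match(r"who directed (.+?)\?*$", q, re.IGNORECASE)
    if m:
        title = m.group(1).lower()
        return [mv["director"] for mv in MOVIES if mv["title"].lower() == title]

    m = re.match(r"what years did (.+?) release movies\?*$", q, re.IGNORECASE)
    if m:
        name = m.group(1).lower()
        return sorted({mv["year"] for mv in MOVIES if mv["director"].lower() == name})

    # No template matched.
    return None


print(answer("What movies were made between 1970 and 1980?"))
print(answer("Who directed Alien?"))
print(answer("What years did Steven Spielberg release movies?"))
```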