COMP9318 Programming Tutoring: Java, CS, and Python Programming Help | International Student Programming Tutoring Site

COMP9318 Lab3
Instructions
This notebook contains the instructions for COMP9318 Lab3.
You are required to complete your implementation in the file submission.py provided along with this notebook.

Do not print unnecessary output. We will ignore anything printed to the screen; all results must be returned in the appropriate data structures by the corresponding functions.

You can submit your implementation for Lab3 via the following link: https://kg.cse.unsw.edu.au/submit/ .

For each question, we have provided detailed instructions along with the question headings. If you run into any problems, post your query on Ed.

You are allowed to add other functions and/or import modules (you may have to in this lab), but you are not allowed to define global variables. Only functions are allowed in submission.py.

Do not import unnecessary modules or libraries: if a module you import is not available at test time, your submission will fail with an error.

We will provide immediate feedback on your submission. You can access your scores using the online submission portal on the same day.

For the final evaluation we will use a different dataset, so your final scores may vary.

You are allowed to submit as many times as you want before the deadline, but ONLY the latest version will be kept and marked.

The submission deadline for this assignment is 20:59:59 on 29th March, 2021 (Sydney time). We will not accept any late submissions.

Question-1: Text Classification using Multinomial Naive Bayes
You are required to implement a multinomial Naive Bayes classifier to predict spam SMS.

The training data is a set of SMS messages categorized as spam or ham.

import pandas as pd

raw_data = pd.read_csv('./asset/data.txt', sep='\t')
raw_data.head()

  category  text
0 ham       Go until jurong point, crazy.. Available only ...
1 ham       Ok lar... Joking wif u oni...
2 spam      Free entry in 2 a wkly comp to win FA Cup fina...
3 ham       U dun say so early hor... U c already then say...
4 ham       Nah I don't think he goes to usf, he lives aro...

To implement a unigram model, we first tokenize the text. We use the count of each token (word) in the SMS as its feature (i.e., bag of words). For each SMS, we store the token counts in a dictionary and pair that dictionary with the SMS category; the training data is the list of these pairs.

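def tokenize(sms):
    # Split on single spaces; punctuation and letter case are kept as-is.
    return sms.split(' ')

def get_freq_of_tokens(sms):
    # Map each token to the number of times it occurs in the SMS (bag of words).
    tokens = {}
    for token in tokenize(sms):
        if token not in tokens:
            tokens[token] = 1
        else:
            tokens[token] += 1
    return tokens

# Each training example is a (token-count dictionary, category) pair.
training_data = []
for index in range(len(raw_data)):
    training_data.append((get_freq_of_tokens(raw_data.iloc[index].text), raw_data.iloc[index].category))
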
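To make the feature representation concrete, the short sketch below repeats the two helper functions from this notebook and applies them to a made-up message. The example strings are assumptions for illustration and are not taken from the dataset.

```python
def tokenize(sms):
    # Same helper as in the notebook: split on single spaces only.
    return sms.split(' ')

def get_freq_of_tokens(sms):
    # Same helper as in the notebook: token -> number of occurrences.
    tokens = {}
    for token in tokenize(sms):
        if token not in tokens:
            tokens[token] = 1
        else:
            tokens[token] += 1
    return tokens

# Hypothetical message, used only to show the shape of the features.
example_features = get_freq_of_tokens("free free entry")
print(example_features)  # {'free': 2, 'entry': 1}

# Splitting on ' ' does not collapse repeated spaces: a double space
# yields an empty-string token.
print(tokenize("Hi  there"))  # ['Hi', '', 'there']
```

Because tokenize() splits on single spaces only, tokens such as "point," and "point" are counted as different words, and letter case is preserved.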
For this lab, you need to implement a multinomial Naive Bayes classifier (i.e., multinomial_nb() in the file submission.py) with add-1 smoothing. The input arguments of multinomial_nb() are:

training_data: pre-processed data stored as a list of (token-count dictionary, category) pairs
sms: test SMS (i.e., a list of tokens) that you need to categorize as spam or ham
The return value of multinomial_nb() should be the ratio of the probability that sms is spam to the probability that sms is ham. A return value larger than 1 implies that the sms is spam; a value smaller than 1 implies that it is ham.
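Written out, the ratio described above can be expressed as follows. This is a sketch of the standard multinomial Naive Bayes formulation with add-1 (Laplace) smoothing; the symbols $V$ (the set of distinct tokens in the training data), $N_c$ (the number of training SMS in class $c$), $\mathrm{count}(w, c)$ (the total number of occurrences of token $w$ in class-$c$ SMS), and $w_1, \dots, w_n$ (the tokens of the test SMS) are notation introduced here, not taken from the lab text.

```latex
\[
P(c) = \frac{N_c}{N_{\text{spam}} + N_{\text{ham}}},
\qquad
P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w' \in V} \mathrm{count}(w', c) + |V|}
\]
\[
\text{ratio}
= \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}
       {P(\text{ham}) \prod_{i=1}^{n} P(w_i \mid \text{ham})}
\]
```

The lab text does not say how to treat test tokens that never occur in the training data. One common convention is to skip them, since they would otherwise need a smoothing term that accounts for words outside $V$.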

For example, a sample output is shown in the cell given below:

## How we test your implementation...
import submission_ans as submission

sms = 'I am not spam'
print(submission.multinomial_nb(training_data, tokenize(sms)))

0.2342767295597484
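As a rough illustration of the computation, the sketch below shows one possible way to structure such a classifier on a tiny hand-made training set. It is a sketch under stated assumptions, not the reference solution: the function name multinomial_nb_sketch, the toy data, the choice to skip tokens unseen in training, and the use of log-probabilities are all assumptions made here, and the sketch is not claimed to reproduce the sample output above.

```python
import math

def multinomial_nb_sketch(training_data, sms):
    """Return P(spam | sms) / P(ham | sms) using add-1 smoothing (illustrative only)."""
    class_docs = {"spam": 0, "ham": 0}        # number of SMS per class
    token_counts = {"spam": {}, "ham": {}}    # per-class token totals
    vocabulary = set()                        # distinct training tokens

    for freq, category in training_data:
        class_docs[category] += 1
        for token, count in freq.items():
            token_counts[category][token] = token_counts[category].get(token, 0) + count
            vocabulary.add(token)

    total_docs = sum(class_docs.values())
    log_score = {}
    for category in ("spam", "ham"):
        total_tokens = sum(token_counts[category].values())
        # Assumes both classes occur in the training data (otherwise log(0)).
        score = math.log(class_docs[category] / total_docs)
        for token in sms:
            if token not in vocabulary:
                continue  # assumption: ignore tokens never seen in training
            smoothed = (token_counts[category].get(token, 0) + 1) / (total_tokens + len(vocabulary))
            score += math.log(smoothed)
        log_score[category] = score

    # Work in log-space to avoid underflow, then convert back to a ratio.
    return math.exp(log_score["spam"] - log_score["ham"])

# Hypothetical toy data in the same (token-count dict, category) format.
toy_training_data = [
    ({"win": 1, "cash": 1, "now": 1}, "spam"),
    ({"see": 1, "you": 1, "now": 1}, "ham"),
    ({"call": 1, "you": 1, "later": 1}, "ham"),
]

spam_like_ratio = multinomial_nb_sketch(toy_training_data, ["win", "cash"])
ham_like_ratio = multinomial_nb_sketch(toy_training_data, ["see", "you"])
print(spam_like_ratio > 1, ham_like_ratio < 1)
```

Summing log-probabilities instead of multiplying raw probabilities is a design choice made here to keep long messages from underflowing to zero; for a submission, everything would need to live inside functions, since global variables are not allowed in submission.py.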
Test Environment
For testing, we have pre-installed the requisite modules and libraries in the testing environment. You are only allowed to use the following:

python: 3.6.5
pandas: 0.19.2
NOTE: You are required to implement the classifier yourself. You are not allowed to import sklearn or any other library in Lab3.

Date: 2021-03-24 08:17