Practical Text Mining Investigation

April 19, 2022 by B3ln4iNmum

19/03/2022, 17:45 Practical Text Mining Investigation
https://herts.instructure.com/courses/93370/assignments/153950 1/3
Practical Text Mining Investigation
Due: 27 Apr by 16:00 | Points: 40 | Submitting: a file upload | File types: pdf
Weighting: 40%
Authorship: Individual
Number of hours you are expected to work on this assignment: 40
Target date for return of marked coursework: 25th May 2022
This Assignment assesses the following module Learning Outcomes (from Definitive Module
Document):
Successful students will typically:
2. be able to appreciate the strengths and limitations of various data mining models;
3. be able to critically evaluate, articulate and utilise a range of techniques for designing data mining
systems;
5. be able to critically evaluate different algorithms and models of data mining.
Assignment Tasks:
Download and unzip this folder of text data: 7com1018-practical-dataset.zip
(https://herts.instructure.com/courses/93370/files/4438458/download?download_frd=1). Analyse the
documents within this folder using the WEKA toolkit and the tools introduced within this module,
comparing two different forms of pre-processing. For example, you may investigate the impact of
using stemming versus not using stemming, the effect of reducing the number of features versus using all
the original features, the impact of term frequency versus a simple word count, etc.
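As an illustration of the kind of comparison described above, the following minimal Python sketch contrasts a simple word count with a length-normalised term frequency, with and without stemming. It is not part of the brief and does not use WEKA: the function names, the sample sentence, and the crude suffix-stripping rule are all assumptions made for illustration, and WEKA's own filters may define these transforms differently.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes (illustrative only, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def word_counts(text, stem=False):
    """Raw word counts, optionally after crude stemming."""
    tokens = tokenize(text)
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)


def term_frequencies(text, stem=False):
    """Counts divided by document length (one common definition of term frequency)."""
    counts = word_counts(text, stem)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


# Made-up sample sentence, not taken from the assignment dataset.
doc = "Mining text and mined texts: the miner mines."
print(len(word_counts(doc)), "features without stemming")
print(len(word_counts(doc, stem=True)), "features with crude stemming")
print(term_frequencies(doc, stem=True))
```

In this toy sentence, stemming merges several surface forms into one feature, which is the effect a stemming-versus-no-stemming investigation would measure on the real documents.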
Complete the following tasks:
1. Describe which question you will be investigating (e.g. “is stemming beneficial to improving
performance?”, “is the reduction of features beneficial to improving performance?”, etc.) and why
you think your choice is an interesting question to investigate (give an answer relevant to data
mining).
2. Convert the text dataset into TWO different datasets in ARFF format, based on your chosen
question. Explain the conversion techniques and parameters that you have used, along with any
other pre-processing you wish to do. (Do not include a screenshot of the attributes in WEKA;
you need to describe them.)
3. For each dataset, produce a table and a graph of classification performance against training set
size for the following three classifiers: decision tree (J48), Naïve Bayes, and Support Vector Machine.
For the Support Vector Machine, you must determine the kernel and its parameters using an
appropriate methodology.
4. Write a conclusion. You should at least compare the performance of the different learning
algorithms on your datasets, and answer the question you posed in part (1).
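For task 2, the overall shape of an ARFF file can be sketched as follows. This is a minimal Python illustration, not the conversion route the module prescribes: the helper `to_arff`, the attribute name `class_label`, the relation name, and the toy rows and class labels are all hypothetical. It assumes every term is a plain alphabetic token, so no quoting of attribute names is needed.

```python
def to_arff(relation, vocabulary, rows, class_labels):
    """Build dense ARFF text: one numeric attribute per term plus a nominal class."""
    lines = [f"@relation {relation}", ""]
    for term in vocabulary:
        lines.append(f"@attribute {term} numeric")
    # Nominal class attribute listing every allowed label.
    lines.append("@attribute class_label {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for features, label in rows:
        # Terms absent from a document get a value of 0.
        values = [str(features.get(term, 0)) for term in vocabulary]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


# Toy rows: (term -> value, class label). Terms and labels are made up.
rows = [
    ({"min": 3, "text": 2}, "sports"),
    ({"text": 1, "goal": 2}, "politics"),
]
vocabulary = sorted({term for features, _ in rows for term in features})
class_labels = sorted({label for _, label in rows})

arff = to_arff("toy_counts", vocabulary, rows, class_labels)
print(arff)
```

Producing the second dataset would then amount to calling the same helper with feature values computed under the other pre-processing choice.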
Remember to explain the steps you have taken to complete each task in your report to obtain high
marks. Screenshots are typically not required, and should be used sparingly if at all.
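For task 3, one possible way to organise a performance-against-training-set-size experiment is sketched below in Python. It assumes a single fixed held-out test set and growing training slices; cross-validation would be another reasonable protocol. The majority-class learner is only a placeholder standing in for J48, Naïve Bayes and the Support Vector Machine, which would be run in WEKA, and the toy data and function names are hypothetical.

```python
import random
from collections import Counter


def majority_class_model(train):
    """Placeholder learner: always predicts the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda features: label


def accuracy(model, test):
    """Fraction of test instances whose predicted label matches the true one."""
    return sum(model(x) == y for x, y in test) / len(test)


def learning_curve(data, fractions, test_share=0.3, seed=0):
    """Hold out one fixed test set, then train on growing slices of the rest."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_share))
    test, pool = shuffled[:n_test], shuffled[n_test:]
    curve = []
    for fraction in fractions:
        n_train = max(1, int(len(pool) * fraction))
        model = majority_class_model(pool[:n_train])
        curve.append((n_train, accuracy(model, test)))
    return curve


# Toy data: (features, label) pairs; the labels "a" and "b" are made up.
data = [({"x": i}, "a" if i % 3 else "b") for i in range(30)]
curve = learning_curve(data, [0.25, 0.5, 0.75, 1.0])
for n_train, acc in curve:
    print(f"{n_train:>5}  {acc:.2f}")
```

Each printed row corresponds to one row of the required table; repeating the sweep per classifier and per dataset gives the series to plot.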
Submission requirements:
A single PDF document containing your report, to a maximum of 10 pages.
Marks awarded for:
Marks will be awarded out of 100 in the following proportions:
1. Question (5 marks)
2. Conversion (40 marks)
3. Training/testing (40 marks)
4. Conclusion (15 marks)
A reminder that all work should be your own. Reports exceeding the maximum length may not be
marked beyond the 10 page limit.
Type of Feedback to be given for this assignment:
Along with the marks, each student will receive individual written feedback.
Additional information:
Regulations governing assessment offences including Plagiarism and Collusion are available from
https://www.herts.ac.uk/__data/assets/pdf_file/0007/237625/AS14-Apx3-Academic-Misconduct.pdf
(UPR AS14).
Guidance on avoiding plagiarism can be found here:
https://herts.instructure.com/courses/61421/pages/referencing-avoiding-plagiarism?module_item_id=779436
For undergraduate modules:
a score of 40% or above represents a pass performance at honours level.

for each day or part thereof (or, for hard copy submission only, each working day or part
thereof) that an item of coursework is late, up to five days after the published deadline,
coursework relating to modules at Levels 0, 4, 5, 6 (including deferred coursework, but with
the exception of referred coursework) will have the numeric grade reduced by 10 grade points
until or unless the numeric grade reaches or is 40. Where the numeric grade awarded for the
assessment is less than 40, no lateness penalty will be applied.
For postgraduate modules:
a score of 50% or above represents a pass mark.
for each day or part thereof (or, for hard copy submission only, each working day or part
thereof) that an item of coursework is late, up to five days after the published deadline,
coursework relating to modules at Level 7 (including deferred coursework, but with the
exception of referred coursework) will have the numeric grade reduced by 10 grade points
until or unless the numeric grade reaches or is 50. Where the numeric grade awarded for the
assessment is less than 50, no lateness penalty will be applied.
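The lateness rule above can be read as simple arithmetic. The Python sketch below is one interpretation, not an official calculator: the function name is hypothetical, `days_late` is assumed to be already rounded up to whole days ("each day or part thereof"), and submissions more than five days late are rejected because this brief does not say what happens to them.

```python
def late_penalty(grade, days_late, pass_mark):
    """Apply 10 grade points per late day, never dropping below the pass mark.

    pass_mark is 40 for undergraduate modules and 50 for postgraduate modules.
    """
    if not 0 <= days_late <= 5:
        raise ValueError("the brief only describes penalties for up to five days late")
    if grade < pass_mark:
        # Grades already below the pass mark receive no lateness penalty.
        return grade
    return max(pass_mark, grade - 10 * days_late)


print(late_penalty(75, 2, pass_mark=50))  # 75 - 20 = 55
print(late_penalty(58, 2, pass_mark=50))  # capped at the pass mark: 50
print(late_penalty(45, 3, pass_mark=50))  # below the pass mark, unchanged: 45
```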
