Coursework

May 2, 2022 by B3ln4iNmum

CN7030 – Machine Learning on Big Data
Coursework: 2021-22 Academic Year
This coursework (CRWK) must be attempted in groups of two students. It is divided into
two sections: (1) Spark Machine Learning on a real case study and (2) Spark Streaming
for a streaming-based application.
All group members must attend the presentation in week 12. The presentation will be held
online through Microsoft Teams.
If you fail to attend the presentation, your mark will be zero.
The overall mark for the CRWK comes from two main activities:
1- Big Data report (around 3,000 words, with a tolerance of ±10%) in HTML format (60%)
2- Presentation (40%)
Assessment for resit (2nd attempt):
The resit consists only of the Big Data report (100%) in HTML format. Students must develop
new solutions for the same tasks. If students copy their solutions from the main sit, it will
be considered self-plagiarism and the mark will be zero. The marking scheme is the same as
for the main sit.
Marking Scheme

Topic                          Total mark   Remarks (breakdown of marks for each sub-task)
Machine Learning on Big Data   60           (20) Design one binary classifier, and explain its configuration and parameters.
                                            (25) Design one multi-class classifier incorporating ensemble techniques, and explain its configuration and parameters.
                                            (15) Performance and accuracy measurements on both classifiers.
Data Streaming Application     30           (5) Configure and initiate the streaming environment in Spark.
                                            (25) Manipulate and process the real-time data and visualize it in (near) real time.
Documentation                  10           (10) Write a well-organized report for a programming and analytics project.
Total                          100


Good Luck!
Big Data Processing using PySpark
Understanding Dataset: UNSW-NB15 [1]
The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm
tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to
generate a hybrid of real modern normal activities and synthetic contemporary attack
behaviours. The tcpdump tool was used to capture 100 GB of raw traffic (e.g., pcap files).
The dataset has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits,
Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used,
and twelve algorithms were developed, to generate a total of 49 features with the class label.
a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) We use a total of 10 million records stored in a CSV file (download). The total size is
about 600 MB. We use it for the machine learning task.
Task 1: Design and Build Classifiers using PySpark [60 marks]
1. Design one binary classifier to distinguish attack traffic from normal traffic. Explain
your algorithm and its configuration. Follow the complete machine learning process,
including feature selection, preprocessing, handling of class imbalance, etc.
[20 marks]
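
As an illustration of the class-imbalance step, the following is a minimal sketch, not a prescribed solution. It assumes inverse-frequency class weighting fed into a weighted logistic regression; the column names `label` and `class_weight`, the hyperparameter values, and the choice of logistic regression itself are all assumptions made for the example. The Spark imports sit inside the function so that the weighting helper can be tried without a Spark installation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} using weight = N / (n_classes * count_of_label)."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


def build_binary_pipeline(feature_cols, label_col="label", weight_col="class_weight"):
    """Assemble -> scale -> weighted logistic regression.

    Needs an active SparkSession. Column names and hyperparameters are
    hypothetical placeholders, not values given in the coursework brief.
    """
    # Imported lazily so the helper above also runs where Spark is not installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StandardScaler, VectorAssembler

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="raw_features")
    scaler = StandardScaler(inputCol="raw_features", outputCol="features")
    classifier = LogisticRegression(
        featuresCol="features",
        labelCol=label_col,
        weightCol=weight_col,  # per-row weight that compensates for imbalance
        maxIter=50,
        regParam=0.01,
    )
    return Pipeline(stages=[assembler, scaler, classifier])


if __name__ == "__main__":
    # Toy label list: 900 normal records (0) and 100 attack records (1).
    print(balanced_class_weights([0] * 900 + [1] * 100))
```

The weights returned by `balanced_class_weights` would then be attached to each row as the weight column before fitting, so that the minority class contributes as much to the loss as the majority class.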
2. Design one multi-class classifier incorporating ensemble techniques, i.e., bagging and
boosting, and briefly explain each parameter, configuration and processing step.
[25 marks]
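
One possible way to set up a bagging and a boosting estimator in PySpark is sketched below, under stated assumptions. Random forest stands in for bagging and gradient-boosted trees for boosting. Because Spark's `GBTClassifier` supports binary labels only, the sketch wraps it in `OneVsRest` for the multi-class case; whether that combination behaves well on a given Spark version should be checked. The column names `features` and `attack_cat_index` and every hyperparameter value are hypothetical.

```python
# Hypothetical hyperparameters, chosen only to make the example concrete.
RF_PARAMS = {
    "numTrees": 100,                  # number of bagged trees
    "maxDepth": 10,
    "featureSubsetStrategy": "sqrt",  # features considered at each split
    "subsamplingRate": 0.8,           # fraction of rows sampled per tree
    "seed": 42,
}
GBT_PARAMS = {
    "maxIter": 30,                    # number of boosting rounds
    "maxDepth": 5,
    "stepSize": 0.1,                  # learning rate
    "subsamplingRate": 0.8,
    "seed": 42,
}


def build_ensemble_classifiers(features_col="features", label_col="attack_cat_index"):
    """Return (bagging_estimator, boosting_estimator); needs an active SparkSession."""
    # Imported lazily so the parameter dictionaries can be inspected without Spark.
    from pyspark.ml.classification import (
        GBTClassifier,
        OneVsRest,
        RandomForestClassifier,
    )

    bagging = RandomForestClassifier(
        featuresCol=features_col, labelCol=label_col, **RF_PARAMS
    )
    # GBTClassifier handles binary labels only, so one binary model is
    # trained per attack category through OneVsRest.
    boosting = OneVsRest(
        classifier=GBTClassifier(**GBT_PARAMS),
        featuresCol=features_col,
        labelCol=label_col,
    )
    return bagging, boosting
```

Keeping the hyperparameters in plain dictionaries makes it easy to list and explain each one in the report, which is what this sub-task asks for.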
3. Measure and compare the performance of both classifiers. Visualize your results and
findings using Python libraries.
[15 marks]
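
To make the performance comparison concrete, here is a small sketch that computes accuracy plus per-class precision, recall and F1 from (true label, predicted label) pairs. It is plain Python and assumes the pairs have already been collected from a model's predictions; the function name `per_class_metrics` is hypothetical and not part of any library.

```python
from collections import Counter


def per_class_metrics(pairs):
    """Compute accuracy and per-class precision/recall/F1.

    pairs: iterable of (true_label, predicted_label) tuples.
    """
    pairs = list(pairs)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, predicted in pairs:
        if truth == predicted:
            tp[truth] += 1
        else:
            fp[predicted] += 1  # predicted this class, but it was wrong
            fn[truth] += 1      # missed the true class
    report = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(tp.values()) / len(pairs) if pairs else 0.0
    return accuracy, report


if __name__ == "__main__":
    toy_pairs = [(1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
    accuracy, report = per_class_metrics(toy_pairs)
    print(accuracy, report)
```

Per-class figures matter here because the dataset is imbalanced: a classifier can reach high accuracy while missing most records of a rare attack type. The resulting dictionary can be plotted as a grouped bar chart per classifier with any Python plotting library.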
Note:
Only a working solution without system or logical errors is considered for full marks.
Task 2: Data Streaming [30 marks]
There are three different data streaming methods in Spark: Discretized Streams
(DStreams), Window-based Computations, and Structured Streaming. You can apply one
of these methods to complete this task, as follows:
• Configure the Spark environment based on your method.
• The incoming data/traffic should arrive in paragraph format (several lines).
• Sub-task 1 is to count the number of words with a length of 5 or more in the odd-numbered lines.
• Sub-task 2 is to count only digits.
• At the end, visualize (use Python UI/plot libraries) and/or print out the results in the
predefined time slots.

[1] https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
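
The two counting rules can be sketched as follows. This is an illustrative reading of the brief, not the required solution. It assumes that a word is any whitespace-separated token, that line numbering starts at 1 and restarts with every batch, and that "count only digits" means counting the digit characters 0-9. The DStream wiring uses a socket source on a hypothetical host and port; the Spark imports are kept inside that function so the counting helpers can be tested on their own.

```python
import string


def count_long_words_in_odd_lines(text, min_length=5):
    """Count tokens of at least min_length characters on lines 1, 3, 5, ..."""
    total = 0
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line_number % 2 == 1:  # odd-numbered lines only
            total += sum(1 for word in line.split() if len(word) >= min_length)
    return total


def count_digits(text):
    """Count the ASCII digit characters 0-9 in the text."""
    return sum(1 for character in text if character in string.digits)


def run_dstream_demo(host="localhost", port=9999, batch_seconds=10):
    """Print both counts for every batch (host, port and interval are assumptions)."""
    # Imported lazily so the helpers above run without a Spark installation.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "OddLineWordCount")
    ssc = StreamingContext(sc, batch_seconds)
    lines = ssc.socketTextStream(host, port)

    def report(batch_time, rdd):
        # Batches are assumed small enough to collect on the driver.
        text = "\n".join(rdd.collect())
        print(batch_time, count_long_words_in_odd_lines(text), count_digits(text))

    lines.foreachRDD(report)
    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__":
    sample = "hello brave new world\nthis line is skipped\nroom 101 has 12345 items"
    print(count_long_words_in_odd_lines(sample), count_digits(sample))
```

Keeping the counting logic in plain functions means the same code can be reused from Structured Streaming or a windowed computation, and the per-batch results can be appended to a list for plotting at each time slot.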
Task 3: Documentation [10 marks]
Your final report must follow the “The format of final submission” section. Your work must
demonstrate an appropriate understanding of how to build a user-friendly, efficient and
comprehensive analytics report for a big data project, one that helps readers navigate
to the relevant content.
