COM739 Big Data and Infrastructure
Assignment Overview: This assignment is designed to allow students to apply the skills acquired during the module to analyse a particular dataset.
Description: For the project, students will use the Anime dataset from Kaggle.com. The dataset contains two CSV files, anime.csv and rating.csv. The file anime.csv provides information about each animated series, such as name, genre and type, and rating.csv provides the ratings given to these anime by users. Your task is to analyse this dataset and build a model to extract useful recommendations. In addition, your analysis should at least answer the following questions:
1. Of all anime having at least 1000 ratings, which anime has the maximum average rating? anime_id = 28977
2. How many anime with at least 1000 ratings have an average rating greater than 9?
3. Which is the most watched anime, i.e. the anime rated by the largest number of users?
4. What are the top three recommendations for the user with user_id 8086?
5. List the top three users to whom you would recommend the anime with anime_id 4935.
Students are allowed to use any techniques covered in the module. The implementation must be in Python, in a Google Colab notebook that utilizes Apache Spark.
Suggested Format for the Report:
2. Approach
a. Pre-processing: Describe any steps taken to clean and structure the dataset in order to carry out the proposed analysis.
b. Algorithm
3. Results
Assessment Criteria: This assessment is worth 50% of the module. The work of the student will be assessed upon:
1. Pre-processing steps.
2. Algorithm used to analyse the data.
3. Analysis conducted.
4. Description of the analysis.
5. Description of the result.
The marksheet and the rubric for this coursework are provided below.
Each student has to upload two components in their solution. The first component is the Google Colab notebook containing the analysis script, which has a weightage of 40%. The second component has a weightage of 60% and should be in one of the two formats below:
1. Report:
a. Single column, maximum of 4 pages, Arial font with size 11
2. Video Recording:
a. Maximum duration of 15 minutes
b. The idea behind video recording is to allow students to use presentation approaches like animations and other appropriate techniques.
c. The video should end with a list of citations used for the work.
You are required to submit either a report or a video recording. Both items should be uploaded to Blackboard Learn. The report will be submitted through Turnitin for plagiarism checks. All information used in the second component must be appropriately cited and referenced.
Submission Process: Students are required to submit their solution in Blackboard Learn (BBL).
Individual feedback will be provided within three weeks of submission date via Blackboard Learn. Separate links should be used for submitting the different components:
1. Google Colab Notebook: Use the link, named “Notebook”.
2. Report: Use the link, named “Report”.
3. Video Recording: Use the link, named “Video Recording”.
Note that you are required to submit either a report or a video recording. While uploading in BBL, name the uploaded items in the format B-Number_First-name_Last-name.File-extension. For example, if your B-Number is b0123456 and your name is John McLaine, then your Colab notebook should have the name “b0123456_john_mclaine.ipynb”.
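The naming format can be checked with a small helper. This is an illustrative sketch: the function name `submission_name` is hypothetical, and lower-casing is inferred from the example in the brief rather than stated as a rule.

```python
def submission_name(b_number, first_name, last_name, extension):
    """Build a file name in the B-Number_First-name_Last-name.File-extension format."""
    # Lower-casing follows the brief's example (an inference, not a stated rule).
    stem = "_".join(part.strip().lower() for part in (b_number, first_name, last_name))
    return f"{stem}.{extension.lstrip('.').lower()}"


example = submission_name("b0123456", "John", "McLaine", "ipynb")
print(example)
```

With the details from the brief's example, the helper prints `b0123456_john_mclaine.ipynb`.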
Marksheet
Marking Criteria (Available Marks)
Report / Recording (60%)
1. Description of the problem and articulation of the approach used in building the model: 10
2. Description of the modelling technique used: 20
3. Evaluation of the model performance and presentation of the analysis and results: 30
Colab (40%) Evaluation
1. Programming: 30
2. Efficiency: 10
Total Mark: /100
Comments
COM739: Big Data and Infrastructure (2020-2021) Set Exercise Rubric
Report / Recording
Criterion: Overview of the approach. Description of the problem and articulation of the approach used in building the model. Does the report / recording describe the approach used for the given problem?
1 (0-2): There is no clear description provided for the approach.
2 (3-5): The problem and the approach taken are described briefly. A justification for the approach is provided briefly or is missing. It is difficult to interpret the reasons why the student has chosen this strategy. More detail on the justification of the strategy is needed.
3 (6-8): A good detailed description of the problem and the strategy chosen is provided. A clear justification for the approach is also provided. Only minor details are missing. Limited references are made to literature sources for supporting the approach taken.
4 (9-10): The student provided an excellent description of the problem and the approach taken for analysis. The reasoning behind the approach is clearly outlined. It is obvious that the student clearly identified the suitable strategy based on relevant sources.
Criterion: Description of the modelling technique used. Does the report / recording demonstrate student understanding of the problem domain and ability to identify the appropriate techniques for modelling? Have all pre-processing steps been clearly described?
1 (0-3): The appropriate modelling techniques have been named but are not described. There is no explanation of how the approach was practically implemented or validated.
2 (4-10): The appropriate modelling techniques have been briefly described. The student briefly discussed the implementation approach; however, the explanation is vague and hard to interpret.
3 (11-17): The student demonstrates a good understanding of the appropriate modelling techniques required for this problem. The implementation of the proposed approach is explained but lacks minor details, such as how the approach was validated.
4 (18-20): The appropriate modelling techniques are fully explained. The complete implementation and validation process of the study has been fully explained.
Criterion: Evaluation of the model performance and presentation of the analysis and results. Has the student applied appropriate techniques to develop their own models?
1 (0-3): Limited effort has been made to evaluate the performance of the model and analyse it for extracting useful information.
2 (4-13): The student has briefly explained how the models performed, but more explanation is required for complete understanding.
3 (14-23): The student has described how the model performed and presented performance metrics of the models to support his/her conclusions.
4 (24-30): The student comprehensively studied the model performance and provided relevant metrics for this purpose.
Does the student present the findings from these models in detail and explain the implications of the findings?
Limited analysis and discussion is provided regarding findings from the model.
Some effort was made for further analysis without clear justification.
Further analysis was done to extract useful information from the model with proper justification as to how this information is useful.
Colab
Criterion: Programming. This criterion covers all aspects of analysis like pre-processing, analysis and visualization.
1 (0-3): Code used for the analysis is incorrect or missing.
2 (4-13): The code for the analysis is present but some steps of the analysis are either missing or are not working.
3 (14-23): The code for the analysis is present and working as expected.
4 (24-30): Student conducted a comprehensive analysis and the code is working as described. Proper comments were added to the Colab for the understanding of the reader.
Criterion: Efficiency.
1 (0-2): No steps were taken to improve the speed and memory consumed by the algorithm.
2 (3-5): Some effort was made to improve the speed and efficiency of the algorithm.
3 (6-8): Effort was made to improve the speed and memory of the algorithm. The code looking at alternate approaches is present in the Colab.
4 (9-10): Systematic effort was made to improve the speed and memory of the algorithm. Visualizations were provided offering insight into this analysis.
Business Justification Template
Date: [Select Date]
Table of Contents
Executive Summary
Business Opportunities
Business Objectives
Business Initiatives
Business Processes
Project Description
Project Goals
Key Deliverables
Project Timeline
Proposed Approach
Alternative Approaches
Project Scope
Project Out-of-Scope
Project Financials
Project Budget
Total Cost of Ownership (TCO)
Dependencies, assumptions, risks and issues
Project Dependencies
Project Assumptions
Project Risks
Project Issues
Business sponsors and stakeholders
Business Sponsors
Project Stakeholders
Appendix: Background Materials
Authors and contributors
|The content of the template is what is important, rather than how it is presented or formatted. Note: to delete any tip, such as this one, just click the tip text and then press the spacebar.|
|See page 40 of the BI Guidebook for a discussion of the reasons to obtain approval (hint: it is not just about the funding) and an outline of the key components that need to be included in a BI Justification. Chapter 2 discusses how to determine the business and technical needs of the BI project, assess its scope and create a preliminary plan and budget. This is then used as the input for the content of the BI Justification. Don’t miss the justification pitfalls on page 40.|
|Write this after completing the remainder of the business justification document. The executive summary is the justification’s abstract and needs to convey its key elements. This section creates the first impression for many stakeholders and often establishes their overall project expectations.|
|The executive summary (about one full page) should include a brief summary of the following: Business Opportunities [5 marks]; Key Deliverables [5 marks]; Critical Success Factors (CSF) [5 marks]; Key Project Metrics: Milestones, Costs, Resources, ROI [5 marks]|
|See the section “Building the Business Case” of the BI Guidebook (page 24) for a discussion on identifying business opportunities or problems that the BI project should address. In summary, there should be no shortage of business problems or opportunities that BI can address in an enterprise, but the key is to identify those that are worth the required investment of time, resources and cost. The key areas with the most potential for substantial business returns are: strategic business initiatives with their underlying data needs; and current business processes being hindered by analytical bottlenecks.|
|Provide a brief summary in a business context of the business opportunities, problems and objectives that this BI project will address. [5 marks]|
|Briefly describe this BI project’s business objectives and the metrics to measure. [3 marks]|
|List the business initiative(s) that this BI project will support. Briefly describe the specific initiative deliverables or capabilities that the BI project will enable or enhance. [3 marks]|
|List the business processes that this BI project will support. Briefly describe the specific bottlenecks or constraints of the business processes that the BI project will address. [3 marks]|
|The purpose of this section is to describe what the project is and what it does.|
|Provide the title and a brief description of the BI project. [3 marks]|
|Briefly describe what will be created or enhanced by the BI project. [3 marks]|
|Provide a brief explanation of the project’s deliverables and resulting capabilities. The deliverables should include:|
|Provide a high-level project timeline listing the start date, completion date and key milestones. This can be summarized using table or Gantt chart. See pages 36-37 of the BI Guidebook. [3 marks]|
|Provide a brief explanation of the approach that will be undertaken in the project to achieve its goals and deliverables. The approach is the “how” of the project and includes a high-level description of the information, data, technology, and product architectures. [5 marks]|
|Provide a brief description of any alternatives that were discussed or evaluated. Briefly describe the specific initiative deliverables or capabilities that the BI project will enable or enhance. [3 marks]|
|Briefly describe the project’s scope in the context of the key attributes, listed on page 36 of the BI Guidebook, which were used to create the project plan, allocate resources, estimate costs and agree to deliverables. [3 marks]|
|Briefly describe what requirements or deliverables have been identified as being excluded from this project. These constraints have been agreed upon to get the project to be completed within a specific timeline and budget. [3 marks]|
|See the section “Developing Scope, Preliminary Plan and Budget” in Chapter 2 of the BI Guidebook for a discussion on developing a project budget and calculating a return on investment (ROI). Depending on the scope of the project and how an enterprise handles financially justifying a project, this section is often done in collaboration with a representative from the enterprise’s finance group. In that collaboration, the BI team would be responsible for identifying costs, while the finance person would determine how those costs are represented in a budget and calculate ROI.|
|Provide a brief summary of the project costs, benefits and ROI. [4 marks]|
|Provide a listing of the project costs broken down by categories such as labor (internal versus external), hardware and software. These costs will further be classified as capital versus expense. Often a summary table is placed in this section and a spreadsheet is referenced for a more detailed analysis. [10 marks]|
Total Cost of Ownership (TCO)
|Describe the project TCO. Besides the initial project budget, there may be recurring costs such as infrastructure or software subscriptions, as well as support and maintenance costs related to the project’s deliverables. These costs need to be identified and approved along with the project’s budget, which is a one-time cost. Some enterprises will include these costs in the ROI calculations. The finance person working on this section will determine how these costs are represented. [3 marks]|
|Identify and quantify ROI benefits, as discussed in both the “Building the Business Case” and “Calculating Benefits and ROI” sections of Chapter 2 in the BI Guidebook. [3 marks]|
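As a worked illustration of how the one-time budget, recurring costs and benefits combine, the sketch below uses a common simple definition of ROI, (benefits - costs) / costs, and treats TCO as the initial budget plus recurring costs over a planning horizon. The BI Guidebook or an enterprise's finance group may define these differently, and every figure here is hypothetical.

```python
def total_cost_of_ownership(initial_budget, annual_recurring_cost, years):
    """One-time project budget plus recurring costs over the planning horizon."""
    return initial_budget + annual_recurring_cost * years


def simple_roi(total_benefits, total_costs):
    """Simple ROI as a fraction: net benefit divided by total cost."""
    return (total_benefits - total_costs) / total_costs


# Hypothetical figures, for illustration only.
initial_budget = 200_000   # one-time project budget
annual_recurring = 30_000  # subscriptions, support and maintenance per year
annual_benefit = 120_000   # quantified benefit per year
years = 3

tco = total_cost_of_ownership(initial_budget, annual_recurring, years)
roi = simple_roi(annual_benefit * years, tco)

print(f"TCO over {years} years: {tco}")
print(f"ROI over {years} years: {roi:.1%}")
```

With these inputs the three-year TCO is 290,000 and the ROI is about 24%. Leaving recurring costs out of the calculation would raise the ROI to 80%, which is why the template asks how those costs are represented.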
Dependencies, assumptions, risks and issues
|It is critical that the project team identifies the key dependencies, assumptions, risks and issues that may derail the project. Although the cynic may say this is the ultimate CYA (cover your ass) section, project management best practices demand that this section be well thought out and as expansive as possible. At a minimum, identifying these criteria enables a project to establish an early warning mechanism to flag and address conditions as they occur. With foresight, the project team may mitigate the risks.|
|Provide a brief description of project dependencies both within and outside of the enterprise. This needs to describe what project deliverables are impacted and the potential scope of the impact. [4 marks]|
|Provide a brief description of the assumptions that were made in project planning. These assumptions may include business conditions, technical capabilities, resource availability, resource skills or deployment conditions. [4 marks]|
|Explicitly identify these assumptions. If you don’t and they change, neither the project team nor their stakeholders will understand the impact on the project.|
|Provide a brief description of the project risks, how to identify them if they occur during the project and how you can reduce the risk. [4 marks]|
|Things that are out of the control of the project team can go wrong. The more risks that are identified prior to the project starting the more likely the project team and stakeholders will be able to deal with them.|
|Provide a brief description of the issues and concerns that the project team or stakeholders have identified. These issues may not rise to the level of project risks but are considered important enough to document. [3 marks]|
Business sponsors and stakeholders
|There are several sections in Chapter 2 of the BI Guidebook that discuss who the best sponsors are, how to enlist them and, maybe even more importantly, what sponsor actions present risks to the project.|
|List business owners of this requirement with their name, title, organizational group and project role. [2 marks]|
|List business owners of this requirement with their name, title, organizational group and project role. [2 marks]|
|It is a best practice to obtain sign-off approval of each project milestone. An organization’s policies and culture will dictate who the approvers will be, but typically the sponsors and key stakeholders are included. [2 marks]|
Appendix: Background Materials
|This section includes all background or supporting materials for the BI Justification. These materials may include more detailed analysis of the following: project plan; project budget; ROI calculation.|
Authors and contributors
|List all the people involved in obtaining and analyzing the requirements along with the authors of this deliverable.|