General information:
General rule: There may be some underspecified parts in the project description. This is on purpose! In those cases, make your own design choices and document them!
You can choose from the following project types:
Software: Pick a dataset (see Resources below) and define a problem you want to solve.
Select as many data mining techniques as you would like to use to solve the problem, implement them from scratch, clean and analyze the data, compare the results of the different techniques, and present the findings. Alternatively, you may define a problem associated with data that can be obtained from some website or web platform. Write a crawler that scrapes data from that platform, making sure that you respect all the crawling policies that the website has in place (usually by checking the website's robots.txt file or its data policy). Choose the same number of data mining techniques that you want to use; if you have to do non-trivial implementation work for the crawling and the data preparation, it is OK to implement one technique fewer.
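As an illustration of the crawling-policy check mentioned above, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the user-agent string, and the example.com URLs are all made up for this example; a real crawler would load the site's actual robots.txt (for instance with set_url() and read()) and would also need to check the site's data policy, which this sketch does not cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the sketch runs offline.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical user-agent name for the course crawler.
USER_AGENT = "course-project-crawler"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS)

# Ask before every request whether the URL may be fetched.
public_ok = parser.can_fetch(USER_AGENT, "https://example.com/public/page")
private_ok = parser.can_fetch(USER_AGENT, "https://example.com/private/data")

# Seconds to wait between requests, or None if the file does not say.
delay = parser.crawl_delay(USER_AGENT)

print(public_ok, private_ok, delay)
```

In a crawler loop, a URL would be skipped whenever can_fetch returns False, and the crawler would sleep for the reported delay between requests.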
Research: In this project type you can propose your own idea along the lines of your own research or along the lines of improving the state of the art in solving an existing data mining problem. After you propose the idea, if there are any well-known techniques for that problem, you should use them as baselines, and you should propose a novel solution to that problem. You must implement at least one method, as in the Software option. This project type can earn extra credit.
Project Deliverables:
Project Proposal
Description:
In the proposal you must introduce your project briefly but precisely. In particular, you have to clearly define the problem your project proposes to solve. You should be able to distill the essence of your proposal to a statement like:
Given <dataset, website, …> Use <data mining technique(s)> To <achieve “KDD outcome”>
For example:
Given Netflix data Use Collaborative Filtering algorithms To recommend new movies to users
or
Given Twitter data Use Matrix Factorization To detect fake followers
In special cases, you may be able to relax the above format for the problem statement, but it is
fairly generic and applies to a wide variety of problem statements. In any case, make sure you
define what problem you are going to solve, and very importantly, describe how you are
planning to evaluate your approach.
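To make the evaluation plan concrete, one option is to name the metrics you will report on held-out labeled data. The sketch below computes precision, recall, and F1 from scratch for a binary task such as the fake-follower example. The function name and the label lists are made up for illustration, and these metrics are only one possible choice; the right evaluation depends on your problem.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = fake follower, 0 = genuine follower.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```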
In addition to the above, make sure you include:
Depending on the project type you chose, you need to clearly describe your plan for obtaining the data that you will use.
– Here is how I will find labeled data
– Given labeled data, here’s what I’ll do
– Without labeled data, here’s what I’ll do
The page limit for the proposal is 2 pages, single column.
Final Project Deliverable
Description:
The final project deliverable should include:
is possible. If the dataset comes with restrictions, there is no need to include it.
Details for the report:
Your final report should resemble a KDD paper (download the ACM “tight” format here
http://www.acm.org/publications/proceedings-template) and the page limit is 10 pages in double column format including the references.
For all project types you have to include:
1) an Introduction where you describe and motivate the problem, give an outline of your contributions, and motivate your approach; if you chose the Research type, you also have to argue that your proposed approach is sufficiently novel with respect to the state of the art, by stating how existing methods do not adequately address the problem you are solving,
2) a Related Work section where you outline relevant papers that work on the same problem,
3) a Proposed Method section where you describe the method(s) you used to solve the problem,
4) an Experimental Evaluation section where you compare the methods used; if you chose the Research type, you have to further demonstrate that the proposed approach outperforms the baselines (at least in some cases); this can earn extra credit, and
5) a Discussion & Conclusions section where you draw the conclusions of your paper and outline potential future research directions.
For the code, make sure you include:
Page limit: 5 pages + 1 for references (KDD-style double column format, ACM “tight” style)
Project Implementation
You need to implement one method. “Implementation” means writing the code for the method from scratch. For those implementations, you may use packages like Pandas, NumPy etc., but only for their basic functionality. You may not use an existing library implementation of the method itself.
If you find a website/tutorial/blog that outlines the implementation, you may use it as inspiration/guide but anything you submit must be your own implementation. Verbatim (or nearly verbatim) copies will not be allowed or tolerated (see the academic integrity section below).
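To illustrate what “from scratch, using packages only for their basic functionality” might look like, here is a minimal sketch of k-means clustering that relies on NumPy only for array arithmetic and random sampling. k-means is chosen purely as an example (this description does not prescribe a technique), and the function name, parameters, and toy data are made up for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means, using only basic NumPy operations."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)

        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels


# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

By contrast, calling a ready-made clustering routine from a library would count as using an existing implementation, which is what the rule above excludes.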
There are some techniques for which, by exception, you may use existing implementations in packages: