The file FlightDelays.csv contains information on all commercial flights that departed the Washington, D.C., area and arrived in New York during January 2004. For each flight there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.

This assignment has three phases:

A. Data Preprocessing
1. Data Reduction: Reduce the number of predictors using the appropriate operations (domain knowledge, correlation matrix, etc.). Store the result of this step in a new file “FlightDelaysTestingData.csv”.
2. Data Exploration: Make a copy of the “FlightDelaysTestingData.csv” file, rename it to “FlightDelaysDataExploration.csv”, and use it to summarize the data with four different pivot tables that highlight different facts about the database.
3. Data Conversion: Because some of the algorithms do not work with non-numerical data, the non-numerical data in the database must be converted. Provide a reference table for the transformed data.

B. Model Building
Use the “FlightDelaysTestingData.csv” data file to build models based on:
1. Naïve Bayes (NB) Model.
2. Classification and Regression Tree (CART).
3. Logistic Regression.

Ideally, the above-mentioned algorithms should work with the following data types:

Algorithm    Output Type
NB           Categorical
CART         Both
Log Reg      Categorical

C. Use Testing Data
Make up five new rows (instances) of data and store them in a new file “FlightDelaysTestingData.csv”.

Submissions:

I. Report
1. Discuss and explain why each predictor was removed or will be used in model building.
2. Provide a reference table for the transformed data.
3. Compare the results of the models built above and recommend an algorithm to be used for future prediction.
4. Use the best model to classify the data in the “FlightDelaysTestingData.csv” file.

II. Excel files
Submit all the Excel files you have used in this project:
1) FlightDelaysTrainingData.csv
2) FlightDelaysTestingData.csv that shows the classified data

III. Weka-based model files
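The delay definition and the Data Conversion step can be illustrated with a minimal Python sketch. It is only one possible approach and is not prescribed by the assignment. The column names (carrier, origin, dest), the sample rows, the timestamp format, and the helper names is_delayed and encode_column are all hypothetical and are not taken from FlightDelays.csv.

```python
from datetime import datetime

# From the assignment text: a flight is delayed when it arrives
# at least 15 minutes later than scheduled.
DELAY_THRESHOLD_MIN = 15


def is_delayed(scheduled: str, actual: str, fmt: str = "%Y-%m-%d %H:%M") -> bool:
    """Return True when the actual arrival is >= 15 minutes after the scheduled one.

    The timestamp format is an assumption made for this sketch.
    """
    late_by = datetime.strptime(actual, fmt) - datetime.strptime(scheduled, fmt)
    return late_by.total_seconds() / 60 >= DELAY_THRESHOLD_MIN


def encode_column(rows, column):
    """Replace the text values of one column with integer codes, in place.

    Returns the reference table {original value: code}, so the
    conversion can be documented in the report and reversed later.
    """
    distinct_values = sorted({row[column] for row in rows})
    reference = {value: code for code, value in enumerate(distinct_values)}
    for row in rows:
        row[column] = reference[row[column]]
    return reference


# Hypothetical sample rows; the real file's columns and values may differ.
rows = [
    {"carrier": "DL", "origin": "DCA", "dest": "LGA"},
    {"carrier": "US", "origin": "IAD", "dest": "JFK"},
    {"carrier": "DL", "origin": "BWI", "dest": "EWR"},
]

# One reference table per converted column.
reference_tables = {col: encode_column(rows, col) for col in ("carrier", "origin", "dest")}

for col, table in reference_tables.items():
    print(col, table)
```

Sorting the distinct values before assigning codes keeps the mapping stable across runs, so the printed reference tables can be copied into the report as they are.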