gusucode.com > demos工具箱matlab源码程序 > demos/MeanByGroupMapReduceExample.m
%% Compute Mean by Group Using MapReduce % This example shows how to compute the mean by group in a data set using % |mapreduce|. It demonstrates how to do computations on subgroups of data. % Copyright 1984-2014 The MathWorks, Inc. %% Prepare Data % Create a datastore using the |airlinesmall.csv| data set. This 12 % megabyte data set contains 29 columns of flight information for several % airline carriers, including arrival and departure times. In this example, % select |DayOfWeek| and |ArrDelay| (flight arrival delay) as the variables % of interest. ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA'); ds.SelectedVariableNames = {'ArrDelay', 'DayOfWeek'} %% % |tabularTextDatastore| returns a |TabularTextDatastore| object for the data. This % datastore treats |'NA'| strings as missing, and replaces the missing % values with |NaN| values by default. Additionally, the % |SelectedVariableNames| property allows you to work with only the % selected variables of interest, which you can verify using |preview|. preview(ds) %% Run MapReduce % The |mapreduce| function requires a mapper function and a reducer % function. The mapper function receives chunks of data and outputs % intermediate results. The reducer function reads the intermediate results % and produces a final result. %% % In this example, the mapper function computes the count and sum of delays % by the day of week in each chunk of data, and then stores the results as % intermediate key-value pairs. The keys are integers (1 to 7) representing % the days of the week and the values are two-element vectors representing % the count and sum of the delay of each day. %% % Display the mapper function file. type meanArrivalDelayByDayMapper.m %% % After the Map phase, |mapreduce| groups the intermediate key-value pairs % by unique key (in this case, day of the week). Thus, each call to the % reducer function works on the values associated with one day of the week. % The reducer function receives a list of the intermediate count and sum of % delays for the day specified by the input key (|intermKey|) and sums up % the values into the total count, |n| and total sum |s|. Then, the reducer % function calculates the overall mean. and adds one final key-value pair % to the output. This key-value pair represents the mean flight arrival % delay for one day of the week. %% % Display the reducer function file. type meanArrivalDelayByDayReducer.m %% % Use |mapreduce| to apply the mapper and reducer functions to the % datastore, |ds|. meanDelayByDay = mapreduce(ds, @meanArrivalDelayByDayMapper, ... @meanArrivalDelayByDayReducer); %% % |mapreduce| returns a datastore, |meanDelayByDay|, with files in the % current folder. %% % Read the final result from the output datastore, |meanDelayByDay|. result = readall(meanDelayByDay) %% Organize Results % The integer keys (1 to 7) represent the days of the week. To organize the % results more, convert the keys to a categorical array, retrieve the % numeric values from the single element cells, and rename the variable % names of the resulting table. result.Key = categorical(result.Key, 1:7, ... {'Mon','Tue','Wed','Thu','Fri','Sat','Sun'}); result.Value = cell2mat(result.Value); result.Properties.VariableNames = {'DayOfWeek', 'MeanArrDelay'} %% % Sort the rows of the table by mean flight arrival delay. This reveals % that Saturday is the best day of the week to travel, whereas Friday is % the worst. result = sortrows(result,'MeanArrDelay')