gusucode.com > demos工具箱matlab源码程序 > demos/MeanMapReduceExample.m
%% Compute Mean Value with MapReduce % This example shows how to compute the mean of a single variable in a % data set using |mapreduce|. It demonstrates a simple use of |mapreduce| % with one key, minimal computation, and an intermediate state % (accumulating intermediate sum and count). % Copyright 1984-2014 The MathWorks, Inc. %% Prepare Data % Create a datastore using the |airlinesmall.csv| data set. This 12 % megabyte data set contains 29 columns of flight information for several % airline carriers, including arrival and departure times. In this example, % select |ArrDelay| (flight arrival delay) as the variable of interest. ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA'); ds.SelectedVariableNames = 'ArrDelay' %% % |tabularTextDatastore| returns a |TabularTextDatastore| object for the data. This % datastore treats |'NA'| strings as missing, and replaces the missing % values with |NaN| values by default. Additionally, the % |SelectedVariableNames| property allows you to work with only the % selected variable of interest, which you can verify using |preview|. preview(ds) %% Run MapReduce % The |mapreduce| function requires a mapper function and a reducer % function. The mapper function receives chunks of data and outputs % intermediate results. The reducer function reads the intermediate results % and produces a final result. %% % In this example, the mapper function finds the count and sum of the % arrival delays in each chunk of data. The mapper function then stores % these values as the intermediate values associated with the key % |'PartialCountSumDelay'|. %% % Display the mapper function file. type meanArrivalDelayMapper.m %% % The reducer function accepts the count and sum for each chunk stored by % the mapper function. It sums up the values to obtain the total count and % total sum. The overall mean arrival delay is a simple division of the % values. |mapreduce| only calls this reducer function once, since the % mapper function only adds a single unique key. The reducer function uses % |add| to add a single key-value pair to the output. %% % Display the reducer function file. type meanArrivalDelayReducer.m %% % Use |mapreduce| to apply the mapper and reducer functions to the % datastore, |ds|. meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer); %% % |mapreduce| returns a datastore, |meanDelay|, with files in the % current folder. %% % Read the final result from the output datastore, |meanDelay|. readall(meanDelay)