
%% Tall Skinny QR (TSQR) Matrix Factorization Using MapReduce
% This example shows how to compute a tall skinny QR (TSQR) factorization
% using |mapreduce|. It demonstrates how to chain MapReduce calls to
% perform multiple iterations of factorizations, and uses the |info|
% argument of the mapper function to compute numeric keys.

% Copyright 1984-2014 The MathWorks, Inc.

%% Prepare Data
% Create a datastore using the |airlinesmall.csv| data set. This 12-megabyte
% data set contains 29 columns of flight information for several airline
% carriers, including arrival and departure times. In this example, the
% variables of interest are |ArrDelay| (flight arrival delay), |DepDelay|
% (flight departure delay), and |Distance| (total flight distance).
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.ReadSize = 1000;
ds.SelectedVariableNames = {'ArrDelay', 'DepDelay', 'Distance'}

%%
% |tabularTextDatastore| returns a |TabularTextDatastore| object for the
% data. Because of the |'TreatAsMissing'| name-value pair, this datastore
% treats |'NA'| strings as missing, and it replaces missing values with
% |NaN| values by default. The |ReadSize| property specifies how many rows
% each read returns, so the data is processed in chunks of up to 1000 rows.
% Additionally, the |SelectedVariableNames| property allows you to work
% with only the specified variables of interest, which you can verify
% using |preview|.
preview(ds)
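
%%
% Because missing values are stored as |NaN|, and a |NaN| in a chunk would
% propagate into its QR factorization, incomplete rows need to be removed
% before factoring. The following cell is a minimal sketch of that
% filtering step on a small hypothetical chunk (the numbers are made up and
% do not come from |airlinesmall.csv|). It assumes that a mapper would
% apply the same filter to every chunk it receives.
chunk = [  5    2   300;
         NaN    1   450;
          -3  NaN   700;
          10    8   250;
           0   -1  1200];   % hypothetical [ArrDelay DepDelay Distance] rows
% Keep only the rows that contain no NaN in any column
complete = chunk(~any(isnan(chunk), 2), :);
% Economy-size QR: for a full-rank N-by-3 chunk with N >= 3, R is 3-by-3
[~, Rchunk] = qr(complete, 0);
disp(size(complete))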

%% Chain |mapreduce| Calls
% The implementation of the multi-iteration TSQR algorithm needs to chain
% consecutive |mapreduce| calls. To demonstrate the general chaining design
% pattern, this example uses two MapReduce iterations. In each iteration,
% the intermediate results from the mapper function calls are grouped by
% key and passed to the reducer function, which |mapreduce| calls once per
% unique key. The output of these reducer calls then becomes the input for
% the next MapReduce iteration.
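
%%
% The chained pattern can be mimicked in memory to see why it produces a
% valid $R$ factor. If each chunk satisfies $A_k = Q_k R_k$, then stacking
% the $R_k$ and factoring again gives an $R$ with
% $R^T R = \sum_k R_k^T R_k = \sum_k A_k^T A_k = A^T A$. The following cell
% is a minimal sketch on a synthetic matrix |A|; the chunk size and the
% number of groups are arbitrary illustrative choices, not the values that
% |mapreduce| uses for |airlinesmall.csv|.
n = 2000;                                   % rows of a synthetic tall matrix
t = (1:n)';
A = [t/n, sin(t), cos(t/7)];                % hypothetical N-by-3 data
chunkRows = 100;                            % illustrative chunk size
numChunks = n/chunkRows;                    % 20 chunks
numGroups = 4;                              % illustrative number of keys
% "Map": economy-size QR of each chunk, keeping only the 3-by-3 R factor
chunkR = cell(numChunks, 1);
for k = 1:numChunks
    idx = (k-1)*chunkRows + (1:chunkRows);
    [~, chunkR{k}] = qr(A(idx, :), 0);
end
% First "reduce": stack the R factors that share a key and factor again
groupOf = ceil((1:numChunks)' / (numChunks/numGroups));
groupR = cell(numGroups, 1);
for g = 1:numGroups
    [~, groupR{g}] = qr(vertcat(chunkR{groupOf == g}), 0);
end
% Second iteration: a single key, so one final reduce over all group R factors
[~, Rfinal] = qr(vertcat(groupR{:}), 0);
% Rfinal'*Rfinal should match A'*A up to rounding error
relErr = norm(Rfinal'*Rfinal - A'*A) / norm(A'*A)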

%% First MapReduce Iteration
%%
% In the first iteration, the mapper function, |tsqrMapper|, receives the
% ith chunk of data, which is a table of size $N_i\times 3$. The mapper
% computes the $R$ matrix of this chunk and stores it as an intermediate
% result. Then, |mapreduce| groups the intermediate results by unique key
% before sending them to the reducer function. Thus, |mapreduce| sends all
% intermediate $R$ matrices with the same key to the same reducer call.
%
% Since the reducer uses |qr|, which is an in-memory MATLAB function, it is
% best to first make sure that the $R$ matrices fit in memory. This example
% divides the data set into eight partitions. The |mapreduce| function reads
% the data in chunks and passes the data along with some metadata to the
% mapper function. The |info| input argument is the second input to the
% mapper function, and it contains the read offset and file size
% information needed to generate the key. Each key identifies which of the
% |numPartitions| equally sized byte ranges of the file the current read
% offset falls in:
%
%    key = ceil(offset/(fileSize/numPartitions)).
%
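
%%
% To see how the key formula spreads reads across partitions, the following
% cell evaluates it for a few hypothetical byte offsets. The file size and
% offsets are illustrative values only, not the ones |mapreduce| reports for
% |airlinesmall.csv|. Note that under this formula an offset of exactly 0
% would map to key 0.
fileSize = 12e6;                            % hypothetical file size, in bytes
numPartitions = 8;
partitionSize = fileSize/numPartitions;     % 1.5e6 bytes per partition
offsets = [1, 1.5e6, 1.5e6 + 1, 6e6, 12e6 - 1];   % hypothetical read offsets
partKeys = ceil(offsets / partitionSize)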

%%
% Display the mapper function file.
type tsqrMapper.m

%%
% The reducer function receives a list of the intermediate $R$ matrices,
% vertically concatenates them, and computes the $R$ matrix of the
% concatenated matrix.
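
%%
% The core of this reduction step can be sketched in memory. In the
% following cell, three small hypothetical blocks stand in for data chunks,
% and their $R$ factors are collected in a cell array. Inside |mapreduce|
% the factors would instead arrive through the reducer's value iterator.
% The check confirms that the combined $R$ reproduces the sum of the
% blocks' Gram matrices.
blocks = {[1 2 3; 4 5 6; 7 8 10], magic(3), [2 0 1; 0 3 0; 1 0 4]};
Rs = cell(size(blocks));
for k = 1:numel(blocks)
    [~, Rk] = qr(blocks{k}, 0);
    Rs{k} = Rk;                             % intermediate R factor of block k
end
% Vertically concatenate the R factors and keep only the new R factor
[~, Rcombined] = qr(vertcat(Rs{:}), 0);
gram = blocks{1}'*blocks{1} + blocks{2}'*blocks{2} + blocks{3}'*blocks{3};
relGramErr = norm(Rcombined'*Rcombined - gram) / norm(gram)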

%%
% Display the reducer function file.
type tsqrReducer.m

%%
% Use |mapreduce| to apply the mapper and reducer functions to the
% datastore, |ds|.
outds1 = mapreduce(ds, @tsqrMapper, @tsqrReducer);

%%
% |mapreduce| returns an output datastore, |outds1|, with files in
% the current folder.

%% Second MapReduce Iteration
% The second iteration uses the output of the first iteration, |outds1|,
% as its input. This iteration uses an identity mapper function,
% |identityMapper|, which simply copies over the data using a single key,
% |'Identity'|.

%%
% Display the identity mapper function file.
type identityMapper.m

%%
% The reducer function is the same in both iterations. The use of a single
% key by the mapper function means that |mapreduce| only calls the reducer
% function once in the second iteration.
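
%%
% The effect of a single key can be illustrated by simulating the
% group-by-key step with a |containers.Map|. This is only a simulation for
% illustration, not how |mapreduce| groups data internally, and the values
% are hypothetical. With numeric keys such as those from the first
% iteration there would be one group per distinct key; with the single key
% |'Identity'| every value lands in one group, so there is one reducer call.
mapKeys = {'Identity', 'Identity', 'Identity'};
mapValues = {eye(3), 2*eye(3), 3*eye(3)};  % hypothetical R factors
groups = containers.Map('KeyType', 'char', 'ValueType', 'any');
for k = 1:numel(mapKeys)
    key = mapKeys{k};
    if isKey(groups, key)
        groups(key) = [groups(key), mapValues(k)];   % append to existing group
    else
        groups(key) = mapValues(k);                  % start a new group
    end
end
numReducerCalls = groups.Count              % one unique key, one reducer call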

%%
% Display the reducer function file.
type tsqrReducer.m

%%
% Use |mapreduce| to apply the identity mapper and the same reducer to the
% output from the first |mapreduce| call.
outds2 = mapreduce(outds1, @identityMapper, @tsqrReducer);

%% View Results
% Read the final results from the output datastore.
r = readall(outds2);
r.Value{:}
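
%%
% A QR factorization of a full-rank matrix is unique only up to the signs
% of the rows of $R$, so an $R$ computed by TSQR can differ from the result
% of calling |qr| on the whole matrix by row sign flips. The following cell
% is a minimal sketch on a hypothetical synthetic matrix (not the flight
% data) that normalizes both factors to a positive diagonal before
% comparing them.
t2 = (1:600)';
Asmall = [t2/600, sin(t2), cos(t2/5)];      % hypothetical tall matrix
% TSQR with two chunks
[~, R1] = qr(Asmall(1:300, :), 0);
[~, R2] = qr(Asmall(301:600, :), 0);
[~, Rtsqr] = qr([R1; R2], 0);
% Direct in-memory QR of the whole matrix
[~, Rdirect] = qr(Asmall, 0);
% Flip row signs so each R has a positive diagonal (assumes full column rank)
fixSign = @(R) diag(sign(diag(R))) * R;
signErr = norm(fixSign(Rtsqr) - fixSign(Rdirect)) / norm(Rdirect)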
