My test reads a tab-delimited text file, tokenizes each line, and converts any numbers into integers. I've split the program into two conceptual parts: 1) the file IO and line reading, and 2) the handling of each line. I wanted to measure the performance difference between passing the line handling in as a fun() and including the line-handling code directly in the read loop.
My test text file is 10,037,355 lines long and 1,028,071,833 bytes (roughly 1 GB) in size. I compiled my code using HiPE.
The quick answer is that using a fun() is slower than not using it (which is to be expected). For my particular test, using a fun() was approximately 20% slower.
| Test | Run 1 (sec) | Run 2 (sec) |
|---|---|---|
| fun() | 484.631 | 485.380 |
| without fun() | 404.017 | 403.632 |
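As a quick check of the "approximately 20%" figure, here is a minimal sketch that recomputes the slowdown from the four timings in the table and the line count of the test file. The module name `slowdown` and the function names are made up for illustration; only the numbers come from the post.

```erlang
-module(slowdown).
-export([report/0]).

%% Timings in seconds, copied from the results table.
-define(WITH_FUN, [484.631, 485.380]).
-define(WITHOUT_FUN, [404.017, 403.632]).
%% Number of lines in the test file, as stated in the post.
-define(LINES, 10037355).

%% Arithmetic mean of a non-empty list of numbers.
mean(Xs) -> lists:sum(Xs) / length(Xs).

report() ->
    Fun = mean(?WITH_FUN),
    NoFun = mean(?WITHOUT_FUN),
    %% Relative slowdown of the fun() variant, in percent.
    SlowdownPct = (Fun / NoFun - 1.0) * 100.0,
    %% Extra wall-clock time per input line, in microseconds.
    MicrosPerLine = (Fun - NoFun) * 1.0e6 / ?LINES,
    io:format("slowdown: ~.1f%~n", [SlowdownPct]),
    io:format("extra time per line: ~.2f us~n", [MicrosPerLine]),
    {SlowdownPct, MicrosPerLine}.
```

Averaging the two runs gives a slowdown of about 20.1%, which works out to roughly 8 microseconds of extra time per line.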
Here's the code. I stole the Erlang timing functions from David King (thanks David!).
Using a fun()
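%% Run Mod:Fun(Args), print the elapsed time in seconds, return the result.
time_takes(Mod, Fun, Args) ->
    Start = erlang:now(),
    Result = apply(Mod, Fun, Args),
    Stop = erlang:now(),
    io:format("~p~n", [time_diff(Start, Stop)]),
    Result.

%% erlang:now() returns {MegaSecs, Secs, MicroSecs}; the result is in seconds.
time_diff({A1, A2, A3}, {B1, B2, B3}) ->
    (B1 - A1) * 1000000 + (B2 - A2) + (B3 - A3) / 1000000.0.

%% Split a line on tabs (the input is tab-delimited), skip the first
%% SplitOn fields, and convert the remaining fields to integers.
handle_line(Line, SplitOn) ->
    L = string:tokens(string:strip(Line, both, $\n), "\t"),
    {_Dimensions, Measures} = lists:split(SplitOn, L),
    lists:map(fun(X) -> {I, _} = string:to_integer(X), I end, Measures).

process_file(Filename, Proc) ->
    {ok, File} = file:open(Filename, [read]),
    process_lines(File, Proc, 0).

%% Read the file line by line, handing each line to Proc.
process_lines(File, Proc, LineNum) ->
    case io:get_line(File, '') of
        eof -> file:close(File);
        Line ->
            Proc(Line),
            process_lines(File, Proc, LineNum + 1)
    end.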
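The listing does not show how `Proc` is built, and `handle_line/2` takes two arguments while `process_lines/3` calls `Proc(Line)` with one. A plausible way to bridge the two, sketched below, is a closure that captures the split position. The module name `fun_wiring`, the sample line, and the split position of 2 are all made up for illustration, and the copy of `handle_line/2` splits on tabs on the assumption that the input is tab-delimited as described in the post.

```erlang
-module(fun_wiring).
-export([demo/0, handle_line/2]).

%% Line handler in the style of the post: drop the first SplitOn
%% tab-separated fields and convert the rest to integers.
handle_line(Line, SplitOn) ->
    Tokens = string:tokens(string:strip(Line, both, $\n), "\t"),
    {_Dimensions, Measures} = lists:split(SplitOn, Tokens),
    lists:map(fun(X) -> {I, _} = string:to_integer(X), I end, Measures).

demo() ->
    %% Hypothetical split position, chosen to fit the sample line below.
    SplitOn = 2,
    %% The closure captures SplitOn, so Proc has the arity of 1 that a
    %% call of the form Proc(Line) requires.
    Proc = fun(Line) -> handle_line(Line, SplitOn) end,
    %% Hypothetical input line: two dimensions followed by three measures.
    Proc("east\twidget\t12\t7\t300\n").
```

Calling `fun_wiring:demo()` returns `[12,7,300]`. In the full program the same `Proc` would be passed as the second argument of `process_file/2`, for example through `time_takes/3`.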
Including Code Directly (no fun())
(I'm only showing the function that differs.)
%% Same loop as before, with the line handling written out in place.
%% Proc is no longer called; it is kept only so the signature matches.
process_lines(File, Proc, LineNum) ->
    case io:get_line(File, '') of
        eof -> file:close(File);
        Line ->
            L = string:tokens(string:strip(Line, both, $\n), "\t"),
            {_Dimensions, Measures} = lists:split(10, L),
            lists:map(fun(X) -> {I, _} = string:to_integer(X), I end, Measures),
            process_lines(File, Proc, LineNum + 1)
    end.
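To look at the call overhead without the file IO, one option is to run both styles over lines held in memory. The sketch below is not the benchmark from this post: the module `fun_bench` and its function names are made up, the direct variant calls a named function instead of pasting the code into the loop, and the sample line is made up. It uses `timer:tc/1` from the standard library, which returns the elapsed time in microseconds together with the result.

```erlang
-module(fun_bench).
-export([run/1]).

%% Per-line work: drop the first SplitOn tab-separated fields and
%% convert the remaining ones to integers.
parse(Line, SplitOn) ->
    Tokens = string:tokens(string:strip(Line, both, $\n), "\t"),
    {_Dims, Measures} = lists:split(SplitOn, Tokens),
    [element(1, string:to_integer(X)) || X <- Measures].

%% Variant 1: the per-line work is passed in as a fun.
loop_fun([], _Proc) -> ok;
loop_fun([Line | Rest], Proc) ->
    Proc(Line),
    loop_fun(Rest, Proc).

%% Variant 2: the per-line work is called directly from the loop.
loop_direct([], _SplitOn) -> ok;
loop_direct([Line | Rest], SplitOn) ->
    parse(Line, SplitOn),
    loop_direct(Rest, SplitOn).

%% Time both variants over N copies of a sample line.
%% Returns {FunMicros, DirectMicros}, both in microseconds.
run(N) ->
    Lines = lists:duplicate(N, "a\tb\t1\t2\t3\n"),
    Proc = fun(Line) -> parse(Line, 2) end,
    {FunMicros, ok} = timer:tc(fun() -> loop_fun(Lines, Proc) end),
    {DirectMicros, ok} = timer:tc(fun() -> loop_direct(Lines, 2) end),
    {FunMicros, DirectMicros}.
```

A call such as `fun_bench:run(1000000)` gives a pair of timings to compare. The numbers depend on the machine, the Erlang release, and the compiler options, so several runs are needed before reading anything into the difference.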
Of course, it's easy to argue that Erlang isn't the best language for string manipulation. But this part of the application is hardly the bottleneck, so I'm willing to take the bloat in order to take advantage of the concurrency later on.
Next up, I'll run timing experiments to test whether tail recursion speeds anything up.