Note to self, quit job, read all of these.
Friday, August 31, 2007
Wednesday, August 29, 2007
Using Gmail to Relay Email
I'm preparing to move everything I host in-house to save on monthly fees. My bandwidth needs are very low, but I need a lot of storage for home movies. Running a small Linux server off my cable modem makes the most sense now.
Sending email from this server is a bit tricky, but with the help of some excellent tutorials, I've managed to tell Postfix to relay all email through Gmail. Google Apps For Your Domain is hosting my email, so this works out perfectly. (Note: I'm only using regular Gmail for my relay so far. I will try to use my Google Apps account soon.)
If you want to use Gmail to relay email, check out the Gmail Relay Emails for Postfix on Redhat tutorial or the Gmail Relay Emails for Postfix on Ubuntu tutorial. Note that if you are running Ubuntu, you need to download the Thawte root SSL certificates, as outlined in a comment at that tutorial.
As a side note, I'm using DNS Park for my DNS hosting. DNS Park will host two free domains for you, and supports dynamic DNS updates. Dynamic DNS is crucial if you are on a cable modem or DSL. The excellent ddclient has a sample DNS Park configuration.
Tuesday, August 28, 2007
Push or Pull? Stateless or Stateful?
So Bill says XMPP matters and is "pushing" a push model for message delivery.
Am curious, though, if this debate can be re-framed as Stateless or Stateful? A stateless messaging system would map to a Pull strategy, placing the burden on the client to actively poll or pull its messages. A stateful system would map to Push, where the server maintains a connection for every subscribed client.
Stateful systems are hard to scale over the internet. One reason is that there's a limit to the number of TCP connections I can keep open at any one time. What's the limit? Not sure, probably OS- and configuration-specific, but is that a limit that I'll easily hit? If I'm not actively maintaining an open connection, can the system still be called Push?
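One of the limits in question, at least on Unix-like systems, is the per-process cap on open file descriptors, since every open TCP connection uses one. Here is a minimal Ruby sketch for checking it, assuming a Unix-like OS where Process::RLIMIT_NOFILE is defined. It is only one of several limits (memory and port ranges are others), so treat the numbers as a rough ceiling and not the whole answer.

```ruby
# Assumes a Unix-like OS; Process::RLIMIT_NOFILE is not available everywhere.
# getrlimit returns the current (soft) limit and the maximum (hard) limit.
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)

puts "soft limit on open file descriptors: #{soft}"
puts "hard limit on open file descriptors: #{hard}"
```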
An aside on Push vs Pull. Push might make for faster reacting systems, but I know that Pull is usually the way I want to process information. The more Push events I have, the less I get done, the less I can focus, and the more transient everything becomes. To get things done, I need to Pull information when I'm good and ready. So even if we've figured out how to scale Push and deploy it everywhere, the edge agents of mine will still buffer everything until I'm ready to Pull it.
So take that, Outlook email notification popups!
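The buffering edge agent described above might look something like this in Ruby. It is only a sketch: PullBuffer and its method names are made up for illustration, and a real agent would need thread safety and persistence.

```ruby
# Hypothetical edge agent: the server side pushes, the reader pulls.
class PullBuffer
  def initialize
    @events = []
  end

  # Called by the push side whenever a message arrives. Nothing is shown
  # to the reader yet; the event is just stored.
  def push(event)
    @events << event
    self
  end

  # Number of events waiting to be pulled.
  def pending
    @events.size
  end

  # Called by the reader when good and ready. Returns everything buffered
  # so far and empties the buffer.
  def pull
    drained = @events
    @events = []
    drained
  end
end

inbox = PullBuffer.new
inbox.push("mail from Bill").push("build finished")
puts inbox.pending
first_pull = inbox.pull
second_pull = inbox.pull
p first_pull
p second_pull
```

The delivery into the buffer is still Push; only the last hop, from the buffer to me, is Pull.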
Monday, August 27, 2007
The Gettysburg Powerpoint Presentation
What if Lincoln had used PowerPoint? (file under How Not To Do It)
Of course, if you're looking for an example of how to do a PowerPoint presentation right, don't miss Dick Hardt giving the OSCON 2005 Keynote on Identity 2.0. That's how you do it.
Thursday, August 16, 2007
e c30ac536947f7330943f8de9c33f70ef2d5994e7
e is a stack for the data web. Not only is it all in Ruby and built on RDF, but it's some of the most bare code I've seen in a while.
You had me at "data web".
And +10 for using the file system as a data store instead of a database.
Labels:
data,
rdf,
semantic web
Monday, August 13, 2007
Too Much Data
Bill de hÓra writes that Phat Data is the challenge of the future. Couldn't agree more. My recent work with data warehouses has certainly shown me that managing and accessing terabytes of data is nontrivial.
We've learned a few things, most importantly, "Denormalize and aggregate." Avoiding I/O is the most important step to take. And we've achieved some pretty decent performance numbers with a traditional relational database. However, as Bill points out, we're using it as a big indexed file system.
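To make "denormalize and aggregate" concrete, here is a toy Ruby sketch. The sales rows and the daily_totals name are invented for illustration; in a real warehouse the summary would be a table built at load time, not an in-memory Hash.

```ruby
# Hypothetical detail rows, standing in for a very large fact table.
sales = [
  { day: "2007-08-01", region: "east", amount: 10 },
  { day: "2007-08-01", region: "west", amount: 25 },
  { day: "2007-08-02", region: "east", amount: 5 },
  { day: "2007-08-02", region: "east", amount: 7 }
]

# Aggregate once, up front: one summary entry per (day, region).
daily_totals = Hash.new(0)
sales.each { |row| daily_totals[[row[:day], row[:region]]] += row[:amount] }

# A report now reads one small summary entry instead of scanning detail rows.
puts daily_totals[["2007-08-02", "east"]]
```

The trade is extra work and storage at write time in exchange for far less I/O at read time.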
But having SQL and the numerous tools that support SQL has been critical to our success. I can't imagine solving these problems with proprietary tools. Sure, it's possible. Google did it, but they have more PhDs than you can shake a stick at. Plus some mega clusters.
While multi-core CPUs are a welcome upgrade, what I really want is multi-spindle hard drives. Call me when I can emulate a Google cluster on my desktop. What's lacking is a cheap and effective way to parallelize my disk activity.
What would be really nice is to turn my corporate network into a giant compute farm utilizing both all those CPUs and all those hard drives. Now that is really turning the network into the computer. So, don't give me EC2, give me EC2 in my office. With everyone using their huge desktops just to read email and write PowerPoint, I know there's a ton of unused computing power. This is P2P with a purpose.
Labels:
database
Yes, database normalization is good
So InfoQ has collected a few blog posts which ask, "Data normalization, is it really that good?"
Of course it's good, as long as you have requirements which dictate this optimization. If your application requires extremely fast writes, and this can happen in a heavily loaded OLTP system, then data normalization is your savior. If your application requires extremely fast reads, like OLAP systems, then of course data normalization is a killer.
These competing requirements are exactly why you have database systems optimized for either read or write. This is why large systems will maintain an operational system conforming to OLTP principles, and reporting systems conforming to OLAP principles.
Remember, traditional database systems are row-oriented. This architecture is itself an optimization for OLTP and normalized data. Read-mostly (or read-only) systems can be column-oriented, organizing the data on disk to optimize reads. For instance, Google's BigTable is an implementation of a column-oriented database.
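A toy Ruby sketch of the two layouts may help. The records and names here are made up, and in-memory arrays only hint at what happens on disk, but the shape of the trade-off is the same: a row store writes a record in one place, while a column store can answer a single-column question without touching the other columns.

```ruby
# Hypothetical order records, for illustration only.
rows = [
  { id: 1, region: "east", amount: 10 },
  { id: 2, region: "west", amount: 25 },
  { id: 3, region: "east", amount: 5 }
]

# Column-oriented layout of the same data: one array per column.
columns = { id: [], region: [], amount: [] }
rows.each { |row| row.each { |name, value| columns[name] << value } }

# Write path: the row store appends one record in one place,
# while the column store has to touch every column.
new_row = { id: 4, region: "west", amount: 8 }
rows << new_row
new_row.each { |name, value| columns[name] << value }

# Read path: totalling one column.
row_total = rows.sum { |row| row[:amount] } # visits every whole record
col_total = columns[:amount].sum            # visits a single array

puts row_total
puts col_total
```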
Labels:
database
Wednesday, August 8, 2007
Tuesday, August 7, 2007
Saturday, August 4, 2007
Calculating Combinations In Ruby From Erlang
Well, thanks to the many people (here and here) who provided their versions of an Erlang way to calculate combinations, I've really begun to open my mind to how to think functionally.
To help me understand what is going on, I've converted the basic idea into a Ruby version of calculating combinations. This uses recursion like the Erlang versions do.
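class Array
  # Erlang-style split: returns [first element, rest of the array].
  def head_tail
    [first, tail]
  end

  # Everything after the first element.
  def tail
    self[1, size - 1]
  end
end

# Returns every combination of the values in list, as an array of arrays.
def combos(list)
  return [[]] if list.empty? # exit of the recursion: only the empty combination
  h, t = list.head_tail
  t_combos = combos(t)
  # Combinations that include the head, followed by those that don't.
  t_combos.inject([]) { |memo, obj| memo << [h] + obj } + t_combos
end

c = combos([1, 2, 3, 4])
require 'pp'
pp c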
As you can see, I added a bit of erlangism to the Array class, by adding a method to get the head and tail of an array.
Let's run through this.
On the first call to combos([1,2,3,4]) we jump over the first line (the exit in our recursion). We generate the head and tail, which in this case are 1 and [2,3,4] respectively. We immediately begin our recursion by saying, "Get me all the combinations for the tail, which is essentially everything but the head." Then, for each of those combinations, we build a new one by prepending the head (again, here it's 1). Finally, we add the tail's own combinations, the ones without the head, and return the array of arrays.
Another way to explain it might be:
- H = Remove an element from the list.
- T = The rest of the list.
- C = Calculate the combinations of T (recursion happens here).
- C2 = For each combination in C, generate a new combination by prepending H.
- return C2 + C
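The steps above can also be written without touching the Array class, and cross-checked against Ruby's built-in Array#combination. This is a variant sketch, not the version from the post: it uses splat destructuring for the head and tail and map in place of inject.

```ruby
# Variant of combos that leaves Array alone: h is the head, t the tail.
def combos(list)
  return [[]] if list.empty?
  h, *t = list
  c = combos(t)                 # C: combinations of the tail
  c2 = c.map { |x| [h] + x }    # C2: each of those with H in front
  c2 + c
end

# Cross-check: collect the built-in combinations of every size from 0 to N.
def builtin_combos(list)
  (0..list.size).flat_map { |k| list.combination(k).to_a }
end

list = [1, 2, 3, 4]
ours = combos(list)
puts ours.size
puts ours.sort == builtin_combos(list).sort
```

The two produce the same combinations in a different order, which is why the comparison sorts both sides.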
Labels:
combinations,
erlang,
Programming,
Ruby
Wednesday, August 1, 2007
Calculating Combinations the Erlang Way
If you recall, I wrote some Ruby code to calculate combinations of values in lists. I needed to create a list of all combinations of values, where each combination has between 0 and N values, and N is the length of the source list. (I'm not sure I'm explaining that correctly, but refer to my previous post for examples.)
Here's my first shot at how to do this in Erlang. It took longer to find math:pow and how to convert a float to an integer in Erlang than to write the actual code.
-module(s).
-export([combos/1]).

%% All combinations of the elements of L, one per bit mask.
combos(L) -> combos(L, bit_masks(length(L))).

combos(L, [BH|BT]) ->
    [mask_list(L, BH) | combos(L, BT)];
combos(_, []) -> [].

%% Keep the elements of the list whose mask bit is 1.
mask_list([H|T], [BH|BT]) ->
    case BH of
        1 -> [H | mask_list(T, BT)];
        0 -> mask_list(T, BT)
    end;
mask_list([], []) -> [].

%% Every mask from 0 to 2^NumColumns - 1, padded to NumColumns bits.
bit_masks(NumColumns) ->
    bit_masks(0, round(math:pow(2, NumColumns)) - 1, NumColumns).

bit_masks(Max, Max, NumColumns) ->
    [padl(NumColumns, bl(Max))];
bit_masks(X, Max, NumColumns) ->
    [padl(NumColumns, bl(X)) | bit_masks(X + 1, Max, NumColumns)].

%% Left-pad the list L with zeros until it has length N.
padl(N, L) when N =:= length(L) -> L;
padl(N, L) when N > length(L) -> padl(N, [0|L]).

%% Bit list of N, most significant bit first.
bl(N) -> bl(N, []).
bl(0, Accum) -> Accum;
bl(N, Accum) -> bl(N bsr 1, [(N band 1) | Accum]).
Brief explanation:
The bl function creates a "bit list", given a number and an accumulator. So, bl(5,[]) will return [1,0,1].
The padl function pads the list, which is useful when you want to ensure that all combinations ultimately have the same length. So, padl(4, [0]) would return [0,0,0,0].
And then bit_masks creates a list of bit masks which we'll use to create the combinations. For example, bit_masks(4) returns:
[[0,0,0,0],
[0,0,0,1],
[0,0,1,0],
[0,0,1,1],
[0,1,0,0],
[0,1,0,1],
[0,1,1,0],
[0,1,1,1],
[1,0,0,0],
[1,0,0,1],
[1,0,1,0],
[1,0,1,1],
[1,1,0,0],
[1,1,0,1],
[1,1,1,0],
[1,1,1,1]]
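For comparison with the earlier Ruby post, here is a rough Ruby rendering of the three helpers. The names bit_list and pad_left are my own stand-ins for bl and padl, and the ArgumentError for an over-long list is an assumption about what should happen there.

```ruby
# Counterpart of bl: the binary digits of n, most significant first.
# Like bl(0), zero gives an empty list and relies on padding afterwards.
def bit_list(n)
  n.zero? ? [] : n.to_s(2).chars.map(&:to_i)
end

# Counterpart of padl: left-pad bits with zeros until it is width long.
def pad_left(width, bits)
  raise ArgumentError, "bits longer than width" if bits.length > width
  Array.new(width - bits.length, 0) + bits
end

# Counterpart of bit_masks: one padded bit list per number below 2**width.
def bit_masks(width)
  (0...(1 << width)).map { |n| pad_left(width, bit_list(n)) }
end

p bit_list(5)
p pad_left(4, [0])
puts bit_masks(4).size
```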
And finally, to create the actual combinations, you will use combos. Example: combos([a,b,c,d]). This generates a result of:
[[],
[d],
[c],
[c,d],
[b],
[b,d],
[b,c],
[b,c,d],
[a],
[a,d],
[a,c],
[a,c,d],
[a,b],
[a,b,d],
[a,b,c],
[a,b,c,d]]
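The same bit-mask idea fits in a few lines of Ruby, which makes a handy sanity check on the output above. This is a sketch under my own naming (combos_by_mask is not from the Erlang module): it counts from 0 to 2^N - 1 and reads the bits straight off each integer instead of building bit lists first.

```ruby
# For each number from 0 to 2**n - 1, keep the elements whose bit is set.
# The most significant bit corresponds to the first element of the list.
def combos_by_mask(list)
  n = list.length
  (0...(1 << n)).map do |mask|
    list.each_index.select { |i| mask[n - 1 - i] == 1 }.map { |i| list[i] }
  end
end

result = combos_by_mask([:a, :b, :c, :d])
puts result.size
p result.first(4)
p result.last
```

Here the shift 1 << n takes the place of math:pow and round, so no float-to-integer conversion is needed.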
So, is there a more erlang way to do this?
Labels:
erlang,
Programming