Tuesday, December 11, 2007

How many strings can you intern?

Assuming a 32-bit OS and processor. Suppose we define INTERN to take a series of bytes with an upper limit on the number of bytes (say 128 bytes) and return an unsigned integer in [0, 2^32). Another function SYMBOL-NAME takes the unsigned integer and returns the associated bytes (or stuffs them in a buffer, or whatever). How many strings could we reasonably intern? Could we get to 2^32 without going to disk?

2 comments:

Preston L. Bannister said...

If the strings are not unique, yes. :)

Otherwise, no.

Joe Marshall said...

Indeed.

But to make the problem interesting, let's assume we're talking about unique strings. I think you ought to be able to intern over 2^32 of them because you ought to be able to compress the text down to a size that fits in RAM. (Assume the string is about the size of a lexical word in English).