
ongoing fragmented essay by Tim Bray

Workflows in AWS and GCP 21 Sep 2020, 12:00 pm

Recently, Google launched a beta of Google Cloud Workflows. This grabs my attention because I did a lot of work on AWS Step Functions, also a workflow service. The differences between them are super interesting, if you’re among the handful of humans who care about workflows in the cloud. For those among the other 7.8 billion, move right along, nothing to see here.

Context

The Google launch seemed a bit subdued. There was an August 27th tweet by Product Manager Filip Knapik, but the real “announcement” is apparently Serverless workflows in Google Cloud, a 16-minute YouTube preso also by Knapik. It’s good.

On Twitter someone said “Good to see ‘Step Functions / Logic Apps’ for GCP!” and I think that’s fair. I don’t know when Logic Apps launched, but Step Functions was at re:Invent 2016, so has had four years of development.

I’m going to leave Logic Apps out of this discussion, and for brevity will just say “AWS” and “GCP” for AWS Step Functions and Google Cloud Workflows.


Docs

For GCP, I relied on the Syntax reference, and found Knapik’s YouTube useful too. For AWS, I think the best starting point is the Amazon States Language specification.

The rest of this piece highlights the products’ similarities and differences, with a liberal seasoning of my opinions.

YAML vs JSON

AWS writes workflows (which it calls “state machines”) in JSON. GCP uses YAML. Well… meh. A lot of people prefer YAML; it’s easier to write. To be honest, I always thought of the JSON state machines as sort of assembler level, and assumed that someone would figure out a higher-level way to express them, then compile down to JSON. But that hasn’t happened very much.

I have this major mental block against YAML because, unlike JSON or XML, it doesn’t have an end marker, so if your YAML gets truncated, be it by a network spasm or your own fat finger, it might still parse and run — incorrectly. Maybe the likelihood is low, but the potential for damage is apocalyptic. Maybe you disagree; it’s a free country.

And anyhow, you want YAML Step Functions? You can do that with serverless.com or in SAM (see here and here).

Control flow

Both GCP and AWS model workflows as a series of steps; AWS calls them “states”. They both allow any step to say which step to execute next, and have switch-like conditionals to pick the next step based on workflow state.

But there’s a big difference. In GCP, if a step doesn’t explicitly say where to go next, execution moves to whatever step is on the next line in the YAML. I guess this is sort of idiomatic, based on what programming languages do. In AWS, if a state doesn’t say what the next state is, it has to be a terminal success/fail state; this is wired into the syntax. In GCP you can’t have a step named “end”, because next: end signals end-of-workflow. Bit of a syntax smell?
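To make the AWS side concrete, here’s a minimal sketch of a state machine (names invented, Pass states used so there’s no worker to worry about) showing that every non-terminal state carries an explicit Next, and the last one says End:

{
  "StartAt": "DoTheWork",
  "States": {
    "DoTheWork": {
      "Type": "Pass",
      "Next": "WrapUp"
    },
    "WrapUp": {
      "Type": "Pass",
      "End": true
    }
  }
}

Leave both Next and End off a state and the machine won’t validate; there’s no falling through to whatever happens to come next in the file.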

I’m having a hard time developing an opinion one way or another on this one. GCP workflows can be more compact. AWS syntax rules are simpler and more consistent. [It’s almost as if a States Language contributor was really anal about minimalism and syntactic integrity.] I suspect it maybe doesn’t make much difference?

GCP and AWS both have sub-workflows for subroutine-like semantics, but GCP’s are internal, part of the workflow, while AWS’s are external, separately defined and managed. Neither approach seems crazy.

The work in workflow

Workflow engines don’t actually do any work; they orchestrate compute resources — functions and other Web services — to get things done. In AWS, the worker is identified by a field named Resource, which is syntactically a URI. All the URIs currently used to identify worker resources are Amazon-flavored ARNs.
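Concretely, a Task state looks something like this; the Lambda ARN and state names are invented for illustration:

"ChargeCard": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
  "Next": "Fulfill"
}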

GCP, at the moment, mostly assumes an HTTP world. You can specify the URL and whether you want to GET/POST/PATCH/DELETE (why no PUT?); you can fill in header fields, append a query to the URL, provide auth info, and so on. Also, you can give the name of a variable where the result will be stored.

I said “mostly” because in among all the call: http.get examples, I saw one call: sys.sleep, so the architecture allows for other sorts of things.

GCP has an advantage, which is that you can call out to an arbitrary HTTP endpoint. I really like that: Integration with, well, anything.

There’s nothing in the AWS architecture that would get in the way of doing this, but while I was there, that feature never quite made it to the top of the priority list.

That’s because pre-built integrations seemed to offer more value. There are a lot of super-useful services with APIs that are tricky to talk to. Some don’t have synchronous APIs, just fire-and-forget. Or there are better alternatives than HTTP to address them. Or they have well-known failure modes and workarounds to deal with those modes. So, AWS comes with pre-cooked integrations for Lambda, AWS Batch, DynamoDB, ECS/Fargate, SNS, SQS, Glue, SageMaker, EMR, CodeBuild, and Step Functions itself. Each one of these takes a service that may be complicated or twitchy to talk to and makes it easy to use in a workflow step.

The way it works is that the “Resource” value is something like arn:aws:states:::sqs:sendMessage, which means that the workflow should send an SQS message.
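So a step that queues a message looks roughly like so; the queue URL and field names are invented for illustration:

"NotifyFulfillment": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sqs:sendMessage",
  "Parameters": {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/fulfillment",
    "MessageBody.$": "$.order"
  },
  "Next": "Done"
}

The “.$” suffix on MessageBody means the value is a JSONPath into the workflow state rather than a literal.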

AWS has a couple of other integration tricks up its sleeve. One is “Callback tasks”, where the service launches a task, passes it a callback token, and then pauses the workflow until something calls back with the token to let Step Functions know the task is finished. This is especially handy if you want to run a task that involves interacting with a human.
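The way that’s spelled, roughly, is by appending .waitForTaskToken to the integration’s Resource and passing the token ($$.Task.Token, from the context object) along to whoever will eventually answer. A sketch, with invented details:

"WaitForApproval": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
  "Parameters": {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
    "MessageBody": {
      "orderId.$": "$.orderId",
      "taskToken.$": "$$.Task.Token"
    }
  },
  "Next": "Fulfill"
}

The workflow sits there until something calls SendTaskSuccess (or SendTaskFailure) quoting that token.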

Finally, AWS has “Activities”. These are worker tasks that poll Step Functions to say “Got any work for me?” and “Here’s the result of the work you told me about” and “I’m heartbeating to say I’m still here”. These turn out to have lots of uses. One is if you want to stage a fixed number of hosts to do workflow tasks, for example to avoid overrunning a relational database.
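From the workflow’s point of view an Activity is just another Resource, an ARN naming the activity; the polling workers use the GetActivityTask API to fetch work and SendTaskSuccess, SendTaskFailure, and SendTaskHeartbeat to report back. A sketch, activity name invented:

"ResizeImages": {
  "Type": "Task",
  "Resource": "arn:aws:states:us-east-1:123456789012:activity:resize-images",
  "TimeoutSeconds": 3600,
  "HeartbeatSeconds": 60,
  "Next": "Publish"
}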

So at the moment, AWS comes with way more built-in integrations and ships new ones regularly. Having said that, I don’t see anything in GCP’s architecture that prevents it eventually taking a similar path.

The tools for specifying what to do are a little more stripped-down in AWS: “Here’s a URI that says who’s supposed to do the work, and here’s a blob of JSON to serve as the initial input. Please figure out how to start the worker and send it the data.” [It’s almost as if a States Language contributor was a big fan of Web architecture, and saw URIs as a nicely-opaque layer of indirection for identifying units of information or service.]

Reliability

This is one of the important things a workflow brings to the table. In the cloud, tasks sometimes fail; that’s a fact of life. You want your workflow to be able to retry appropriately, catch exceptions when it has to, and reroute the workflow when bad stuff happens. Put another way, you’d like to take every little piece of work and surround it with what amounts to a try/catch/finally.

GCP and AWS both do this, with similar mechanisms: exception catching with control over the number and timing of retries, and eventual dispatching to elsewhere in the workflow. GCP allows you to name a retry policy and re-use it, which is cool. But the syntax is klunky.
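On the AWS side, the retrying and catching hang directly off the state in question; a sketch, with the numbers and state names invented:

"ChargeCard": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
  "Retry": [
    {
      "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": ["States.ALL"],
      "ResultPath": "$.error",
      "Next": "RefundAndApologize"
    }
  ],
  "Next": "Fulfill"
}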

AWS goes to immense, insane lengths to make sure a workflow step is never missed, or broken by a failure within the service. (The guarantees provided by the faster, cheaper Express Workflows variant are still good, but weaker.) I’d like to see a statement from GCP about the expected reliability of the service.

Parallelism

It’s often the case that you’d like a workflow to orchestrate a bunch of its tasks in parallel. AWS provides two ways to do this: You can take one data item and feed it in parallel to a bunch of different tasks, or you can take an array and feed its elements to the same task. In the latter case, you can limit the maximum concurrency or even force one-at-a-time processing.
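The array flavor is the Map state; here’s a sketch with invented names, where MaxConcurrency caps the parallelism (setting it to 1 gets you one-at-a-time):

"ProcessOrderItems": {
  "Type": "Map",
  "ItemsPath": "$.order.items",
  "MaxConcurrency": 10,
  "Iterator": {
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-item",
        "End": true
      }
    }
  },
  "Next": "Summarize"
}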

GCP is all single-threaded. I suppose there’s no architectural reason it has to stay that way in future.

Workflow state in GCP

Now here’s a real difference, and it’s a big one. When you fire up a workflow (AWS or GCP), you feed data into it, and as it does work, it builds up state, then it uses the state to generate output and make workflow routing choices.

In GCP, this is all done with workflow variables. You can take the output of a step and put it in a variable. Or you can assign values to them and do arithmetic on them, like in a programming language. So you can build a loop construct like so:

- define:
    assign:
        - array: ["foo", "ba", "r"]
        - result: ""
        - i: 0
- check_condition:
    switch:
        - condition: ${len(array) > i}
          next: iterate
    next: exit_loop
- iterate:
    assign:
        - result: ${result + array[i]}
        - i: ${i+1}
    next: check_condition
- exit_loop:
    return:
        concat_result: ${result}  

Variables are untyped; they can be numbers or strings or objects or fish or bicycles. Suppose a step’s worker returns JSON. Then GCP will parse it into a multi-level thing where the top level is an object with members named headers (for HTTP headers) and body, and then the body has the parsed JSON, and then you can add functions on the end, so you can write incantations like this:

- condition: ${ userRecord.body.fields.amountDue.doubleValue == 0 }

(doubleValue is a function, not a data field. So what if I have a field in my data named doubleValue? Ewwww.)

Then again, if a step worker returns PDF, you can stick that into a variable too. And if you call doubleValue on that I guess that’s an exception?

Variable names are global to the workflow, and it looks like some are reserved, for example http and sys.

…and in AWS

It could hardly be more different. As of now AWS doesn’t use variables. The workflow state is passed along, as JSON, from one step to the next as the workflow executes. There are operators (InputPath, ResultPath, OutputPath, Parameters) for pulling pieces out and stitching them together.
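For example, a Task can narrow its input, reshape what the worker sees, and then graft the worker’s result back onto the state it was handed; all the paths here are invented for illustration:

"ChargeCard": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
  "InputPath": "$.order",
  "Parameters": {
    "amount.$": "$.total",
    "currency": "CAD"
  },
  "ResultPath": "$.payment",
  "OutputPath": "$",
  "Next": "Fulfill"
}

InputPath selects the part of the state the step will see, Parameters reshapes that for the worker, ResultPath says where in the original state to stick whatever the worker returns, and OutputPath gives one last chance to filter what gets handed to the next state (here, $ passes everything).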

Just like in GCP, JSONPath syntax is used to pick out bits and pieces of state, but the first step is just $ rather than the name of a variable.

There’s no arithmetic, but then you don’t need to do things with array indices like in the example above, because parallelism is built in.

If you want to do fancy manipulation to prepare input for a worker or pick apart one’s output, AWS takes you quite a ways with the built-in Pass feature; but if you need to run real procedural code, you might need a Lambda to accomplish that. We thought that was OK; go as far as you can declaratively while remaining graceful, because when that breaks down, this is the cloud and these days clouds have functions.
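For the record, a Pass state doesn’t call a worker at all, it just reshapes the state in flight; a small sketch, names invented:

"PrepareInvoice": {
  "Type": "Pass",
  "Parameters": {
    "customerId.$": "$.order.customer.id",
    "lineItems.$": "$.order.items"
  },
  "ResultPath": "$.invoice",
  "Next": "ChargeCard"
}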

While I haven’t been to the mat with GCP to do real work, at the moment I think the AWS approach is a winner here. First of all, I’ve hated global variables — for good reason — since before most people reading this were born. Second, YAML is a lousy programming language to do arithmetic and so on in.

Third, and most important, what happens when you want to do seriously concurrent processing, which I think is a really common workflow scenario? GCP doesn’t really have parallelism built-in yet, but I bet that’ll change assuming the product gets any traction. The combination of un-typed un-scoped global variables with parallel processing is a fast-track to concurrency hell.

AWS application state is always localized to a place in the executing workflow and can be logged and examined as you work your way through, so it’s straightforward to use as a debugging resource.

[It’s almost as if a States Language contributor was Functional-Programming literate and thought immutable messages make sense for tracking state in scalable distributed systems.]

GCP variables have the virtue of familiarity and probably have less learning curve than the AWS “Path” primitives for dealing with workflow state. But I just really wouldn’t want to own a large mission-critical workflow built around concurrent access to untyped global variables.

Pricing

GCP is cheaper than AWS, but not cheaper than the Express Workflows launched in late 2019.


It’s interesting but totally unsurprising that calling out to arbitrary HTTP endpoints is a lot more expensive. Anyone who’s built general call-out-to-anything infrastructure knows that it’s a major pain in the ass because those calls can and will fail and that’s not the customer’s problem, it’s your problem.

Auth

This is one area where I’m not going to go deep because my understanding of Google Cloud auth semantics and best practices is notable by its absence. It’s nice to see GCP’s secrets-manager integration, particularly for HTTP workers. I was a bit nonplussed that for auth type you can specify either or both of OIDC and OAuth2; clearly more investigation required.

UX

I’m talking about graphical stuff in the console. The GCP console, per Knapik’s YouTube, looks clean and helpful. Once again, the AWS flavor has had way more years of development and just contains more stuff. Probably the biggest difference is that AWS draws little ovals-and-arrows graphic renditions of your workflow, colors the ovals as the workflow executes, and lets you click on them to examine the inputs and outputs of any workflow execution. This is way more than just eye candy; it’s super helpful.

Which workflow service should you use?

That’s dead easy. You should use a declaratively-specified fully-managed cloud-native service that can call out to a variety of workers, and which combines retrying and exception handling to achieve high reliability. And, you should use the one that’s native to whatever public cloud you’re in! Like I said, easy.

Slow Drone Soar 13 Sep 2020, 12:00 pm

I recently invited you to read a thousand-page novel without much in the way of sentences, so I think it’s perfectly reasonable to point you at a 69-minute drone-metal album that largely lacks melody and rhythm. I refer to Life Metal, a 2019 release from Sunn O))). Because in 2020 we really ought to be sharing good things with each other, and this is a good thing; the new music I’ve enjoyed most this year.


Lots of people probably don’t think they’d like drone metal, just from the name. I think they might want to think again. This is extremely serious music and works both as background and for careful listening. I like the tones and the chords and the treatments, but what I like most is that the best of this kind of music, including this album, is full of serenity.

Life Metal lives in a tradition that’s almost exactly fifty years old, measuring from the release in February 1970 of Black Sabbath’s debut album Black Sabbath. Its opening song (also “Black Sabbath”), after 37 seconds of church bells in a thunderstorm, launches into a huge, rumbling, low-string tritone riff, and that moment is generally considered to mark the birth of heavy metal. This music is like that, boiled down to the purest essence.

Enough general rambling about metal, because I had great fun with an ongoing piece on the subject a couple of years ago; it includes groovy pix.


One quarter of the Life Metal LP cover. The graphic, lifted from their Bandcamp site, is posted in only one size: 666x666. Because metal.

The music

It sounds like other Sunn O))) music only, well, brighter. (Which isn’t actually bright since this music is supposed to be dark.) A little research reveals that Anderson and O’Malley, Sunn O)))’s leaders, have been enjoying raising young families and feeling generally upbeat about things. Thus the name Life Metal, obvious “Death Metal” counterpoint and apparently a bit of a running joke here and there in the metal community.

A few words for those who are new to the genre: It’s a slow-shifting landscape of huge low-pitched high-distortion electric guitar chords, with occasional interjections of voice, other instruments, and rare (but very dramatic) high-note sequences. I personally find the melodious downtuned guitar roar immensely pleasing, and enjoy the vast sense of space; there’s no hurry anywhere and the riffs are lingered over endlessly for their own sake. It’s designed to be played absurdly, crushingly loud so it’s a whole-body experience.

Life Metal seasons the guitar noise with occasional pipe-organ and an actual song that’s sung not shrieked, lyrics translated from ancient Aztec verse.

The sound

Here’s where it gets special. For this outing the band teamed up with legendary producer Steve Albini and used entirely analogue technology, which you can experience end-to-end if you buy the vinyl version, which I did of course.

When I listen to it in the car or the boat/office, it sounds good and I enjoy it. But a couple of times I’ve pulled up a chair in front of the big speakers and put Life Metal on the record player, turned it up pretty far and… wow.

I don’t know what Albini did, but suddenly the whole room was full of gigantic musical serpents dancing slow, scales glowing blackly, rays of dark musical fire radiating in every direction. The music felt endless in scope and width and height and length and volume. Remarkable music, remarkably performed, remarkably produced, remarkably delivered; so intense.

No, I wasn’t doing any drugs! But if I’d had any I’d have been tempted.

Long Links 3 Sep 2020, 12:00 pm

I seem to have fallen into a monthly rhythm of posting pointers to what I think are high-quality long-form pieces. One of the best things about not having a job as such is that I have time to read these things. My assumption is that most of you don’t, but that maybe one or two will reward an investment of your limited time.

In The New Yorker, Jane Hu writes The Second Act of Social-Media Activism, subtitled “Has the Internet become better at mediating change?” People like me would like to believe this; back when the Arab Spring was first a thing, we did. Now, that belief is weak; maybe the Internet is a better vehicle for fascism than progress. Even endless live video of police violence doesn’t seem to build support for a common-sense redesign of policing which would route a lot less money to bossy people with guns and more to good listeners who are qualified at dealing with social-health issues. Hu’s take is level-headed and not entirely pessimistic.

These days, I subscribe to Utility Dive, which mixes arcane expositions of utility-regulation politics with the occasional really fresh and smart take on energy economics in the face of the onrushing climate emergency. Sheep, ag and sun: Agrivoltaics propel significant reductions in solar maintenance costs is probably not terribly important to your understanding of the big picture, but it’s a fun read. Suppose you have a solar farm and it’s not in the desert, it’s in a place where plants grow. Well, they might grow up over your panels and get in the way of the sun. What do you do about that? Well, you treat your farm like a farm and bring in grazing animals to eat the plants. Turns out you have to pay the animal providers, which as a farm boy feels odd to me. Now, when there are untended sheep, there are pretty soon going to be predators. You might be able to afford sheep but nobody can afford shepherds, so instead you might hire some Great Pyrenees hounds to tend them. Then you might find yourself with a liability problem when the hounds (they are very protective of their woolly charges) mistake a passerby for a predator. I wonder what solar farms of the future are going to end up looking like?

Back to The New Yorker, where Bill McKibben has been leading the climate-emergency charge — they’ve a great newsletter you can sign up for. McKibben wrote North Dakota Oil Workers Are Learning to Tend Wind Turbines—and That’s a Big Deal, which is good. A lot of people are pointing out that a Green New Deal program would offer a lot of major investment opportunities — there’s gold in them thar renewables — but it looks like petroleum-engineering skills are going to be transferable to the sector. Which is a damn good thing, because capital investment on the oil-company front is falling like a stone and I’m not seeing evidence that it’s coming back any time soon. The picture is complicated because opponents of the transition from fossil to renewable energy include not just oil barons but certain old-school unions. But there are grounds for optimism.

Speaking of oil-economy woes, the CBC’s As oil money dries up, Alberta's financial woes laid bare paints a pretty painful picture of the situation in Western Canada, where employment levels and provincial budgets have been joined at the hip to the oil industry since forever. I’m optimistic in the medium/long term but this transition isn’t going to be easy. If you doubt that, dip into the 4,709 comments. No, I take that back, please don’t.

I forget what maze of twisty little passages led me to The Logical Description of an Intermediate Speed Digital Computer, which is Gene Amdahl’s 1951 Ph.D thesis. If you don’t know who Gene Amdahl was, skip to the next paragraph, this is about to get very boring. If you do know, it’s almost certain this read will delight you. The computer in question was actually built; it was called the Wisconsin Integrally Synchronized Computer. Its only memory was a storage drum. Anyhow, this is instructive in that it helps the reader understand how many concepts and constructs that seem axiomatic to us, self-evidently obvious, were hard-won by these people in this phase of history. Also, this computer doesn’t have instructions, it has commands. I’m kind of sorry that language didn’t stick.

It turns out that the backbone of the Internet is mostly operated by large telephone companies who, as corporations go, have a reputation for being clueless, abusive, and extractive. This seems unsatisfactory. In A Public Option for the core (ACM overview page, PDF), Harchol, Bergemann et al “propose the creation of a ‘public option’ for the Internet’s core backbone. This public option core, which complements rather than replaces the backbones used by large-scale ISPs, would (i) run an open market for backbone bandwidth so it could leverage links offered by third-parties, and (ii) structure its terms-of-service to enforce network neutrality so as to encourage competition and reduce the advantage of large incumbents.” This sounds profoundly sensible to me.

This is a space for long-form works and I didn’t say they had to be written. In that spirit, I recommend Nine Inch Nails Tension 2013, an 87-minute recording captured at the Staples Center in LA on November 8, 2013. If you like hardass, totally committed musical performances, don’t start watching this or you won’t get back to “real life” for a while. Which, especially in 2020, is not a bad thing.

In Catalyst (of which I know nothing) from last spring is Ecological Politics for the Working Class. If you’re terribly concerned about the climate emergency (as I am) and also a progressive who flirts with class reductionism (as I am) it probably bothers you that, and I quote, “environmentalism’s base in the professional-managerial class and focus on consumption has little chance of attracting working-class support.” So, a piece that “argues for a program that tackles the ecological crisis by organizing around working-class interests” should interest you. Tl;dr: Among other things, stop yelling at consumers and try to get away from “lifestyle environmentalism”. Related: “centrist” pundits damning the Green New Deal idea with faint praise along the lines of “those environmental goals are laudable, but then they start talking about guaranteed incomes and so on, which really aren’t a necessary part of the package.” Er, wrong, they really are a totally necessary part. I don’t agree with everything here but it’s a bracing read.

When Biden picked Kamala Harris as his running mate, she suddenly became a lot more interesting. A fierce controversy broke out over in Wikipedia about how to describe her; was “African-American” appropriate? To discover the outcome, check out her Wikipedia entry. In The Atlantic, Joshua Benton published The Wikipedia War That Shows How Ugly This Election Will Be. I dunno if we needed any more educating about more 2020 ugliness, but reading this made me happy. Because it shone a light on the Wikipedia work process, which these days is terribly important to humanity’s understanding of reality. And you know what? While imperfect, on balance it works pretty fucking well. The process is unironically concerned with truth and does, on balance, a good job of achieving it. Is anything more important on today’s Internet? I’ve highlighted a core Wikipedia tenet before and probably will again: “Content which is not verifiable will eventually be removed.” Which seems a necessary but not sufficient condition for any sort of sane adult discussion about anything.

Back to Utility Dive: Renewable energy prices begin an upward trend, LevelTen data shows. No, I don’t know who LevelTen is. This is notable because renewable prices have been falling quickly and monotonically for years; I was shocked at the headline. Well, it turns out that the various tax incentives and other subsidies that originally helped drive renewable adoption are by and large no longer necessary, so they’re expiring and being withdrawn. So the prices go up a bit. Does it mean that renewable generation is now more expensive than fossil fuel? Nope, not even close. But watch out for petrol-head trolls exclaiming with glee, using this as evidence. Oh, another piece from the Dive: The future of hydropower will be determined in the Pacific Northwest, which covers the complicated and interesting conflict, when you dam rivers for generation, between the benefits of cheap green power and the potential damage to fish migration. Something that can’t be ignored.

In Wired, Yiren Lu writes My Week of Radical Transparency at a Chinese Business Seminar, a deep dive into a part of mainland-Chinese culture which I previously had no notion of. China in some regards is still the most interesting place in the world and people who are interested in the world need to improve their understanding of what’s going on there. This piece is going to make some of us a little uncomfortable with one or two progressive axioms over on this side of the Pacific, too.

Speaking of which: There’s so much bad shit going on in the world that it’s easy to let the news about China’s brutally racist oppression of its Uighur population vanish in the input stream. Buzzfeed is doing its best to help us not let that happen with a two-part investigation starting here: “China rounded up so many Muslims in Xinjiang that there wasn’t enough space to hold them”. They have rare testimony from inside the camps. This won’t cheer you up but it’ll give you more reasons to understand (and, realistically, fear) China’s barbaric ruling clique.

Here, in Gizmodo, is some practical advice for protestors: Your Phone Is a Goldmine of Hidden Data for Cops. Here's How to Fight Back. It’s exactly what the headline says. The measures recommended are pretty extreme and for most of us in most political actions, thankfully probably unnecessary. Let’s hope it stays that way. And bookmark this in case it doesn’t.

Over at AdWeek, Why Lawmakers Are Keeping Ad Tech Under Such Close Scrutiny. I think this particular “why” is pretty obvious: Because Ad Tech is abusive to customers and disastrous for many previously-excellent publishing business models. It’s tremendously damaging to privacy and to intellectual discourse. People who are already privacy paranoids won’t be surprised by much of the information here, but I found it super interesting because it was presented in the language of, and reflects the culture of, the Ad industry. Related: Apple wants to stop advertisers from following you around the web. Facebook has other ideas from Peter Kafka over on Vox. I sincerely hope for lots of political action in the nearest-possible future around abusive AdTech.

Still more on privacy and its abusers: How To Destroy Surveillance Capitalism by Cory Doctorow over on Medium/One Zero. It’s really long — in fact, the full text of Cory’s new book — and I haven’t finished reading it, but it feels essential to me and I will.

In Canada’s The Tyee, a review of Kurt Andersen’s recent book Evil Geniuses: The Unmaking of America. The goal is to draw a map explaining how the USA got from the Seventies, when the level of inequality was roughly the same as Sweden’s, to today’s diseased, shambling, divided, misgoverned year of discontent. The review dips into our local Western-Canada politics and that may not be of much general interest, but reading it definitely put the book on my to-buy-and-read list.

Last of all, I recommend The Internet of Beefs by Venkatesh Rao, an appallingly cynical and amusing take on the dysfunction of Internet discourse. I found a lot of truth in it. It doesn’t propose much in the way of solutions, but says “Like all the best questions, this one is at once intensely practical — all about digital hygiene and how to design and use devices of connection to think — and intensely philosophical — about finding ways to be reborn without literally dying. I don’t have answers, but I like that I finally at least have a question.” A very good question.

Studying Water 3 Sep 2020, 12:00 pm

I have the good fortune to live in a seafront city, the further fortune to travel often by boat to a cabin by ocean’s edge, and still further, to work in a boat/office, many hours a week within arm’s reach of salt water. The water exhibits mysteries and tells me things I don’t understand but would love, given another lifetime, to study.

The sea’s surface is never uniform, always patterned. I assume the patterns reflect the effects of wind and tide, but not how, nor what message they’re trying to send. I’m pretty sure that the motorboats that crisscross the open water leave long-lasting traces on the surface patterns. I’d like to know.


Above: Port Phillip Bay.
Below: Howe Sound.

You can feel, not just see, the surface textures. At any noticeable speed the boat’s progress is unsmooth. But the flavors of roughness are infinite, combining aspects of chop, sway, hill-climbing, surfing, and blunt impact. When you cross one of those visual borders on the surface the quality of ride changes too. Sometimes looking ahead I can even predict correctly, finding smoother water when it’s uncomfortably rough. But I don’t know how the waves get that way, nor what variables underlie the endless variety of different wave patterns. I’d like to know.


A bit of a rough ride in the Queen Charlotte Channel. Notice how the water changes between the first two pictures, shot literally five seconds apart. Evidence of roughness in the third.

Boating in the Pacific Northwest is pleasant in that the topography is mostly fjord-based, which means there aren’t many rocks near enough the surface to trouble your boat. So the major risk is floating logs, which usually won’t sink you but can wreck your propeller in a moment, leaving you calling for an expensive tow. We try to always have two pairs of eyes watching forward. When you spot a log in your path, first you dodge. Then you slow down, because where you see one, you’ll usually encounter more; the flotsam floats in clusters. I’ve never managed to spot any pattern in where the clusters occur, or how they correlate with the time of day or time of year. I’d like to know.

The color of the ocean is never constant, day to day nor hour to hour. The direction of the sun matters I guess, and the nature of the cloud cover. Also perhaps the salinity of the water, and whatever microorganisms are flourishing where you’re looking. Perhaps there are things to be learned from the color about the ecosystem or the weather? I’d like to know.


Looking east from Cape Reinga across the north tip of New Zealand.

Sometimes the water’s surface is clean, especially out away from land. Where my office floats, rarely. Varieties of crud accumulate, some identifiable — pollen, for example, which correlates with hay-fever. But sometimes the surface is oily or slimy or bubbly. This is an urban inlet, so likely the city’s effluents play a role here. I bet a biochemist with a decent lab could figure out what it is, most times. I’d like to know.

And the crud isn’t just on the surface. There are days when I can see a few meters deep, getting a clear view of the crabs that somehow manage to make a go of it in this Vancouver-flavored soup. Other days it might as well be a grey-green wall. Sometimes you can see what’s clouding the water because it’s granular, and the grain size varies. Basic combinatorics teaches that you don’t need that many contributing factors to get this variety of looks, but I don’t know what those factors are. I assume there are Marine Biologists who study this stuff and could tell you at a glance what’s going on. I’d like to know.

More than crabs live in False Creek and out in the broader Pacific. On warm summer days by the office, huge schools of tiny fish swarm the water. Their motions, were they starlings, would be called murmuration. An old guy washing his boat told me they were herrings. Sometimes, below them, larger silver fish cruise about, all very flOw-like. How do they arrange to move in waves, all together, lightning-fast? When the water is jammed with life on Tuesday and Thursday, where were they all on Wednesday when it was almost empty? What do they eat? How does the species make it from one summer to the next? I’d like to know.


Humpbacks and gulls chowing down in Haida Gwaii.

At a larger scale, we’ve started seeing humpbacks, lots of seals have always been around, also the occasional killer whale or dolphin pod. At sea, you don’t see the salmon, but we know they’re there in large but sadly diminished numbers. The marine biologists I’ve known shake their heads at how little we know about the subsurface lives of all these creatures, but there’s joy for a scientist in a large unexplored space. What are they doing down there where we can’t see them? I’d like to know.

Everyone knows the circular ripples caused by raindrops or pebbles. When there’s just a drizzle or perhaps a storm stealing in, there’s fascination in watching the raincircles crowd denser and denser. But lots of times, just strolling along the dock, a circle or circle cluster will manifest, looking smaller than those caused by raindrops. I’m assuming that it’s something from below, touching the surface for its own reasons, whatever they are. I’d like to know.


Also

Humanity is like the ocean. Its surface is patterned and you can feel the patterns as you pass through. Colors matter. There is transparency and occlusion. There is nasty crud, often in clusters. Important things live beneath the surface. Impacts arrive from above and below and it’s not obvious what caused them. I’d like to understand all that better too.

Welcome to the Podcast 24 Aug 2020, 12:00 pm

I was wondering if podcasting is still a thing, so I tweeted “When do people listen to podcasts, now that nobody is commuting? Housework? Exercise?” Newsflash: Yes, it is. The tweet got lots of traction and only very few of the responses said they were listening less. Given that, I decided to make an audio version of what you are reading and call it a podcast. Turns out to be pretty easy.

Details

It’s here on Apple podcasts, which I gather means it will show up on Overcast and other podcast aggregators (um, whatever they are, I’m new to this) soon. If you want to do it yourself, here’s the RSS.

Confession

I rarely listen to podcasts. When I’m doing something like driving that doesn’t fully occupy the mind, I prefer music to talk. When I want to consume information, I find reading is way faster. But I’m not against them or anything, they seem to improve many peoples’ lives.

What it is

For the moment, just a selection from the fragments that appear here — occasionally I publish pieces that are just collections of links, which I think would be podcast-unfriendly. But the majority will be included, assuming the world shows any interest. In the future I might try to go conversational but to be honest at the moment I don’t feel a burning urge. I’d rather find a way to work music in, and have no notion of the issues around that.

Hardware tech

I record these sitting in my boat/office on a 2019 16" MacBook Pro using an unexotic Shure MV5 microphone which I picked because it stood out among well-reviewed mikes for occupying little desk space, and that’s a big advantage on the boat. It connects to the Mac via a CalDigit TS3+ Thunderbolt hub, without which my boat/office setup would be entirely unworkable.


The studio. The seat’s not ergonomic but the view is good.

I haven’t made any real attempt to suppress background noise and if you get lucky you might hear a seagull squawk or a seal splash. When I play back the sound loud, I can hear the background roar of the city. It doesn’t bother me.

Let’s be honest: The main problem with the sound quality is my voice, which is nothing to write home about, and the fact that I stumble over my own words. Several of the eleven fragments present at launch required more than one take; once again, not a big problem.

Also, I notice, upon listening to a sampling of these early episodes, that I should probably go slower. I’m a pretty fast talker but the real issue is that my writing style is dense. This isn’t an accident, I consciously try to compress the prose here; what I publish is usually a lot shorter than the initial draft. Some of the verbal chaff I remove while editing is probably perfectly appropriate for spoken-word.

Software tech

I record with Audacity because that’s what everybody says to use, and export to MP3. Once you’ve got the MP3s, you need to wrap them in RSS (specifically, RSS 2.0, none of the other flavors) and you’ve got a podcast. There are lots of tutorials out there, and they are considerably annoying in that few to none actually include examples of the raw XML code so that I can cut and paste it; they assume you’re using some sort of templating engine or other voodoo.

The reason I need examples is that the software that runs this blog (and now this podcast too) was entirely hand-built by yours truly, initially in 2002, and subsequently elaborated and fancified and mutated over the decades. In fact, it’s all one Perl file containing 2,880 lines of what I claim is pure software beauty. I will open-source it for the world to enjoy on the twelfth of never.

To-do and what next

I need to have each blog entry that’s also a podcast have a “click here to listen” button. Feel free to point out any other undotted i’s or uncrossed t’s.

Going forward, I dunno. I’m making this up as I go along. If someone has a brilliant idea for what I should do with this global multimedia empire, I’m real easy to find. My mind is open, and I guess my ears should be too, now.

, the fact that... 23 Aug 2020, 12:00 pm

I’ve been reading Ducks, Newburyport by Lucy Ellmann since January of this year and finished it earlier this evening. I’ve taken breaks from it to read other books and quit my job and launch a sideline in activism. The book is over a thousand pages long and mostly composed of a single sentence, an endless flow of phrases many introduced by , the fact that… I enjoyed it a whole lot! While most people won’t be eager to wade into something this big and complicated, the fact that it’s OK to take months and months to wander through it may make the idea less intimidating. I hope to tempt you further. Also a few notes about me and books.


This thing was short-listed for the Booker and widely-reviewed; if you want to get a feeling for it you can check out any of the CBC, the London Review of Books, the New Yorker, the New York Times (and again), and The Guardian.

The Washington Post makes the point that reading a thousand-page work is physically difficult, especially if you have any age-related stiffening of the extremities. So if you are going to pick up Ducks, get it on Kindle or Kobo or some such! Your wrists will thank you.

Also from the WashPost, a good interview with the author. The following Q/A resonated with me, given my recent history: Q: “What are you trying to say about how society — and women in particular — view motherhood?” A: “I’m not trying to say anything, I’m saying it!” She’s just invested a thousand freaking pages getting her message across; questions of the “what are you trying to say?” form deserve to be pissed on from a great height. In fact, protip for interviewers: Asking any writer who’s just published something what they were trying to say is apt to get you a well-deserved grumpy response.

What the reviews don’t say

As they do say, most of the book is the somewhat overwrought and disordered internal ramblings of a woman with four children in a happy academic marriage whose family is suffering money troubles following on her cancer, and who has turned from teaching to baking to help pay the bills. Interspersed is a story about a mountain lioness’ search for abducted cubs. But they miss things that I think are important:

  1. First, not much of any significance happens during the first 900 pages; a couple of minor family crises. Not that it’s boring, I never once felt the urge to lay it aside. But during the last 10% of the book there are significant plot outbursts, real dramatic life-and-death action.

  2. This book is about the most American thing imaginable. While our protagonist and her husband have lived overseas, they now live in rural Ohio, and the political and business and musical and literary and shopping and weather and cultural and child-rearing and many many other issues that infuse the story are so American that I suspect anyone not infused in New World culture will miss lots of subtleties.

  3. I talked about “many many” issues. Our narrator’s internal landscape is astonishingly, overwhelmingly, rich. Yes, the same issues and themes keep coming back, but damn are there ever a lot of them. I am old and have drunk pretty deep at the wells of culture and history but was handicapped by being weak on classic cinema and the work of Laura Ingalls Wilder (Little House on the Prairie). But time after time, a side-trip off a side-trip in Ducks would connect to something that I’d cared about after reading or hearing or living it a decade or four ago.

  4. The principle of Chekhov’s gun applies in spades here. You have been warned.

  5. Life is a sexually-transmitted terminal condition. It eventually kills us all and as a matter of course inflicts wounds along the way. Often we survive and enjoy post-recovery life. But never pretend that the damage isn’t permanent. No ending is ever 100% happy.

Big books

I can remember the day I learned to read; sometime in my kindergarten year National Geographic had a story on dinosaurs and my Mom found me hunched over it when she got home from work. After that I could read, and have never stopped.

I like short and medium-sized books. But I especially love big, thickly-flavored books. I suspect my feelings may have a macho component — no book is too big or too hard to defeat me! In fact, Marx’s Capital and Joyce’s Finnegans Wake have. But no others I can think of.

Big doesn’t have to be difficult. Ducks, Newburyport isn’t difficult at all; I read this book for pleasure and with pleasure. It’s just big, that’s all.

Yalochat 20 Aug 2020, 12:00 pm

There’s financing news today: Yalochat closed a $15M B Round from B Capital, with follow-on from Sierra Ventures. The reason I’m writing this is that I’ve signed on as an advisor to Yalo; they’re a fun outfit and it’s an interesting story. Also, they’re hiring.


I’m not going to explain what Yalo does; their website will fill you in on that, and the TechCrunch coverage of the funding is a useful introduction.

What happened was, a long-time colleague whose judgment I respect reached out and said “I’m helping these guys with product issues but they could really use advice on the engineering side.” So I talked to Javier Mata, the founder/CEO, and he’s a charming and realistic guy, and very convincing about the business’s potential.

Now, their technology is focused on conversational UX and mostly runs on Kubernetes on GCP, so I’m not giving them advice at the which-are-the-best-APIs level. But I’m finding my experience from my own two startups useful, plus the lessons I learned in those years at Google and especially AWS.

They’re cheerful. They’re smart and practical and fun, and they’re solving hard problems. I think they might simplify a lot of people’s lives and make a lot of money while they do it.

While my engagement is just a few hours here and there, my life has taken on a bit of Latin flavor. The company is headquartered in Mexico City, and while they operate primarily in English, there are outbursts of Spanish. I needed a new browser environment to hold my Yalo interactions and reached for Microsoft Edge which, as a side-effect of Yalo onboarding, is now operating in Spanish. Which I studied for a few years back when I was in high school and dinosaurs walked the earth. Long-neglected sectors at the back of my brain re-activated and I’m actually enjoying it when I open an email and the two first fields are Para and Asunto.

In these Covid times, everyone’s retreated to their hometown, so I find myself in videoconferences with Guadalajara, Mexico City, Guatemala City, and Rio Dulce. The calls are clear and sharp. It’s pleasing that Internet access is becoming solid in more and more parts of the world.

Cloud Traffic 9 Aug 2020, 12:00 pm

I recently watched Build an enterprise-grade service mesh with Traffic Director, featuring Stewart Reichling and Kelsey Hightower of GCP, and of course Google Cloud’s Traffic Director. Coming at this with a brain steeped in 5½ years of AWS technology and culture was surprising in ways that seem worth sharing.

Stewart presents the problem of a retail app’s shopping-cart checkout code. Obviously, first you need to call a payment service. However it’s implemented, this needs to be a synchronous call because you’re not going to start any fulfillment work until you know the payment is OK.

If you’re a big-league operation, your payment processing needs to scale and is quite likely an external service you call out to. Which raises the questions of how you deploy and scale it, and how clients find it. Since this is GCP, both Kubernetes and a service mesh are assumed. I’m not going to explain “service mesh” here; if you need to know go and web-search some combination of Envoy and Istio and Linkerd.

The first thing that surprised me was Stewart talking about the difficulty of scaling the payment service’s load balancer, and it being yet another thing in the service to configure, bearing in mind that you need health checks, and might need to load-balance multiple services. Fair enough, I guess. Their solution was a client-local load balancer, embedded in sidecar code in the service mesh. Wow… in such an environment, everything I think I know about load-balancing issues is probably wrong. There seemed to be an implicit claim that client-side load balancing is a win, but I couldn’t quite parse the argument. Counterintuitive! Need to dig into this.

And the AWS voice in the back of my head is saying “Why don’t you put your payments service behind API Gateway? Or ALB? Or maybe even make direct calls out to a Lambda function? (Or, obviously, their GCP equivalents.) They come with load-balancing and monitoring and error reporting built-in. And anyhow, you’re probably going to need application-level canaries, whichever way you go.” I worry a little bit about hiding the places where the networking happens, just like I worry about ORM hiding the SQL. Because you can’t ignore either networking or SQL.


Traffic Director

It’s an interesting beast. It turns out that there’s a set of APIs called “xDS”, originally from Envoy, nicely introduced in The universal data plane API. They manage the kinds of things a sidecar provides: Endpoint discovery and routing, health checks, secrets, listeners. What Google has done is arrange for gRPC to support xDS for configuration, and it seems Traffic Director can configure and deploy your services using a combination of K8s with a service mesh, gRPC, and even on-prem stuff; plus pretty well anything that supports xDS. Which apparently includes Google Cloud Run.

It does a lot of useful things. Things that are useful, at least, in the world where you build your distributed app by turning potentially any arbitrary API call into a proxied load-balanced monitored logged service, via the Service Mesh.

Is this a good thing? Sometimes, I guess, otherwise people wouldn’t be putting all this work into tooling and facilitation. When would you choose this approach to wiring services together, as opposed to consciously building more or less everything as a service with an endpoint, in the AWS style? I don’t know. Hypothesis: You do this when you’re already bought-in to Kubernetes, because in that context service mesh is the native integration idiom.

I was particularly impressed by how you could set up “global” routing, which means load balancing against resources that run in multiple Google regions (which don’t mean the same things as AWS regions or Azure regions). AWS would encourage you to use multiple AZ’s to achieve this effect.

Also there’s a lot of support for automated-deployment operations, and I don’t know if they extend the current GCP state of the art, but they looked decent.

Finally, I was once again taken aback when Stewart pointed out that with Traffic Director, you don’t have to screw around with iptables to get things working. I had no idea that was something people still had to do; if this makes that go away, that’s gotta be a good thing.

Kelsey makes it go

Kelsey Hightower takes 14 of the video’s 47 minutes to show how you can deploy a simple demo app on your laptop and then, with the help of Traffic Director, on various combinations of virts and K8s resources and then Google Cloud Run. It’s impressive, but as with most K8s demos, it assumes that you’ve got everything up and running and configured, because if you didn’t, it’d take a galaxy-brain expert like Kelsey a couple of hours (probably?) to pull that together; for someone like me, who’s mostly a K8s noob, who knows, but probably days.

I dunno, I’m in a minority here but damn, is that stuff ever complicated. The number of moving parts you have to have configured just right to get “Hello world” happening is really super intimidating.

But bear in mind it’s perfectly possible that someone coming into AWS for the first time would find the configuration work there equally scary. To do something like this on AWS you’d spend (I think) less time doing the service configuration, but then you’d have to get all the IAM roles and permissions wired up so that anything could talk to anything, which can get hairy fast. I noticed the GCP preso entirely omitted access-control issues. So, all in, I don’t have evidence to claim “Wow, this would be simpler on AWS!” — just that the number of knobs and dials was intimidating.

One thing made me gasp then laugh. Kelsey said “for the next step, you just have to put this in your Go imports, you don’t have to use it or anything”:

_ "google.golang.org/xds"

I was all “WTF, how can that do anything?” but then a few minutes later he started wiring endpoint URIs into config files that began with xds: and oh, of course. Still, is there a bit of a code smell happening here, or is that just me?
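For what it’s worth, my understanding (I haven’t run this myself, so treat the field names as approximate rather than gospel) is that the glue is a little JSON bootstrap file that the gRPC xDS machinery finds via the GRPC_XDS_BOOTSTRAP environment variable, telling it where Traffic Director lives; after that, dialing an xds: URI goes through the mesh rather than straight DNS. Something like:

{
  "xds_servers": [
    {
      "server_uri": "trafficdirector.googleapis.com:443",
      "channel_creds": [{ "type": "google_default" }]
    }
  ],
  "node": {
    "id": "projects/123456789012/networks/default/nodes/demo-client",
    "metadata": {
      "TRAFFICDIRECTOR_NETWORK_NAME": "default",
      "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012"
    }
  }
}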

Anyhow

If I were already doing a bunch of service-mesh stuff, I think that Traffic Director might meet some needs of today and could become really valuable when my app started getting heterogeneous and needed to talk to various sorts of things that aren’t in the same service mesh.

What I missed

Stewart’s narrative stopped after the payment, and I’d been waiting for the fulfillment part of the puzzle, because for that, synchronous APIs quite likely aren’t what you want; event-driven and message-based asynchronous infrastructure would come into play. Which of course is what I spent a lot of time working on recently. I wonder how that fits into the K8s/service-mesh landscape?

Long Links 4 Aug 2020, 12:00 pm

Back in early July I posted ten links to long-form pieces that I’d had a chance to enjoy because of not having one of those nasty “full-time-job” things. I see that the browser tabs are bulking up again, so here we go. Just like last time, people with anything resembling a “life” probably don’t have time for all of them, but if a few pick a juicy-looking essay to enjoy, that’ll have made it worthwhile.

The Truth Is Paywalled But The Lies Are Free by Nathan J. Robinson. I think we’ve all sort of realized that the assertion in the title is true. Robinson doesn’t lay out much in the way of solutions but does a great job of highlighting the problem. I dug into this myself last year in Subscription Friction. It’s a super-important subject and needs more study.

Like many, I’ve picked up a new quarantine-time activity — in my case, Afro-Cuban drumming lessons. I’ve been studying West-African drumming for many years, but this is a different thing. Afro-Cuban is normally done on a pair of conga drums as opposed to djembé and dunum. At the center of the thing is the almighty Clave rhythm. In The Rhythm that Conquered the World: What Makes a “Good” Rhythm Good? Godfried Toussaint of Harvard dives deep on the “clave son” flavor (which is sort of the Bo Diddley beat with the measures flipped, and with variations) and has fun with math and music! I personally find the “rumba clave” variation just absolutely bewitching, when you can get it right. For more on this infinitely-deep rabbit hole, check out Wikipedia on Songo and Tumbao.

There’s a new insult being flung about in the Left’s internecine polemics: “Class Reductionist”. A class reductionist is one who might argue that while oppression by race and gender and so on are real and must be struggled against, the most important thing to work on right now is arranging that everyone has enough money to live in dignity. There is some conflict with “intersectional” thinking, which argues that the struggles against poverty and sexism and racism and LGBTQ oppression aren’t distinct, they’re just one struggle. In some respects, though, the two are in harmony: A Black trans woman is way more likely, on average, to be broke. In How calling someone a "class reductionist" became a lefty insult, Asad Haider takes on the subject in detail and at length and says what seem to me a lot of really smart things. Confession: I’m sort of a class reductionist.

In The New Yorker, Why America Feels Like a Post-Soviet State offers that rare thing, a new perspective on 2020’s troubles. By Masha Gessen, who knows whereof they speak.

As anti-monopoly energy starts to surge politically around the world, it’s important to drill down on specifics. It’s gotten to the point where I hesitate to point fingers at Amazon because I actually think the Google and Facebook monopolies need more urgent attention, but since details matter, here’s Amazon’s Monopoly Tollbooth, which tries to assemble facts and figures on whether and how the Amazon retail operation has a monopoly smell. And also on a related but distinct subject: The Harmful Impact of Audible Exclusive Audiobooks. The latter is open to criticism as being the whining of a losing competitor. But when you start to hear that a lot, you may be hearing evidence of monopoly.

In “Hurting People At Scale”, Ryan Mac and Craig Silverman dig hard into Facebook employee culture and controversy. I think this is important because the arguments Facebookers are having with each other are ones we need to be having at a wider scale in society.

Related: A possibly new idea: “Technology unions” could be unions of conscience for Big Tech, by Martin Skladany. It’s pretty simple: High-tech knowledge workers are well-paid and well-cared-for and hardly need traditional unions for traditional reasons. But they’re progressive, angry, and would like to work together. Maybe a new kind of “union”?

Most people reading this will know that I’m an environmentalist generally, a radical on the Climate Emergency specifically, and particularly irritated about the TransMountain (“TMX”) pipeline they’re trying to run through my hometown to ship some of the world’s dirtiest fossil fuels and facilitate exhausting their (high) carbon load into the atmosphere. Thus I’m cheered to read about any and all obstacles in TransMountain’s way, most recently having trouble finding insurers.

Also on the energy file, many have probably heard something about the arrest of the Ohio House of Representatives Speaker Larry Householder on corruption charges. An FBI investigation shows Ohio’s abysmal energy law was fueled by corruption, by Leah C. Stokes, has the goods, big-time. This is corruption on such a huge, blatant scale that it feels cartoonish, with a single-minded purpose: directing public money into stinky bailouts of stinky dirty-energy companies. When the time has come that you have to bribe the governments of entire U.S. States to keep the traditional energy ecosystem ticking over, I’d say the take-away is obvious.

Finally, some good ol’ fashioned leftist theory. Recently, a few people out on the right (e.g. Tucker Carlson) have been saying things that, while they retain that Trumpkin stench, flame away at inequality and monopoly in a way that seems sort of, well, left-wing. It sounds implausible that Fox-head thinking could ever wear the “progressive” label, thus Mike Konczal’s The Populist Right Will Fail to Help Workers or Outflank the Left (Tl;dr: “Pfui!”) is useful.

Not an Amazon Problem 23 Jul 2020, 12:00 pm

The NY Times profile quotes me saying “We don’t really have an Amazon problem. What we have is a deep, societal problem with an unacceptable imbalance of power and wealth.” But the URL contained the string “amazon-critic-tim-bray” and the HTML <title> says “Tim Bray Is Not Done With Amazon”. I feel like I’m being pigeonholed and I don’t like it.

I am totally not some sort of anti-Amazon obsessive. And I am done with Amazon; even though I’m doubtless on a dozen internal enemies lists, I have no real interest in giving the company, specifically, a hard time. I’m mad at the structure of 21st-century capitalism; the fabric of society is in danger of breaking. I’m enjoying the privilege of having an audience for my criticism and can’t see any reason to limit it to one corporation. Because the problem is so big that any of them, even Amazon, is rounding error.

There’s also the fact that I admire many things about Amazon:

  1. It’s by far the best-managed place I ever worked, including the places where I was the CEO.

  2. The people I worked with in AWS were, on average, decent, honest, and smart.

  3. The climate pledge is admirable. (Haven’t seen much action recently, and I guess Covid is a not-terrible excuse, but come on, the climate emergency’s not going away.)

  4. The company is working hard on Diversity & Inclusion. The results are meh (like all over the tech biz) but the energy and funding are there.

  5. Amazon.com has improved the world by showing what well-done online shopping is like. The world likes it. It remains to be seen what the world is like when there are a lot more parties offering really great online shopping.

  6. AWS has improved the world of IT by showing what well-done cloud computing is like. The IT world likes it. Public-cloud competition is stiffening, but AWS is a really good choice for almost anyone’s needs. Its lead over the competition is well-earned. I’m doing a bit of consulting here and there and have found myself recommending AWS more than once already.

Except for that thing

The firing-whistleblowers thing I mean. They shouldn’t have done that and there’s no excuse. I still think quitting was the only thing to do. But I feel no need (nor, apparently, does Amazon) to relitigate that issue.

Well, and one other thing: Amazon’s absurd and cruel down-to-the-last-ditch-and-beyond resistance to unionization. There’s no way that’s reasonable. I’ve gotten to know some fine, inspiring people working that front and I’ll support them going forward, any way I can.

US family wealth distribution

The rest of the things

There’s lots more to complain about but little of it is specific to Amazon, it’s all about 21st-century-capitalism, like so:

  1. Warehouse workers’ lives are often shitty. In fact, life for the US working class these days is shitty, and it’s not by accident, it’s by design. It was called the Reagan-Thatcher neoliberal consensus, and it was wrapped up with lots of high-flying rhetoric about this freedom and that dynamism and those flexibilities, but you don’t have to be that cynical to see it as good old-fashioned class war. It’s obvious who’s winning.

  2. Big Tech sells technology to loathsome customers, for example carbon-vomiting oil extractors and overempowered child-caging immigration officials. Of course, possibly the bug is that these sorts of organizations are allowed to have oceans of money. Fix that and remove the temptation!

  3. Big Tech emits a whole lot of carbon into the atmosphere and facilitates the emission of much, much more. Humanity can’t afford this and it has to stop.

  4. Big Business in general and Big Tech in particular is waaaay too powerful; the millions poured into lobbying have fabulous ROI.

Break ’em Up

I’m strongly convinced that the structure of Big Business in general and Big Tech in particular is too concentrated in ways that are damaging along multiple axes. Check out Anti-Monopoly Thinking and Large Companies Considered Harmful; this is not a new opinion.

Let’s start with Google. It feels the most urgent because the Google/Facebook ad cartel is rapidly destroying publishing business models that have been essential to civilized human discourse.

Next I’d go after Microsoft who are still, all these decades later, using the cash and incumbency advantages of their steely grip on the business desktop and email server to invade other business sectors. Which is to say, Slack has a gripe. This one is particularly maddening to me because Microsoft has been doing this shit since I was carving code on stone tablets.

By the time you’d launched those campaigns, I suspect Amazon would have woken up, smelled the coffee, and spun out AWS already. Then we’ll see what the retail business is really like given the violent contrast between the galactic revenue growth and the negligible profit margin.

Media Friction

I’ve had a certain amount of friction with the journalists who found me interesting after my May 1st exit from Amazon. Because there was this great, simple, story that they’d like to have told: Plucky engineer leads geek revolt against Jeff Bezos, Richest Man In the World.

That struggle story — Man against company, or Man against Billionaire — is a crowd-pleaser. The actual struggle that interests me is against the current horrifying imbalance in global power and wealth, which is kind of abstract, doesn’t have a chiseled cartoon-villain billionaire in the cast, and is frighteningly large in scale.

Seriously; basically every reporter I’ve talked to has tried to get me to say awful things about Amazon and in particular about Jeff Bezos. But at my last job they taught me to think big and, with all his billions, Jeff is rounding error in the big picture. He’s not the problem; the legal/regulatory power structures that enable him and his peers are.

Amazon is a perfectly OK company, to the extent that planetary-scale sprawling corporate behemoths can be perfectly OK in 2020. Which is to say, not OK at all.

But once again, no one company is the problem. The problem’s the entirely-fixed great global card game illustrated by that graph, above. The problem’s the fact that in one of the world’s wealthiest economies, we have ever-growing homeless camps where they’re dying of opioids faster than Covid. (That’s maybe not the worst part, but it’s the one I see with my own eyes.)

We can do way better as a society than the greed-fueled planet-destroying worker-crushing hamster-wheel we’re spinning right now.

That’s why I’m making noise.

Polishing Lawrence 21 Jul 2020, 12:00 pm

I edit Wikipedia as a hobby, and recommend that hobby. (Fifteen or so years ago I was an early defender in the days when many scoffed at the Wikipedia idea.) I spent much of this past weekend on a long-running editing project, the entry for T.E. Lawrence, better known as Lawrence of Arabia. When I finished up on Monday I felt like a milestone had been passed so I decided to share, because the story behind it seemed worth telling.

If you want the facts about Lawrence, go read the entry, it’s not that long. Briefly: An archeologist, he served in WWI as liaison between British forces in the Middle East and the “Arab revolt”, both of which were fighting the Ottoman Turks, allies of Germany. After the war, he became household-name famous, renounced that fame to serve as an enlisted man in the RAF, wrote a couple of books (Seven Pillars of Wisdom the best-known) and a huge volume of letters (fun to read) and died aged 46 in a motorcycle accident. His fame increased posthumously because of David Lean’s film Lawrence of Arabia, you’ve all seen it.

Editing this entry has consumed probably a few weeks of my life in aggregate, and I’m content with that. It involved one edit war with a crazed Islamist which required that I drive to the nearest university research library and consult obscure journals that nobody should ever have to look at. It has also turned me, I claim, into the world’s greatest living expert on Lawrence’s sexuality.

Library books about T.E. Lawrence

Meta-Lawrence

The entry has long had a reasonable description of the significant things Lawrence did. A related but distinct subject is the truth or falsity of his public image.

Immediately after the Great War ended, Lowell Thomas, an American impresario, toured with a multimedia presentation entitled With Allenby in Palestine and Lawrence in Arabia, including film, slides, dancing girls, and Thomas’ own narration. It was wildly successful and turned Lawrence into perhaps Britain’s most famous war hero of the time.

Two biographies were written during Lawrence’s lifetime, by B. H. Liddell Hart and Robert Graves. They were very rah-rah, attributing nearly godlike powers to their subject.

Then, in the early 1950s Richard Aldington, a well-regarded poet, novelist, and biographer, signed up to produce another tour of Lawrence’s life. His research convinced him that Lawrence was illegitimate, a liar, a sexual pervert, a self-promoter, and a bad writer, whose military contributions were negligible and misguided. Among other things, he discovered that Lawrence had significant input into the Graves and Liddell Hart biographies, apparently having written some sections himself.

Aldington’s draft biography, which said this at length, was leaked before publication. Lawrence’s surviving friends and admirers, horrified, sallied forth ready to rumble; the ensuing kerfuffle is quite a story. Liddell Hart led the “Lawrence Bureau” in an attempt to have the book suppressed. When that failed, he circulated hundreds of copies of an anti-Aldington tract to all the likely reviewers. This worked pretty well; when Lawrence of Arabia: A Biographical Enquiry was published in 1955, the reviews were savage; some showed evidence of having read Liddell Hart’s rebuttal but not Aldington’s book.

To be fair, the book was mean-spirited and full of anger, what today we’d call “trolling”. It also revealed that Aldington was a believer in the virtues of European colonial rule over brown people; exactly what Lawrence’s attempts to establish post-War Arab freedom were designed to thwart. Furthermore, Aldington (who lived in France) was also enraged by what he saw as Lawrence’s Francophobia — in fact, Lawrence’s efforts to block France’s colonial dreams were a function of his desire to see a free Arab state arise in Syria.

Books about T.E. Lawrence

My sincere thanks to the Vancouver Public Library system
for the use of these.

Prejudice aside, Aldington did discover and publish plenty of evidence of Lawrence having embellished the narrative of his life, and likely even invented some of it from whole cloth. He was particularly guilty of inflating numbers: Of bridges he’d blown up, of wounds he’d suffered, of the price on his head, and of the number of books he’d read. Subsequent biographers, most of whom remained admirers of Lawrence (as do I), freely acknowledge the truth of most of Aldington’s claims.

The whole thing is a hell of a story, best told by one Professor Fred D. Crawford in Richard Aldington and Lawrence of Arabia: A Cautionary Tale. It’s not exactly a bestseller; you’ll probably need to hit a big library to find a copy.

The production of Aldington’s book was a financial and emotional disaster for him from which he never fully recovered. Granted, he was clearly a “difficult person”, but still, one feels he deserved better.

Milestone?

Up until last week, the Wikipedia entry, while reasonably good on Lawrence’s military, political, and writing activities, had entirely ignored the Aldington controversy and ensuing revolution in Lawrence studies. No longer. Having whittled away at this thing for over a decade I think the entry is, broadly speaking, complete. Obviously no Wikipedia entry is ever finished because the course of human knowledge advances and better references can always turn up for the facts as outlined. But I still felt good when I posted the new section, and better still when, the next morning, it had picked up a couple of friendly edits and apparently failed to provoke any editorial opposition.

At this point I should emphasize that I don’t get to make the completeness call because I don’t “own” the Lawrence article; nobody owns anything in Wikipedia. Nor do I “lead” the editing in any meaningful sense. Many others have done lots of solid work over the years.

Join in!

This isn’t how I’d recommend everyone spend their weekends, but I find it rewarding sometimes. Who knows, you might too! Because everybody — everybody — is a world-class expert in something, be it only their own hometown, neighborhood, or hobby. Why not share that knowledge?

Wikipedia is one of the few places on the Internet where truth is pursued, intently and unironically, for its own sake. It’s not perfect because nothing is, but everyone has the opportunity to make it more so.

Safari Tabs Are Weird 17 Jul 2020, 12:00 pm

Recently I switched from Chrome to Safari because on a 2019 16" MacBook Pro, Chrome has nasty video glitches; apparently Apple and Google are blaming each other for the problem. My first impression of Safari is decent; subjectively it feels faster than Chrome. But browser tabs act in ways that feel somewhere between “weird” and “badly broken”. Here are a few of the issues I’ve encountered.

[Update: Lots of free wisdom on offer in the comments here, including some that improved my life.]

Tab display

On Chrome, if you have a lot of tabs — I typically have 20 or 30 open — they get narrower and narrower and, until completely crushed out of existence, show a little glimpse of icon and label, so you can find what you’re looking for.

On Safari, the tabs near the one you’re on are shown pretty well full-width with quite a bit of white-space surrounding labels and icons (where by “white” I mean “grey”). You can two-finger drag the tab label array right and left and they shuffle around in a graceful way.

I don’t like this because I lost all-my-tabs-at-a-glance. If you have hundreds open I guess that wouldn’t work anyhow.

But I do like it because when I’m looking for a tab that’s not obvious to the eye, it’s easier to find. Hmm, I’ll call this a saw-off.

Tab pinning

All the browsers support tab pinning these days, which is great because it enables the Mighty Tab Trick, where you park the eight tabs you use all the time and insta-hop to them using CMD-1 through CMD-8.

But Safari pins tabs hard. If you follow a link off the site you’re at, new tab. If you go somewhere, pin it, then press Back, new tab. You can’t nuke the tab with CMD-W. I mean it’s not pinned, it’s nailed the fuck down.

This can lead to weirdly buggy behavior, which this blog provokes. If you’re reading this in a web browser, there’s a search field up the top right which does a dumb simple Google search of the blog. Now, I keep ongoing in a pinned tab for obvious reasons. If I do a search, the blog is replaced by a Google search output page. Fine. But that page now exhibits hard-pinning. If I click a result, that opens in a new tab. If I press Back, it opens my blog… in a new tab. In fact there is absolutely no way back to the blog from the search result page. I think I’d call this a bug?

Left and right

[Update: Check the comments for better options. I think I withdraw this complaint.]

You can move from any tab to the next one on the left or right. The key combo is CMD-Shift-left/right, annoyingly different from other browsers. But a lot of the time it just doesn’t work. If you’re in a form, or a text-edit field, or really any mode that you might be able to interact with what’s in the tab, you can’t step left or right.

I often put a couple of tabs next to each other deliberately, like when I’m editing a Wiki page and I have supporting evidence I need to copy from; if it’s in the next tab it’s easy to flip back and forth. But impossible on Safari, because if you can edit, you can’t flip.

This one really feels like a bug. Or am I weird?

Where do new tabs go?

[Update: “Debug” menu suggestion below: Verified; note that you have to click several things in that sub-sub-menu.]

I mean tabs that are opened because the link requested it, like from a search-results page or, as described above, a pinned tab, not because you hit CMD-T to ask for one. Safari is consistent with Chrome on this one, which is a pity, because they’re both wrong; they both open the new tab as close as possible to where you are. Which, if you’re in the middle of a bunch of pinned tabs, is not what you’d expect.

I’d actually like it if all new tabs opened over at the right end of the row. At least as a settings option. Because then your tabs would naturally arrive at a structure with the oldest ones at the left and the newest ones at the right. I can see room for argument about this, but I suspect the current policy was arrived at before a lot of people had many pinned tabs, because that’s where the behavior becomes confusing.

Did I miss any?

I wish everything worked the same.

Long Links 7 Jul 2020, 12:00 pm

Having recently quit my job, I have more spare time than I used to. A surprising amount of it has been dedicated to reading longer-form articles, mostly about politics and society, but only mostly. I miss my job but I sure have enjoyed the chance to stretch out my mind in new directions. There are plenty of things in the world that need more than a thousand words to talk about. Anyhow, here is a set of lightly-annotated links that people who still have jobs almost certainly won’t have time to read all of. But maybe one or two will add flavor to your life.

Yavne: A Jewish Case for Equality in Israel-Palestine, by Peter Beinart, assumes the death of the two-state option for Israel/Palestine, and takes its time arguing for the only plausible long-term alternative: Some sort of unitary state in which the citizens are equal whatever their religious heritage, be that state multinational, federal, or cantonal. Deep, important stuff.

Tuning for beginners and (especially) Extra stuff on tunings by, uh, I’m not sure who actually, delivers a horribly-typeset crash course on the mathematics of music. Pretty fun for the (quite a lot of) people who are interested by both.

Trivium is the link-blog of Leah Neukirchen, “Just another random shark-hugging girl”, which gleefully pokes around the deepest, darkest, dustiest corners of software tech.

Battery energy storage is getting cheaper, but how much deployment is too much? by Herman K. Trabish, is an exemplar of my growing fascination with energy economics. In particular, storage. The path to an all-renewables energy ecosystem is pretty straightforwardly visible, except for storage infrastructure. This helps.

In the Covid-19 Economy, You Can Have a Kid or a Job. You Can’t Have Both. Ouch; maybe the most important thing written about the current plague that doesn’t actually instruct on how to save myriads of lives. Covid is shining a harsh, harsh light on lots of things we’ve been ignoring but shouldn’t have.

Richard Rorty’s prescient warnings for the American left. Sean Illing in Vox walks through Rorty’s taxonomy of American progressivism. There’s lots for almost anyone to disagree with here, but the Left needs to think about how it thinks about culture and class and, you know, get its freaking story straight. I think this might be helpful.

What About the Rotten Culture of the Rich? is by Chris Arnade, the guy who wrote the book about the American underclass, as interviewed in the McDonaldses of the flyover zones. He’s hard to classify politically, and the points here about the moral bankruptcy, and damaging effects, of ruling-class behavior, have been made elsewhere. But they’re made very well here.

End the Globalization Gravy Train is by J.D. Vance. I’m pretty far out left but have mental space for intelligent conservatism (see here), and this is that. I was impressed.

When a tyranny falls, maybe justice will be applied to its collaborators. In History Will Judge the Complicit, Anne Applebaum takes a very indirect route to considering how this might apply after November 2020.

It was gold, by Patricia Lockwood, is about Joan Didion. You might not know who Ms Didion is and still enjoy this, because Ms Lockwood is a fabulous writer; certain sentences should be framed in gold and hung in major galleries and museums. She had a book in one of the recent Booker long or short lists, which is on my must-read list.

Just Too Efficient 5 Jul 2020, 12:00 pm

On a Spring 2019 walk in Beijing I saw two street sweepers at a sunny corner. They were beat-up looking and grizzled but probably younger than me. They’d paused work to smoke and talk. One told a story; the other’s eyes widened and then he laughed so hard he had to bend over, leaning on his broom. I suspect their jobs and pay were lousy and their lives constrained in ways I can’t imagine. But they had time to smoke a cigarette and crack a joke. You know what that’s called? Waste, inefficiency, a suboptimal outcome. Some of the brightest minds in our economy are earnestly engaged in stamping it out. They’re winning, but everyone’s losing.

I’ve felt this for years, and there’s plenty of evidence:
Item: Every successful little store with a personality morphs into a chain because that’s more efficient. The personality becomes part of the brand and thus rote.
Item: I go to a deli fifteen minutes away to buy bacon, rashers cut from the slab while I wait, because they’re better. Except when I can’t, in which case I buy a waterlogged plastic-encased product at the supermarket; no standing or waiting! It’s obvious which is more efficient.
Item: I’ve learned, when I have a problem with a tech vendor, to seek out the online-chat help service; there’s annoying latency between question and answer as the service rep multiplexes me in with lots of other people’s problems, but at least the dialog starts without endless minutes on hold; a really super-efficient process.
Item: Speaking of which, it seems that when you have a problem with a business, the process for solving it each year becomes more and more complex and opaque and irritating and (for the business) efficient.
Item, item, item; as the world grows more efficient it grows less flavorful and less human. Because the more efficient you are, the fewer humans you need.

The end-game

Efficiency, taken to the max, can get very dark.

I suggest investing a few minutes in reading Behind the Smiles by Will Evans. Summary: Certain (not all) Amazon warehouses seem to have per-employee injury rates that are significantly higher than the industry average, as in twice as high or more. Apparent reason: It’s not that they’re actually dangerous places to work, it’s just that they’ve maximized efficiency and reduced waste to the point where people are picking and packing and shipping every minute they’re working, never stopping. And a certain proportion of human bodies simply can’t manage that. They break down under pressure.

Robots matter, but not in the way you might think. The idea was that robotized warehouses should reduce stress and strain because they bring the pick-and-pack to the employees, rather than the people having to walk around to where the items are. But apparently robots correlate with higher injury rates. Behind the Smiles quotes employee Jonathan Meador: “‘Before robots, it was still tough, but it was manageable,’ he said. Afterward, ‘we were in a fight that we just can’t win.’”

It’s important to realize that Amazon isn’t violating any rules, nor even (on the surface) societal norms. Waste is bad, efficiency is good, right? They’re doing what’s taught in every business school; maximizing efficiency is one of the greatest gifts of the free market. Amazon is really extremely good at it.

And it’s good, until it isn’t any more.

Efficiency and weakness

Let’s hand the mike over to Bruce Schneier. In The Security Value of Inefficiency he makes one of those points that isn’t obvious until you hear it. Quoting briefly:

“All of the overcapacity that has been squeezed out of our healthcare system; we now wish we had it. All of the redundancy in our food production that has been consolidated away; we want that, too. We need our old, local supply chains — not the single global ones that are so fragile in this crisis. And we want our local restaurants and businesses to survive, not just the national chains.”

Bruce is pointing out that overoptimizing efficiency doesn’t just burn people out, it also too often requires cutting into what you later realize were prudent safety margins.

How hard should people work?

Today, we assume the forty-hour week without thanking the generations of socialists and unionists in the Eight-hour-day movement, whose struggle started around 1817 and didn’t bear global fruit until the middle of the twentieth century.

But there’s nothing axiomatic about forty hours. Twenty years ago, France introduced a 35-hour workweek. Their economy still functions. And John Maynard Keynes, approximately the most influential economist in the history of the world, predicted his grandchildren would enjoy a 15-hour workweek. It seems he was wrong. But maybe only partly.

And of course Keynes himself worked like a madman. As did I, for most of my career. Because some jobs are just jobs, but others are vocations; people doing what they love, and who’d really rather be working than not. Nothing wrong with that.

Some ideologists of Capitalism think that every business should try to make every job a vocation, that people should be delighted with their work, with the benefit (for the capitalist) that you don’t have to hire that many. One famous example of this thinking is at UPS, the delivery company, whose leaders wanted the delivery people to “bleed brown”. Here’s an interesting take on the UPS story, in which the “bleeding brown” notion didn’t catch on.

And while there’s nothing wrong with vocations — I’m lucky and blessed to have found one — most jobs are just jobs. Whether it be a job or a vocation, work should at least leave time for a smoke break in the sun at the corner (or its 21st-century equivalent). And it’s perfectly possible that Keynes’ prediction could come true, in certain future economic configurations.

But, wealth!

If we all work less, we’ll be poorer, right? Because the total cash output of the economy is a (weird, nonlinear) function of the amount of work that gets put in.

That sounds like it should be important, until you ask basic questions like “how much money is there, and who has it?” The answers, pretty clearly, are “Too much” and “An inefficiently small number of very wealthy people.” Business Insider has a nice take on the problem, highlighting the evidence for and consequences of there being just too much money around.

In practice, interest rates stay low, governments can borrow more or less for free, and all sorts of crazily doomed shit is getting investment funding. There is really no evidence anywhere of a global shortage of money, and plenty for the existence of a surplus.

Stop and think

Specifically, do it when something is annoying you; at work, or in your personal interaction with a big organization. Could this be explained by someone, somewhere, trying to be more efficient? In my life, the answer is almost always “yes”.

Cracks

“There is a crack in everything, that’s how the light gets in” sang Leonard Cohen. And there need to be cracks in the surface of work, in the broader organizational fabric that operates the world. Because that’s where the humanity happens. You can be like the people who optimize warehouses and call the gaps “waste”. But that path, followed far enough, leads to a world that we really don’t want to be living in.

It’s hard to think of a position more radical than being “against efficiency”. And I’m not. Efficiency is a good, and like most good things, has to be bought somehow, and paid for. There is a point where the price is too high, and we’ve passed it.

More Topfew Fun 1 Jul 2020, 12:00 pm

Back in May I wrote a little command-line utility called Topfew (GitHub). It was fun to write, and faster than the shell incantation it replaced. Dirkjan Ochtman dropped in a comment noting that he’d written Topfew along the same lines but in Rust (GitHub) and that it was 2.86 times faster; at GitHub, the README now says that with a few optimizations it’s now 6.7x faster. I found this puzzling and annoying so I did some optimization too, encountering surprises along the way. You already know whether you’re the kind of person who wants to read the rest of this.

Reminder: What topfew does is replace the sort | uniq -c | sort -rn | head pipeline that you use to do things like find the most popular API call or API caller by plowing through a logfile.

Initial numbers

I ran these tests against a sample of the ongoing Apache access_log, around 2.2GB containing 8,699,657 lines. In each case, I looked at field number 7, which for most entries gives you the URL being fetched. Each number is the average of three runs following a warm-up run. (The variation between runs was very minor.)

               | Elapsed |   User | System | vs Rust
Rust           |    4.96 |   4.45 |   0.50 |    1.0
sort/uniq/sort |  142.80 | 149.66 |   1.29 |  28.80
topfew 1.0     |   27.20 |  28.31 |   0.54 |   5.49

Yep, the Rust code (Surprise #1) measures over five times faster than the Go. In this task, the computation is trivial and the whole thing should be pretty well 100% IO-limited. Granted, the Mac I’m running this on has 32G of RAM so after the first run the code is almost certainly coming straight out of RAM cache.

But still, the data access should dominate the compute. I’ve heard plenty of talk about how fast Rust code runs, so I’d be unsurprised if its user CPU was smaller. But elapsed time!?

What Dirkjan did

I glanced over his code. Gripe: Compared to other modern languages, I find Rust suffers from a significant readability gap, and for me, that’s a big deal. But I understand the appeal of memory-safety and performance.

Anyhow, Dirkjan, with some help from others, had added a few optimizations:

  1. Changed the regex from \s+ to [ \t].

  2. The algorithm has a simple hash-map of keys to occurrence counts. I’d originally done this as just string-to-number. He made it string-to-pointer-to-number, so after you find it in the hash table, you don’t have to update the hash-map, you just increment the pointed-to value.

  3. Broke the file up into segments and read them in separate threads, in an attempt to parallelize the I/O.

I’d already thought of #3, based on my intuition that this task is I/O-bound, but not #2, even though it’s obvious.
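Optimization #2 is easy to picture. Here’s a minimal Go sketch of the idea (mine, not Dirkjan’s or topfew’s actual code): the map stores a pointer to the count, so bumping a key you’ve already seen never writes back into the map.

    package main

    import "fmt"

    func main() {
        counts := make(map[string]*uint64)
        keys := []string{"/a", "/b", "/a", "/a", "/b"}
        for _, k := range keys {
            if p, ok := counts[k]; ok {
                (*p)++ // bump through the pointer; no map update needed
            } else {
                one := uint64(1)
                counts[k] = &one
            }
        }
        for k, p := range counts {
            fmt.Println(k, *p) // /a 3, /b 2 (in some order)
        }
    }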

What I did

In his comment, Dirkjan pointed out correctly that I’d built my binary “naively with ‘go build’”. So I went looking for how to make builds that are super-optimized for production and (Surprise #2) there aren’t any. (There’s a way to turn optimizations off to get more deterministic behavior.) Now that I think about it I guess this is good; make all the optimizations safe and just do them, don’t make the developer ask for them.

(Alternatively, maybe there are techniques for optimizing the build, but some combination of poor documentation or me not being smart enough led to me not finding them.)

Next, I tried switching in the simpler regex, as in Dirkjan’s optimization #1.

               | Elapsed |   User | System | vs Rust
Rust           |    4.96 |   4.45 |   0.50 |    1.0
sort/uniq/sort |  142.80 | 149.66 |   1.29 |  28.80
topfew 1.0     |   27.20 |  28.31 |   0.54 |   5.49
“[ \t]”        |   25.03 |  26.18 |   0.53 |   5.05

That’s a little better. Hmm, now I’m feeling regex anxiety.

Breakthrough

At this point I fat-fingered one of the runs to select on the first rather than the seventh field and that really made topfew run a lot faster. Which strengthened my growing suspicion that I was spending a lot of my time selecting the field out of the record.

At this point I googled “golang regexp slow” and got lots of hits; there is indeed a widely expressed opinion that Go’s regexp implementation is far from the fastest. Your attitude to this probably depends on your attitude toward using regular expressions at all, particularly in performance-critical code.

So I tossed out the regexp and hand-wrangled a primitive brute-force field finder, which by the way runs on byte-array slices rather than strings, thus skipping at least one conversion step. The code ain’t pretty but it passed the unit tests, and look what happened!
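Here’s a minimal sketch of the idea (my guess at the shape, not the actual topfew code): walk the byte slice, skip separators, and hand back the nth field as a sub-slice, with no regexp and no string conversion. The numbers follow.

    package main

    import "fmt"

    // field returns the 1-based nth space/tab-separated field of record,
    // or nil if there aren't that many fields.
    func field(record []byte, n int) []byte {
        i := 0
        for f := 1; ; f++ {
            for i < len(record) && (record[i] == ' ' || record[i] == '\t') {
                i++ // skip separators
            }
            if i == len(record) {
                return nil
            }
            start := i
            for i < len(record) && record[i] != ' ' && record[i] != '\t' {
                i++ // consume the field
            }
            if f == n {
                return record[start:i]
            }
        }
    }

    func main() {
        // a made-up Apache-style line, just for illustration
        line := []byte(`127.0.0.1 - - [01/Jul/2020:00:00:00 +0000] "GET /ongoing/ HTTP/1.1" 200 1234`)
        fmt.Printf("%s\n", field(line, 7)) // prints /ongoing/
    }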

               | Elapsed |   User | System | vs Rust
Rust           |    4.96 |   4.45 |   0.50 |    1.0
sort/uniq/sort |  142.80 | 149.66 |   1.29 |  28.80
topfew 1.0     |   27.20 |  28.31 |   0.54 |   5.49
“[ \t]”        |   25.03 |  26.18 |   0.53 |   5.05
no regexp      |    4.23 |   4.83 |   0.66 |   0.85

There are a few things to note here. Most obviously (Surprise #3), the Go implementation now has less elapsed time. So yeah, the regexp performance was sufficiently bad to make this process CPU-limited.

Less obviously, in the Rust implementation the user CPU time is less than the elapsed; user+system CPU are almost identical to elapsed, all of which suggests it’s really I/O limited. Whereas in all the different Go permutations, the user CPU exceeds the elapsed. So there’s some concurrency happening in there somewhere even though my code was all single-threaded.

I’m wondering if Go’s sequential file-reading code is doing multi-threaded buffer-shuffling voodoo. It seems highly likely, since that could explain the Go implementation’s smaller elapsed time on what seems like an I/O-limited problem.

[Update: Dirkjan, after reading this, also introduced regex avoidance, but reports that it didn’t appear to speed up the program. Interesting.]

Non-breakthroughs

At this point I was pretty happy, but not ready to give up, so I implemented Dirkjan’s optimization #2 - storing pointers to counts to allow increments without hash-table updates.

               | Elapsed |   User | System | vs Rust
Rust           |    4.96 |   4.45 |   0.50 |    1.0
sort/uniq/sort |  142.80 | 149.66 |   1.29 |  28.80
topfew 1.0     |   27.20 |  28.31 |   0.54 |   5.49
“[ \t]”        |   25.03 |  26.18 |   0.53 |   5.05
no regexp      |    4.23 |   4.83 |   0.66 |   0.85
hash pointers  |    4.16 |   5.10 |   0.65 |   0.84

Um, well, that wasn’t (Surprise #4) what I expected. Avoiding almost nine million hash-table updates had almost no observable effect on latency, while slightly increasing user CPU. At one level, since there’s evidence the code is limited by I/O throughput, the lack of significant change in elapsed time shouldn’t be too surprising. The increase in user CPU is, though; possibly it’s just a measurement anomaly?

Well, I thought, if I’m I/O-limited in the filesystem, let’s be a little more modern and instead of reading the file, let’s just pull it into virtual memory with mmap(2). So it should be the VM paging code getting the data, and everyone knows that’s faster, right?

               | Elapsed |   User | System | vs Rust
Rust           |    4.96 |   4.45 |   0.50 |    1.0
sort/uniq/sort |  142.80 | 149.66 |   1.29 |  28.80
topfew 1.0     |   27.20 |  28.31 |   0.54 |   5.49
“[ \t]”        |   25.03 |  26.18 |   0.53 |   5.05
no regexp      |    4.23 |   4.83 |   0.66 |   0.85
hash pointers  |    4.16 |   5.10 |   0.65 |   0.84
mmap           |    7.17 |   8.33 |   0.62 |   1.45

So by my math, that’s (Surprise #5) 72% slower. I am actually surprised, because if the paging code just calls through to the filesystem, it ought to be a pretty thin layer and not slow things down that much. I have three hypotheses: First of all, Go’s runtime maybe does some sort of super-intelligent buffer wrangling to make the important special case of sequential filesystem I/O run fast. Second, the Go mmap library I picked more or less at random is not that great. Third, the underlying MacOS mmap(2) implementation might be the choke point. More than one of these could be true.
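For concreteness, here’s a minimal sketch of the mmap approach using the raw syscall package on a Unix-ish system; it’s not the third-party library I actually used, just the shape of the thing.

    package main

    import (
        "fmt"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.Open("access_log") // made-up filename
        if err != nil {
            panic(err)
        }
        defer f.Close()
        info, err := f.Stat()
        if err != nil {
            panic(err)
        }
        data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
            syscall.PROT_READ, syscall.MAP_SHARED)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(data)

        // data is now the whole file as one big byte slice; count
        // newlines as a stand-in for the real record processing.
        lines := 0
        for _, b := range data {
            if b == '\n' {
                lines++
            }
        }
        fmt.Println(lines, "lines")
    }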

A future research task is to spin up a muscular EC2 instance and run a few of these comparisons there to see how a more typical server-side Linux box would fare.

Parallel?

Finally, I thought about Dirkjan’s parallel I/O implementation. But I decided not to go there, for two reasons. First of all, I didn’t think it would actually be real-world useful, because it requires having a file to seek around in, and most times I’m running this kind of a pipe I’ve got a grep or something in front of it to pull out the lines I’m interested in. Second, the Rust program that does this is already acting I/O limited, while the Go program that doesn’t is getting there faster. So it seems unlikely there’s much upside.

But hey, Go makes concurrency easy, so I refactored slightly. First, I wrapped a mutex around the hash-table access. Then I put the code that extracts the key and calls the counter in a goroutine, throwing a high-fanout high-TPS concurrency problem at the Go runtime without much prior thought. Here’s what happened.

               | Elapsed |   User | System | vs Rust
Rust           |    4.96 |   4.45 |   0.50 |    1.0
sort/uniq/sort |  142.80 | 149.66 |   1.29 |  28.80
topfew 1.0     |   27.20 |  28.31 |   0.54 |   5.49
“[ \t]”        |   25.03 |  26.18 |   0.53 |   5.05
no regexp      |    4.23 |   4.83 |   0.66 |   0.85
hash pointers  |    4.16 |   5.10 |   0.65 |   0.84
goroutines!    |    6.53 |  57.84 |   7.68 |   1.32

(Note: I turned the mmap off before I turned the goroutines on.)
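In case the shape of that isn’t obvious, here’s a minimal sketch (not the actual diff) of “mutex around the hash table, goroutine per record”:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var (
            mu     sync.Mutex
            counts = map[string]uint64{}
            wg     sync.WaitGroup
        )
        for _, key := range []string{"/a", "/b", "/a", "/a"} {
            wg.Add(1)
            go func(k string) {
                defer wg.Done()
                mu.Lock() // every goroutine fights for this
                counts[k]++
                mu.Unlock()
            }(key)
        }
        wg.Wait()
        fmt.Println(counts)
    }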

I’m going to say Surprise #6 at this point even though this was a dumbass brute-force move that I probably wouldn’t consider doing in a normal state of mind. But wow, look at that user time; my Mac claims to be an 8-core machine but I was getting more than 8 times as many CPU-seconds as clock seconds. Uh huh. Pity those poor little mutexes.

I guess a saner approach would be to call runtime.NumCPU() and limit the concurrency to that value, probably by using channels. But once again, given that this smells I/O-limited, the pain and overhead of introducing parallel processing seem unlikely to be cost-effective.
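For the record, the saner version would look something like this sketch (assumed, not something I’ve implemented or benchmarked): runtime.NumCPU() workers draining a channel, instead of a goroutine per record.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        work := make(chan string)
        var (
            mu     sync.Mutex
            counts = map[string]uint64{}
            wg     sync.WaitGroup
        )
        for i := 0; i < runtime.NumCPU(); i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for key := range work {
                    mu.Lock()
                    counts[key]++
                    mu.Unlock()
                }
            }()
        }
        for _, key := range []string{"/a", "/b", "/a"} {
            work <- key
        }
        close(work)
        wg.Wait()
        fmt.Println(counts)
    }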

Or, maybe I’m just too lazy.

Lessons

First of all, Rust is not magically and universally faster than Go. Second, using regular expressions can lead to nasty surprises (but we already knew that, didn’t we? DIDN’T WE?!) Third, efficient file handling matters.

Thanks to Dirkjan! I’ll be interested to see if someone else finds a way to push this further forward. And I’ll report back any Linux testing results.

Oboe to Ida 26 Jun 2020, 12:00 pm

What happened was, I was listening to an (at least) fifty-year-old LP, a classical collection entitled The Virtuoso Oboe; it’s what the title says. The music was nothing to write home about but led into a maze of twisty little passages that ended with me admiring the life of a glamorous Russian bisexual Jewish ballerina heiress who’s famous for… well, we’ll get to that.

The uncollected Les Brown

Regular readers know that I have a background task — working through an inherited trove of 900 mostly-classical LPs. In that piece I said I was going to blog my way through the collection but I haven’t, which is probably OK. But tonight’s an exception.

Oboe virtuosity

As the picture linked above reveals, the records are in cardboard liquor boxes, and somewhat clustered thematically. The box I’m currently traversing seems to be mostly “weird old shit, mostly bad, mostly scratchy”. It contained some malodorous “easy listening” products of the Fifties and Sixties. (I kept one, 1944-46 recordings by Les Brown’s extremely white jazz band, forgettable except it has Doris Day on vocals and let me tell ya, she sets the place on fire. But I digress.)

The Virtuoso Oboe, Vol. 1

Anyhow, last night’s handful of LPs to wash and dry and try included The Virtuoso Oboe and The Virtuoso Oboe Vol. 4, whose existence suggests the series had legs. It features that famous household name André Lardrot on, well, oboe.

The first record contained music from Cimarosa, Albinoni, Handel, and Haydn and frankly didn’t do anything for me. None of the tunes were that memorable and I didn’t find the oboe-playing compelling. So it’s going off to the Sally Ann.

I questioned whether Vol. 4 was even worth trying, but then I noticed it had a concerto by Antonio Salieri. Who’s got Salieri in their collection, I ask you? Nobody, that’s who! But now I do.

The Virtuoso Oboe, Vol. 4

The concerto is nice; if you told me it was by Mozart I’d believe you. A couple of the melodies are very strong, and they skip opportunities for convolutions and decorations that W. Amadeus never would have passed up.

On the other side was a Concertino for English Horn by Donizetti. I vaguely remembered that the English Horn (a.k.a. cor anglais) is sort of like an oboe only bigger, and notably neither English nor a horn.

Oboe d’Amore

While I was reading up on that I discovered the Oboe d’amore, which is bigger than an oboe but smaller than a cor anglais. Who could resist learning more about a love oboe? Whereupon I learned that it’s perhaps most famous for leading one of the stanzas in Ravel’s Boléro. At which point I was about to stop, because what more can be said about the Boléro?

Well, lots! I’d somehow internalized it as fact that Ravel hated his most famous work and regretted writing it, but not so. He attended performances and produced a later arrangement for two pianos. He also expressed the perfectly reasonable opinion that once he’d done it, nobody else needed to write any pieces consisting of a single melodic line repeated over and over again for seventeen minutes and successively enhanced by adding more and more instruments playing louder and louder.

Ida Rubinstein

He also thought it should be played at a steady pace and got into a big public pissing match with Toscanini, who wanted to play the piece not only louder and louder but faster and faster.

I was thinking about the Oboe d’Amore and Boléro and speculated that there might be a joke to be found along the lines of oBo d’Erek, but you’d have to be over fifty to follow the thread and anyhow I was wrong, there isn’t a joke there.

Ida Rubinstein

Wait, there’s more! It turns out that Ravel didn’t write the Boléro just because he felt like it, but because he was paid for it; a commission from one Ida Rubinstein, mentioned at the top: dancer, Russian, heiress, bisexual, etc. The reason I wrote this was to get you to hop over to Wikipedia and read up about her life, which was extraordinary. Having done that, you can actually watch bad silent film of her dancing, someone having supplied a music track. It’s a pity there’s no video of her performing to Boléro.

My work here is done.

Break Up Google 25 Jun 2020, 12:00 pm

It’s easy to say “Break up Big Tech companies!” Depending how politics unfold, the thing might become possible, but figuring out the details will be hard. I spent the last sixteen years of my life working for Big Tech and have educated opinions on the subject. Today: Why and how we should break up Google.

egGolo

Where’s the money?

Google’s published financials are annoyingly opaque. They break out a few segments but (unlike Amazon) only by revenue, there’s nothing about profitability. Still, the distribution is interesting. I collated the percentages back to Q1 2018:

        | Ads on Google | Ads off Google | Ads on YouTube | Cloud | Other
2018 Q1 |        70.97% |         14.84% |              ? |     ? | 14.19%
2018 Q2 |        71.69% |         14.77% |              ? |     ? | 13.54%
2018 Q3 |        71.73% |         14.58% |              ? |     ? | 13.69%
2018 Q4 |        69.05% |         14.32% |              ? |     ? | 16.62%
2019 Q1 |        70.99% |         13.81% |              ? |     ? | 14.92%
2019 Q2 |        70.36% |         13.66% |              ? |     ? | 15.98%
2019 Q3 |        70.97% |         13.15% |              ? |     ? | 15.88%
2019 Q4 |        59.39% |         13.10% |         10.26% | 5.68% | 11.57%
2020 Q1 |        59.76% |         12.68% |          9.76% | 6.83% | 10.73%
2020 Q2 |        56.05% |         12.37% |         10.00% | 7.89% | 13.42%

Note that they started breaking out YouTube and Cloud last year; it looks like Cloud was previously hidden in “Other” and YouTube in “Ads on Google”. I wonder what else is hidden in there?

Why break it up?

There are specific problems; I list a few below. But here’s the big one: For many years, the astonishing torrent of money thrown off by Google’s Web-search monopoly has fueled invasions of multiple other segments, enabling Google to bat aside rivals who might have brought better experiences to billions of lives.

  1. Google Apps and Google Maps are both huge presences in the tech economy. Are they paying for themselves, or are they using search advertising revenue as rocket fuel? Nobody outside Google knows.

    In particular, I’m curious about Gmail, which had 1.5B users in 2018. Some of those people see ads, but plenty don’t. So what’s going on there? It can’t be that cheap to run. Where’s the money?!

  2. The maps business, with its reviews and ads, has a built-in monopolization risk that I wrote about in 2017. It needs to be peeled off so we can think about it. We definitely want reviews and ads on maps (“Where’s the nearest walk-in clinic open on Sunday with friendly staff?”), but the potential for destructive corruption is crazy high.

    Used to be, there were multiple competing map producers and some of them were governments. The notion of mapping being a public utility (Perhaps multinational? What a concept!) with competing ad vendors running on it doesn’t sound crazy to me.

  3. The online advertising business has become a Facebook/Google duopoly which is destroying ad-supported publishing and thus intellectually impoverishing the whole population; not to mention putting a lot of fine journalists out of work. The best explanation I’ve ever read of how that works is Data Lords: The Real Story of Big Data, Facebook and the Future of News by Josh Marshall.

  4. The world needs Google Cloud to be viable because it needs more than two public-cloud providers. It’s empirically possible for such a business to make lots of money; AWS does. GCloud needs to be ripped away from its corporate parent, just as AWS does, but for different reasons.

  5. I note the pointed absence of Android in any of the financials. It’s deeply weird that the world’s most popular operating system has costs and revenues that nobody actually knows. Of course, the real reason Android exists is that Google needs mobile advertising not to become an Apple monopoly.

  6. YouTube has become the visual voice of several generations and is too important to leave hidden inside an opaque conglomerate. Is it a money-spinner or strictly a traffic play? Nobody knows. What we do know is that people who try to make a living as YouTubers sure do complain a lot about arbitrary, ham-handed changes of monetization and content policy. Simultaneously, Google’s efforts to avoid promoting creeps and malefactors aren’t working very well.

What to do?

First, spin Advertising off into its own company and then enact aggressive privacy-protection and anti-tracking law. Start by doing away with 100% of third-party cookies, no exceptions. It’d probably be OK to leave the off-site advertising in that company.

But YouTube definitely has to be its own thing; it’s got no real synergy that I can detect with any other Google property.

I’m not even sure Android is a business. Its direct cost is a few buildings full of engineers. Its revenue is (indirectly) mobile ads, plus Play Store commissions, plus Pixel sales plus, uh, well nobody knows what kinds of backroom arrangements are in place with Samsung et al. Absent the mobile ads, I doubt it’s much of a money-maker. Maybe turn it into a foundation funded by a small levy on everyone who ships an Android phone… Other ideas?

What to do with Maps isn’t obvious to me. It’s probably a big-money business (but we don’t know). In combination with the reviews capability it should be a great advertising platform, but the opportunities for corruption are so huge I’m not sure any private business could be trusted to manage them. First step: Force Google to disclose the financials.

I think Google Cloud could probably make a go of it as an indie, if only as a vendor of infrastructure to all the other ex-Google properties. And I think the product is good, although they’re running third in the public-cloud race.

To increase Google Cloud’s chances, throw in the Apps business; Microsoft classifies their equivalent as Cloud and I don’t think that’s crazy. My Microsoft-leaning friends scoff at the G Apps, but they’re just wrong; with competent administration the apps offer a slick, fast, high-productivity office-automation experience.

Finally, we could hope that, as a standalone company, Google Cloud would break Google’s habit of suddenly killing products that customers depend on heavily. You just can’t do that in the Enterprise space.

The politics

So there should be at least four Google-successor organizations, each with a chance for major success.

I think this would be pretty easy to sell to the public. To start with, what’s left of the world’s press would cheerlead, eager to get out from under the thumb of the Google/Facebook ad cartel. Legions of YouTubers would march in support as well.

Financially, I think Google’s whole is worth less than the sum of its parts. So a breakup might be a win for shareholders. This is a reasonable assumption if only because the fountain of money thrown off by Web-search advertising leaves a lot of room for laziness and mistakes in other sectors of the business.

Also, it’s quite likely the ex-Googles could come out ahead on the leadership front. Larry, Sergey, and the first wave of people they hired made brilliant moves in building Web search and then the advertising business. But the leadership seems to have lost some of that golden touch; fresh blood might really help.

When?

The best time would have been sometime around 2015. The second best…

A-Cloud PR/FAQ 21 Jun 2020, 12:00 pm

I’d like to see AWS split off from the rest of Amazon and I’m pretty sure I’m not alone. So to help that happen, I’ve drafted a PR/FAQ and posted it on GitHub so that it can be improved. People who know what a PR/FAQ is and why this might be helpful can hop on over and critique the doc. For the rest, herewith background and explanation on the what, why, and how.

A project needs a name and so does a company. For the purposes of this draft I’m using “A-Cloud” for both.

Why spin off AWS?

The beating of the antitrust drums is getting pretty loud across a widening swathe of political and economic conversations. The antitrust guns are particularly aimed at Big Tech, who I’m not convinced are the most egregious monopolists out there (consider beer, high-speed Internet, and eyeglasses), but they’re maybe the richest and touch more people’s lives more directly.

So Amazon might prefer to spin off AWS proactively, as opposed to under hostile pressure from Washington.

But that’s not the only reason to do it. The cover story in last week’s Economist, Can Amazon keep growing like a youthful startup? spilled plenty of ink on the prospects for an AWS future outside of Amazon. I’ll excerpt one paragraph:

AWS has the resources to defend its market-leading position. But in the cloud wars any handicap could cost it dearly. Its parent may be becoming one such drag. For years being part of Amazon was a huge advantage for AWS, says Heath Terry of Goldman Sachs, a bank. It needed cash from the rest of the group, as well as technology and data. But Mr Bezos’s habit of moving into new industries means that there are now ever more rivals leery of giving their data to it. Potential customers worry that buying services from AWS is tantamount to paying a land-grabber to invade your ranch. Walmart has told its tech suppliers to steer clear of AWS. Boards of firms in industries which Amazon may eye next have directed their it departments “to avoid the use of AWS where possible”, according to Gartner.

The Economist plausibly suggests a valuation of $500B for AWS. Anyhow, the spin-off feels like a complete no-brainer to me.

Now, everyone knows that at Amazon, when you want to drive a serious decision, someone needs to write, polish, and bring forward a six-pager, usually a PR/FAQ. So I started that process.

What’s a PR/FAQ?

It’s a document, six pages plus appendices, that is the most common tool used at Amazon to support making important decisions. There’s no need for me to explain why this works so well; Brad Porter did a stellar job back in 2015. More recently, Robert Munro went into a little more detail.

On GitHub?

One thing neither of those write-ups really emphasize is that six-pagers in general, and PR/FAQs in particular, are collaborative documents. The ones that matter have input from many people and, by the time you get to the Big Read in the Big Room with the Big Boss, have had a whole lot of revisions. That’s why I put this one on GitHub.

I feel reasonably competent at this, because there are now several successful AWS services in production where I was an initial author or significant contributor to the PR/FAQ. But I know two things: First, there are lots of people out there who are more accomplished than me. A lot of them work at Amazon and have “Product Manager” in their title. Second, these docs get better with input from multiple smart people.

So if you feel qualified, think AWS should be spun out, and would like to improve the document, please fire away. The most obvious way would be with a pull request or new issue, but that requires a GitHub account, which probably has your name on it. If you don’t feel comfortable contributing in public to this project, you could make a burner GitHub account. Or if you really don’t want to, email me diffs or suggestions. But seriously, anyone who’s qualified to improve this should be able to wrangle a pull request or file an issue.

What’s needed?

Since only one human has read this so far, there are guaranteed to be infelicities and generally dumb shit. In particular, the FAQ is not nearly big enough; there need to be more really hard questions with really good answers.

Also, appendices are needed. The most important one would be “Appendix C”, mentioned in the draft but not yet drafted. It covers the projected financial effects of spinning out A-Cloud. I’d love it if someone financially savvy would wrap something around plausible numbers.

Other likely appendices would be a description of the A-Cloud financial transaction, and (especially) a go-to-market plan for the new corporation: How is this presented to the world in a way that makes sense and is compelling?

Document read?

Before these things end up in front of Andy Jassy, there are typically a few preparatory document reads where intermediate-level leaders and related parties get a chance to spend 20-30 minutes reading the document then chime in with comments.

Given enough requests, I’d be happy to organize such a thing and host it on (of course) Amazon Chime, which really isn’t bad. Say so if you think so.

AWS’s Share of Amazon’s Profit 14 Jun 2020, 12:00 pm

I’ve often heard it said that AWS is a major contributor to Amazon’s profitability. How major? Here’s a graph covering the last ten quarters, with a brief methodology discussion:

AWS Proportion of Amazon Profit, Quarterly since 2018 Q1

I think this is a useful number to observe as time goes by, and it would be good if we generally agree on how to compute it. So here’s how I did it.

Start at the Quarterly Results page at ir.amazon.com. It only has data back through 2018 which seems like enough history for this purpose. Let’s take a walk through, for example, the Q1 2020 numbers.

Please note: I have zero inside knowledge about the facts behind these numbers. Every word here is based on publicly available data.

It’s easy to identify the AWS number; search in the quarterly for “Segment Information” and there it is: Net sales $10,219M, Operating expenses $7,144M, Operating Income $3,075M. So if we want to know the proportion of Amazon profits that are due to AWS, that’d be the numerator.

So what’s the denominator? Since the AWS results exclude tax, the Amazon number should too. Search for “Consolidated Statements of Operations” and there are two candidates: “Operating Income” and “Income before income taxes”. The first has the same name as the AWS line, so it’s a plausible candidate. But it excludes net interest charges and $239M of unspecified “Other income”.

This is a little weird, since interest is a regular cost of doing business. It might be the case that the interest expense is mostly due to financing computers and data centers for AWS, or alternatively that it’s mostly financing new fulfillment centers and trucks and planes and bulk buys of toilet paper. And then there’s the “Other income”.

If we assume that the “Other income” and net interest gain/loss is distributed evenly across AWS and the rest of Amazon, then you should use the “Operating” line. I’m not 100% comfy with that assertion, so I made a graph with both lines, and I think it’s likely the best estimate of AWS’s proportion of Amazon’s income falls in the space between them.
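The arithmetic itself is trivial. Here’s a sketch in Go using the AWS Q1 2020 segment number quoted above; the two Amazon-wide denominators below are placeholders (not the real figures), to be copied out of the “Consolidated Statements of Operations”.

    package main

    import "fmt"

    func main() {
        awsOperatingIncome := 3075.0 // $M, Q1 2020 segment information

        // Placeholders only; look these up in the quarterly report.
        amazonOperatingIncome := 4000.0   // "Operating income" line, $M
        amazonIncomeBeforeTaxes := 3600.0 // "Income before income taxes" line, $M

        fmt.Printf("vs operating income: %.0f%%\n",
            100*awsOperatingIncome/amazonOperatingIncome)
        fmt.Printf("vs income before income taxes: %.0f%%\n",
            100*awsOperatingIncome/amazonIncomeBeforeTaxes)
    }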

So, if this graph is right, then over the last couple of years, somewhere between 50% and 80% of Amazon’s profit has been due to AWS. Glancing at the graph, you might think the trend is up, but that verges on extrapolation; I’d want to wait a couple more quarters before having an opinion.

Since I’m not a finance professional, I may have gone off the rails in some stupidly obvious way. So here’s the .xlsx I used to generate the graph. I used Google Sheets, so I hope it doesn’t break your Excel.

Please let me know if you find any problems.
