<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://coderinserepeat.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://coderinserepeat.com/" rel="alternate" type="text/html" /><updated>2026-05-26T21:49:05-04:00</updated><id>https://coderinserepeat.com/feed.xml</id><title type="html">Alas, A Website</title><subtitle>My personal website, with blog posts, projects, recipes, etc.</subtitle><author><name>Bojan Rajkovic</name></author><entry><title type="html">Chat, Cowork, or Code?</title><link href="https://coderinserepeat.com/2026/04/02/chat-cowork-or-code.html" rel="alternate" type="text/html" title="Chat, Cowork, or Code?" /><published>2026-04-02T00:00:00-04:00</published><updated>2026-04-02T00:00:00-04:00</updated><id>https://coderinserepeat.com/2026/04/02/chat-cowork-or-code</id><content type="html" xml:base="https://coderinserepeat.com/2026/04/02/chat-cowork-or-code.html"><![CDATA[<p>I’ve been spending a lot of time with Claude across all three of its modes: Chat, Cowork, and Code. I also talk to a lot of coworkers about AI usage, and one of the main questions I get is: which one should I use, and when. My answer keeps being “it depends, do you have 15 minutes” which isn’t always helpful, so I’m writing it down properly.</p>

<p>The three tools share the same underlying models, and they can all access MCP servers and integrations. The differences aren’t about raw capability, but about three axes that matter more than most people realize: how much context the tool already has about your work, how much it can <em>do</em> versus how much it can <em>say</em>, and how durable that context is across sessions.</p>

<h2 id="context-depth-and-action-depth">Context depth and action depth</h2>

<p>I think about these tools on a gradient. Chat is a blank slate: every new chat starts from zero (aside from some “memory” that builds over conversations), and you supply all the context in the prompt or hope that your old thread was retained. If you’re asking “what can I substitute for buttermilk in this recipe?” or “is X a reasonable approach for Y?”, Chat is perfect. The question is self-contained, the answer is disposable, and you don’t need Claude to remember anything about you or your work. I use Chat for these kinds of questions a handful of times a day, and almost never go back to read the conversations afterward.</p>

<p>Cowork occupies the middle ground. It can see your project files, create artifacts, persist notes across sessions, and build up context about what you’re working on. The key differentiator isn’t memory — it’s <em>artifacts</em>: documents, research notes, structured outputs that live as files you own. You can back them up, edit them outside of Claude, reorganize them, and keep them forever. Chat’s memory is shallow and automatic; Cowork’s context is deep and deliberate.</p>

<p>Code goes further. It can read your filesystem, search your codebase, execute commands, write and edit files, and operate in agentic loops where it plans, acts, observes, and adjusts. It’s not just aware of your context; it can <em>change</em> your context. For engineering work, this is where I spend most of my time, but I want to be precise about why: not because Code is “better,” but because the tasks I bring to it benefit from the action depth. Writing a blog post? Cowork is fine. Refactoring a module across 40 files and running the test suite to verify? That’s Code.</p>

<h2 id="the-tool-outgrows-the-task-or-the-task-outgrows-the-tool">The tool outgrows the task (or the task outgrows the tool)</h2>

<p>I have a health research project that illustrates this nicely. It started as a series of one-off questions in Chat: medication interactions, what a lab result means, whether a side effect is expected. Pure Chat territory. Self-contained questions, disposable answers.</p>

<p>Over a few months, the project accumulated mass. I had six research documents cross-referencing each other. I needed context from previous conversations to ask useful follow-up questions. Eventually I was writing Python scripts to analyze CPAP data and building an HTML dashboard to visualize sleep patterns. The project now lives in Code with its own directory, its own CLAUDE.md context file, and a journal of observations over time.</p>

<p>I didn’t make a wrong tool choice at the start. The task’s weight shifted under me, and the tool needed to shift with it. The thing to watch for is the transition point: when you’re fighting the tool instead of using it.</p>

<h2 id="what-persists">What persists</h2>

<p>A third axis beyond context and action matters most for sustained work: what persists.</p>

<p>All three tools send your data to Anthropic’s servers for processing. There’s no difference in what Anthropic <em>sees</em>. The difference is in what happens afterward, and in what persists for <em>you</em>.</p>

<p>Chat conversations are retained for 30 days (or up to 5 years if you’ve opted into training data sharing<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>). The 30-day clock resets from your last message, so active threads stick around. But Claude’s <em>useful</em> context degrades long before that: the context window compresses and compacts older messages as the conversation grows, so the nuance you built up early quietly erodes even while you’re still talking. Once you stop, 30 days of inactivity and it’s gone entirely. You’re left with nothing to carry forward.</p>

<p>Cowork and Code change this equation, not by hiding data from Anthropic, but by keeping the useful work product on your machine. Cowork’s local project folders and Code’s CLAUDE.md files, auto-memory, and project context all persist regardless of what Anthropic retains or forgets. When their 30 days are up, their copy is gone, but your accumulated context remains, ready to load into the next session. Code goes furthest here: its memory system builds up structured project knowledge over time that survives across sessions indefinitely.</p>

<p>Simon Willison has <a href="https://simonwillison.net/">written extensively</a> about the importance of understanding where your data goes when you use LLM tools<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>. For my own usage, the takeaway is this: the real value of Cowork and Code isn’t data privacy. It’s context durability.</p>

<h2 id="the-instantiation-cost-counterweight">The instantiation cost counterweight</h2>

<p>The most powerful tool isn’t always the right one. There’s a real cost to reaching for Code when Chat would do.</p>

<p>Code sessions have startup overhead: loading project context, initializing the session, navigating permission prompts. Cowork has project setup cost. Chat has neither. For a question like “what’s a good substitute for fish sauce in this dressing?”, firing up Code would be like driving to the grocery store to buy a single lime. The overhead dwarfs the task.</p>

<p>All three tools share the same Pro subscription at $20/month, with Max tiers at $100 and $200 for heavier usage. The cost differential isn’t dollars per tool; it’s how each tool consumes your usage budget within the tier. Code sessions are token-heavy. They load context, they make tool calls, they run agentic loops. Chat messages are lightweight. If you’re on a Pro plan and you’re hitting rate limits, reaching for Chat instead of Code on simple questions is the first lever to pull before upgrading to Max.</p>

<p>This matters even more on enterprise plans, where teams often share a token budget allocated to their organization. When your team has a fixed pool of tokens per month and six engineers are drawing from it, how each person chooses their tool directly affects whether the team hits its ceiling by week three. Anthropic’s own documentation puts the <a href="https://code.claude.com/docs/en/costs">average Code user at ~$6/day</a> in token usage, with 90% of users staying under $12. Multiply that across a team, add in a few people using Code for questions that Chat would handle in 30 seconds, and you can see how the budget math gets unfriendly fast.</p>
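
<p>To make that concrete, here’s a back-of-the-envelope sketch in Python. The per-day figures are the ones quoted above and the six-engineer team comes from the example; the 22 working days per month is my own assumption, not something from Anthropic’s documentation.</p>

```python
# Rough monthly Code spend for a team, using the per-user daily figures quoted above.
AVERAGE_DAILY_USD = 6   # "average Code user at ~$6/day"
P90_DAILY_USD = 12      # "90% of users staying under $12"
WORKING_DAYS = 22       # assumption: working days in a typical month


def monthly_team_cost(engineers: int, daily_usd: int, days: int = WORKING_DAYS) -> int:
    """Total monthly spend if every engineer spends `daily_usd` per working day."""
    return engineers * daily_usd * days


typical = monthly_team_cost(6, AVERAGE_DAILY_USD)
heavy = monthly_team_cost(6, P90_DAILY_USD)
print(f"typical: ${typical}/month, heavy: ${heavy}/month")
```

<p>Whether either number is a problem depends entirely on the pool your organization bought; the point is how quickly the total moves when usage drifts from typical toward heavy.</p>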

<h2 id="the-framework">The framework</h2>

<p>I’ve landed on a mental model I don’t claim is universal, but it’s been useful enough to share:</p>

<p><strong>Chat</strong> is for questions that are self-contained and disposable. The answer doesn’t need to persist and Claude doesn’t need to know anything about me to give it. Ingredient substitutions, gut checks on an approach, quick syntax reminders, brainstorming names for things. If I won’t care about the answer tomorrow, it’s a Chat question.</p>

<p><strong>Cowork</strong> is for work that accumulates context over time. Research projects, document drafting, anything where the outputs need to be durable artifacts rather than conversation messages. The transition signal from Chat to Cowork is when I realize I need structured files I can organize, edit, and keep, not just a conversation thread with some memory attached. If the topic has enough mass that I’d want to back up the context, I’ve outgrown Chat.</p>

<p><strong>Code</strong> is for work that benefits from action, not just advice. If the task involves reading files, searching a codebase, running commands, or making changes across multiple files, Code’s agentic capabilities save more time than they cost in tokens. The transition signal from Cowork to Code is when I find myself copying Claude’s output and pasting it into files manually. My boss has a phrase I’ve stolen: “the robot should be the robot.” If I’m acting as the intermediary between Claude’s suggestions and my filesystem, I’ve taken on a job that Code does natively.</p>
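
<p>If it helps to see the three rules as a single decision, here’s a toy Python sketch. The function and parameter names are mine, invented for illustration, and real tasks are rarely this binary.</p>

```python
def pick_tool(needs_action: bool, needs_durable_context: bool) -> str:
    """Toy version of the framework above; checks the heaviest requirement first."""
    if needs_action:
        # Reading files, running commands, editing across many files.
        return "Code"
    if needs_durable_context:
        # Outputs should live on as files I can organize, edit, and keep.
        return "Cowork"
    # Self-contained question, disposable answer.
    return "Chat"


print(pick_tool(needs_action=False, needs_durable_context=False))
```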

<p>This is the first post in what I’m planning as a series on how I use AI tools. For now, if you’re a Claude user spending $20/month and wondering whether you’re getting your money’s worth: pay attention to which tool you’re reaching for and why. The model is the same across all three. The leverage is in the context.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Anthropic changed their consumer data policy in August 2025 to allow opt-in training data sharing with 5-year retention. The rollout was, charitably, not their finest moment. If you haven’t checked your settings at <a href="https://claude.ai/settings/data-privacy-controls">claude.ai/settings/data-privacy-controls</a>, it’s worth a visit. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p><a href="https://simonwillison.net/">Simon Willison’s blog</a> is probably the single best resource for thinking clearly about LLM tool usage, data privacy, and the practical realities of building on top of these models. He’s been writing about this longer and more carefully than almost anyone else. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bojan Rajkovic</name></author><category term="Essays" /><category term="AI" /><category term="LLMs" /><category term="Claude" /><category term="Tooling" /><summary type="html"><![CDATA[Anthropic ships three ways to talk to Claude, and they're not interchangeable. Here's how I think about which one to reach for, and why the answer has less to do with capability than with context, retention, and your token budget.]]></summary></entry><entry><title type="html">Automate the Docker DNS Pain Away</title><link href="https://coderinserepeat.com/2023/07/31/automate-the-docker-dns-pain-away.html" rel="alternate" type="text/html" title="Automate the Docker DNS Pain Away" /><published>2023-07-31T00:00:00-04:00</published><updated>2023-07-31T00:00:00-04:00</updated><id>https://coderinserepeat.com/2023/07/31/automate-the-docker-dns-pain-away</id><content type="html" xml:base="https://coderinserepeat.com/2023/07/31/automate-the-docker-dns-pain-away.html"><![CDATA[<p>Like many other nerds, I have somewhat of a homelab at home. These days it’s not as complicated as it used to be, consisting largely of a big “NAS”<sup id="fnref:0"><a href="#fn:0" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, a Home Assistant box, and a couple other small things.</p>

<p>The NAS, being a beefy server machine, runs a bunch of Docker containers for various things: Octoprint, Docspell (which runs 2 of its own containers + Solr), etc. It also runs NixOS<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>, and all of these containers are fronted by the <a href="https://nixos.wiki/wiki/ACME">Let’s Encrypt</a> and <a href="https://nixos.wiki/wiki/Nginx">Nginx</a> infrastructure that it provides. To avoid exposing ports on the NAS’s “host” network, I point Nginx virtual hosts directly at container IPs, like so:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">services</span><span class="p">.</span><span class="nx">nginx</span><span class="p">.</span><span class="nx">virtualHosts</span><span class="p">.</span><span class="dl">"</span><span class="s2">octoprint.coderinserepeat.com</span><span class="dl">"</span> <span class="o">=</span> <span class="p">{</span>
  <span class="nx">enableACME</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
  <span class="nx">acmeRoot</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
  <span class="nx">forceSSL</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
  <span class="nx">locations</span><span class="p">.</span><span class="dl">"</span><span class="s2">/</span><span class="dl">"</span> <span class="o">=</span> <span class="p">{</span>
	<span class="nx">proxyPass</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">http://10.0.1.1:80</span><span class="dl">"</span><span class="p">;</span>
	<span class="nx">proxyWebsockets</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
	<span class="nx">extraConfig</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">client_max_body_size 0;</span><span class="dl">"</span><span class="p">;</span>
  <span class="p">};</span>
<span class="p">};</span>
</code></pre></div></div>

<p>This generally works, except…one of the things that NixOS does as part of a <code class="language-plaintext highlighter-rouge">nixos-rebuild switch</code> when using the declarative Nginx configuration is an Nginx configuration check. Normally, this is great: if I screw up the configuration somehow (e.g. injecting some bad configuration), it won’t take down Nginx. However, it has a big downside: if containers are restarted or their configuration changes, the assigned IPs are not stable<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>, and the Nginx configuration will fail to validate.</p>

<p>Previously, I’d tried a number of things that purported to provide a Docker &lt;-&gt; DNS translation, subscribing to Docker daemon events and running a DNS server that I could point other things at. In practice, this never worked quite right: despite telling <code class="language-plaintext highlighter-rouge">systemd-resolved</code> that the <code class="language-plaintext highlighter-rouge">dns-proxy-server</code> container should be used for DNS, rebuilds (and thus Nginx config checks) would frequently fail because the upstreams would fail to respond on the <code class="language-plaintext highlighter-rouge">proxyPass</code> ports.</p>

<p>I was about to embark on a “stupid scratch a homelab itch” project and write something that connects to Docker, listens for events, and updates Route53<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>, when <a href="https://hachyderm.io/@zrail">Pete Keen</a> suggested that I check out the <a href="https://github.com/nginx-proxy/docker-gen/"><code class="language-plaintext highlighter-rouge">docker-gen</code></a> project, and then pointed me at <a href="https://dnscontrol.org/"><code class="language-plaintext highlighter-rouge">dnscontrol</code></a> as well. Sensing an opportunity to hit a Pareto optimal<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>, I set about hacking up some <code class="language-plaintext highlighter-rouge">systemd</code> services and a <code class="language-plaintext highlighter-rouge">dnsconfig.js.tmpl</code> file, and an hour or so later, had something extremely feasible.</p>

<p>For the purposes of this writeup, I’m going to assume that you already have the <code class="language-plaintext highlighter-rouge">dnscontrol</code> and <code class="language-plaintext highlighter-rouge">docker-gen</code> binaries somewhere on your system. In my case, they’re in <code class="language-plaintext highlighter-rouge">/nas/homes/brajkovic/bin</code>. I also assume that you’re using Nix/NixOS, because I didn’t write the units manually, but hopefully these declarations for the <code class="language-plaintext highlighter-rouge">systemd</code> units are simple enough that you can write the full units by hand.</p>

<h2 id="the-systemd-units">The <code class="language-plaintext highlighter-rouge">systemd</code> units</h2>

<h3 id="docker-gen"><code class="language-plaintext highlighter-rouge">docker-gen</code></h3>

<p>First, the <code class="language-plaintext highlighter-rouge">docker-gen</code> unit — <code class="language-plaintext highlighter-rouge">docker-gen</code> knows how to run as a daemon and listen to events, so we can run it as a normal <code class="language-plaintext highlighter-rouge">systemd</code> service:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">systemd</span><span class="p">.</span><span class="nx">services</span><span class="p">.</span><span class="dl">"</span><span class="s2">docker-gen-dns</span><span class="dl">"</span> <span class="o">=</span> <span class="p">{</span>
  <span class="nx">path</span> <span class="o">=</span> <span class="p">[</span>
    <span class="dl">"</span><span class="s2">/nas/homes/brajkovic/bin</span><span class="dl">"</span>
  <span class="p">];</span>

  <span class="nx">script</span> <span class="o">=</span> <span class="dl">''</span>
    <span class="nx">docker</span><span class="o">-</span><span class="nx">gen</span> <span class="o">-</span><span class="nx">config</span> <span class="nx">docker</span><span class="o">-</span><span class="nx">gen</span><span class="p">.</span><span class="nx">cfg</span>
  <span class="dl">''</span><span class="p">;</span>

  <span class="nx">serviceConfig</span><span class="p">.</span><span class="nx">WorkingDirectory</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">/nas/homes/brajkovic/.config/dns</span><span class="dl">"</span><span class="p">;</span>
  <span class="nx">wantedBy</span> <span class="o">=</span> <span class="p">[</span> <span class="dl">"</span><span class="s2">multi-user.target</span><span class="dl">"</span> <span class="p">];</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The working directory is where the <code class="language-plaintext highlighter-rouge">docker-gen.cfg</code> file lives; its contents are covered in the config files section below.</p>

<h3 id="dnscontrol"><code class="language-plaintext highlighter-rouge">dnscontrol</code></h3>

<p>Next, the <code class="language-plaintext highlighter-rouge">dnscontrol</code> unit. In this case, we register it as a <code class="language-plaintext highlighter-rouge">oneshot</code> unit, because <code class="language-plaintext highlighter-rouge">docker-gen</code> will run <code class="language-plaintext highlighter-rouge">systemctl</code> to start it whenever the generated config changes:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">systemd</span><span class="p">.</span><span class="nx">services</span><span class="p">.</span><span class="dl">"</span><span class="s2">dnscontrol-apply-docker.coderinserepeat.com</span><span class="dl">"</span> <span class="o">=</span> <span class="p">{</span>
  <span class="nx">path</span> <span class="o">=</span> <span class="p">[</span>
    <span class="dl">"</span><span class="s2">/nas/homes/brajkovic/bin</span><span class="dl">"</span>
  <span class="p">];</span>

  <span class="nx">script</span> <span class="o">=</span> <span class="dl">''</span>
    <span class="nx">dnscontrol</span> <span class="nx">version</span>
    <span class="nx">dnscontrol</span> <span class="nx">preview</span>
    <span class="nx">dnscontrol</span> <span class="nx">push</span>
  <span class="dl">''</span><span class="p">;</span>

  <span class="nx">serviceConfig</span><span class="p">.</span><span class="nx">Type</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">oneshot</span><span class="dl">"</span><span class="p">;</span>
  <span class="nx">serviceConfig</span><span class="p">.</span><span class="nx">WorkingDirectory</span> <span class="o">=</span> <span class="dl">"</span><span class="s2">/nas/homes/brajkovic/.config/dns</span><span class="dl">"</span><span class="p">;</span>
  <span class="nx">after</span> <span class="o">=</span> <span class="p">[</span> <span class="dl">"</span><span class="s2">multi-user.target</span><span class="dl">"</span> <span class="p">];</span>
<span class="p">};</span>
</code></pre></div></div>

<h2 id="the-config-files">The config files</h2>

<h3 id="docker-gencfg"><code class="language-plaintext highlighter-rouge">docker-gen.cfg</code></h3>

<p>This file configures <code class="language-plaintext highlighter-rouge">docker-gen</code>’s behavior, and is super simple:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">[[</span><span class="n">config</span><span class="k">]]</span>
<span class="n">dest</span> <span class="o">=</span><span class="w"> </span><span class="s">"dnsconfig.js"</span>
<span class="n">notifycmd</span> <span class="o">=</span><span class="w"> </span><span class="s">"systemctl start dnscontrol-apply-docker.coderinserepeat.com"</span>
<span class="n">template</span> <span class="o">=</span><span class="w"> </span><span class="s">"dnsconfig.js.tmpl"</span>
<span class="n">watch</span> <span class="o">=</span><span class="w"> </span><span class="kc">true</span>
<span class="n">wait</span> <span class="o">=</span><span class="w"> </span><span class="s">"500ms:2s"</span>
</code></pre></div></div>

<p>It tells <code class="language-plaintext highlighter-rouge">docker-gen</code> to source the template from <code class="language-plaintext highlighter-rouge">dnsconfig.js.tmpl</code>, write the rendered output to <code class="language-plaintext highlighter-rouge">dnsconfig.js</code>, and then start our <code class="language-plaintext highlighter-rouge">dnscontrol</code> unit as the “notify” command once the output has been written. Setting <code class="language-plaintext highlighter-rouge">watch</code> to <code class="language-plaintext highlighter-rouge">true</code> puts <code class="language-plaintext highlighter-rouge">docker-gen</code> in daemon mode, and <code class="language-plaintext highlighter-rouge">wait</code> configures the debounce window: it will wait at least 500ms, at most 2 seconds, before regenerating.</p>
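
<p>If the min/max pair in <code class="language-plaintext highlighter-rouge">wait</code> is hard to picture, here’s a minimal Python sketch of that style of debounce. This is my reading of the “at least 500ms, at most 2 seconds” behavior, not <code class="language-plaintext highlighter-rouge">docker-gen</code>’s actual implementation; the function name and the millisecond timestamps are made up for illustration.</p>

```python
def debounce_fire_times(event_times_ms, min_wait_ms=500, max_wait_ms=2000):
    """Return the times (in ms) at which a debounced action would fire.

    Each event pushes the deadline out by `min_wait_ms`, but never further
    than `max_wait_ms` after the first event of the current burst.
    """
    fires = []
    burst_start = None  # time of the first event in the current burst
    deadline = None     # when the action fires unless another event arrives first

    for t in sorted(event_times_ms):
        if deadline is not None and t >= deadline:
            # The pending deadline passed before this event: fire, start a new burst.
            fires.append(deadline)
            burst_start = None
        if burst_start is None:
            burst_start = t
        deadline = min(t + min_wait_ms, burst_start + max_wait_ms)

    if deadline is not None:
        fires.append(deadline)  # the last burst fires once things go quiet
    return fires


# Three events in quick succession collapse into a single regeneration.
print(debounce_fire_times([0, 300, 600]))
# A steady stream of events still fires, because the max wait caps the delay.
print(debounce_fire_times(list(range(0, 4001, 400))))
```

<p>The first call is the case I care about: a container restart emits several events close together, and I only want one DNS update out of it.</p>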

<h3 id="dnsconfigjstmpl"><code class="language-plaintext highlighter-rouge">dnsconfig.js.tmpl</code></h3>

<p>The <code class="language-plaintext highlighter-rouge">dnscontrol</code> template, also deceptively simple:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">REG_NONE</span> <span class="o">=</span> <span class="nc">NewRegistrar</span><span class="p">(</span><span class="dl">"</span><span class="s2">none</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">DSP_R53</span> <span class="o">=</span> <span class="nc">NewDnsProvider</span><span class="p">(</span><span class="dl">"</span><span class="s2">r53_main</span><span class="dl">"</span><span class="p">);</span>

<span class="nc">D</span><span class="p">(</span><span class="dl">"</span><span class="s2">docker.coderinserepeat.com</span><span class="dl">"</span><span class="p">,</span> <span class="nx">REG_NONE</span><span class="p">,</span> <span class="nc">DnsProvider</span><span class="p">(</span><span class="nx">DSP_R53</span><span class="p">),</span>
<span class="p">{{</span><span class="nx">range</span> <span class="nx">$key</span><span class="p">,</span> <span class="na">$value</span> <span class="p">:</span><span class="o">=</span> <span class="p">.}}</span>
    <span class="p">{{</span><span class="k">if</span> <span class="nx">$value</span><span class="p">.</span><span class="nx">IP</span><span class="p">}}</span>
    <span class="c1">// {{ $value.Name }} ({{$value.ID}} from {{$value.Image.Repository}})</span>
    <span class="nc">A</span><span class="p">(</span><span class="dl">"</span><span class="s2">{{ $value.Name }}</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">{{$value.IP}}</span><span class="dl">"</span><span class="p">),</span>
    <span class="p">{{</span><span class="nx">end</span><span class="p">}}</span>
<span class="p">{{</span><span class="nx">end</span><span class="p">}}</span>
    <span class="c1">// Allow letsencrypt to issue certificate for this domain</span>
    <span class="nc">CAA</span><span class="p">(</span><span class="dl">"</span><span class="s2">@</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">issue</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">letsencrypt.org</span><span class="dl">"</span><span class="p">),</span>
    <span class="c1">// Allow ACM to issue certificates for this domain</span>
    <span class="nc">CAA</span><span class="p">(</span><span class="dl">"</span><span class="s2">@</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">issue</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">amazon.com</span><span class="dl">"</span><span class="p">),</span>
    <span class="c1">// Allow no CA to issue wildcard certificate for this domain</span>
    <span class="nc">CAA</span><span class="p">(</span><span class="dl">"</span><span class="s2">@</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">issuewild</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">;</span><span class="dl">"</span><span class="p">),</span>
    <span class="c1">// Report all violations to caa@coderinserepeat.com. If a CA does not support</span>
    <span class="c1">// this record then refuse to issue any certificate</span>
    <span class="nc">CAA</span><span class="p">(</span><span class="dl">"</span><span class="s2">@</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">iodef</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">mailto:caa@coderinserepeat.com</span><span class="dl">"</span><span class="p">,</span> <span class="nx">CAA_CRITICAL</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>

<p>This is mostly basic JavaScript, plus some Go template language: we emit an <code class="language-plaintext highlighter-rouge">A</code> record for each Docker container that has an IP, and some really basic <code class="language-plaintext highlighter-rouge">CAA</code> records so that we can issue certs if we need to for those DNS names<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">6</a></sup>.</p>
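
<p>To show roughly what the loop in the template produces, here’s a small Python imitation of it. It’s only a sketch: the real rendering is done by <code class="language-plaintext highlighter-rouge">docker-gen</code> with Go templates, and the container names and the dictionary shape here are hypothetical stand-ins for the container data.</p>

```python
def render_a_records(containers: list[dict]) -> str:
    """Mimic the template's range/if loop: one A(...) line per container with an IP."""
    lines = []
    for container in containers:
        if container.get("IP"):  # mirrors the template's IP check: skip containers without one
            lines.append(f'    A("{container["Name"]}", "{container["IP"]}"),')
    return "\n".join(lines)


# Hypothetical container data; the second entry has no IP and is skipped.
containers = [
    {"Name": "octoprint", "IP": "10.0.1.1"},
    {"Name": "stopped-thing", "IP": ""},
]
print(render_a_records(containers))
```

<p>Each emitted line slots into the <code class="language-plaintext highlighter-rouge">D(...)</code> call ahead of the static <code class="language-plaintext highlighter-rouge">CAA</code> records, which is why every one ends with a trailing comma.</p>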

<h3 id="credsjson"><code class="language-plaintext highlighter-rouge">creds.json</code></h3>

<p>The basic “credentials” file for <code class="language-plaintext highlighter-rouge">dnscontrol</code>:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"r53_main"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"TYPE"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ROUTE53"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This doesn’t actually have any credentials, because those are provided by the standard AWS SDK credentials mechanism — I should probably do something better with those secrets, but if you can either log into or physically steal my NAS, you’ve earned my AWS creds.</p>

<h2 id="wrapping-it-all-up">Wrapping It All Up</h2>

<p>Putting all that together, we’re <em>done</em>. The <code class="language-plaintext highlighter-rouge">docker-gen</code> daemon runs, supervised by its <code class="language-plaintext highlighter-rouge">systemd</code> unit. When it needs to, it spawns <code class="language-plaintext highlighter-rouge">dnscontrol</code>, but it mostly just sits there idly — I had to restart it to get any recent output, and it said:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Jul 31 22:32:58 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:32:58 Watching docker events
Jul 31 22:32:58 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:32:58 Contents of dnsconfig.js did not change. Skipping notification 'systemctl start dnscontrol-apply-docker.coderinserepeat.com'
</code></pre></div></div>

<p>When I manually killed a container<sup id="fnref:6"><a href="#fn:6" class="footnote" rel="footnote" role="doc-noteref">7</a></sup>, the log showed what happens when there is actually something to do:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Jul 31 22:35:09 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:35:09 Received event die for container 236c84adea38
Jul 31 22:35:10 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:35:10 Debounce minTimer fired
Jul 31 22:35:10 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:35:10 Received event stop for container 236c84adea38
Jul 31 22:35:10 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:35:10 Generated 'dnsconfig.js' from 7 containers
Jul 31 22:35:10 hagal docker-gen-dns-start[3691698]: 2023/07/31 22:35:10 Running 'systemctl start dnscontrol-apply-docker.coderinserepeat.com'
</code></pre></div></div>

<p>The unit was indeed started, and you can see <code class="language-plaintext highlighter-rouge">dnscontrol</code> applies the changes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Jul 31 22:35:11 hagal dnscontrol-apply-docker.coderinserepeat.com-start[3712349]: [INFO: Diff2 algorithm in use. Welcome to the future!]
Jul 31 22:35:11 hagal dnscontrol-apply-docker.coderinserepeat.com-start[3712349]: ******************** Domain: docker.coderinserepeat.com
Jul 31 22:35:12 hagal dnscontrol-apply-docker.coderinserepeat.com-start[3712349]: 1 correction (r53_main)
Jul 31 22:35:12 hagal dnscontrol-apply-docker.coderinserepeat.com-start[3712349]: #1: - DELETE dns-proxy-server.docker.coderinserepeat.com A 10.0.0.4 ttl=300
Jul 31 22:35:12 hagal dnscontrol-apply-docker.coderinserepeat.com-start[3712349]: SUCCESS!
Jul 31 22:35:12 hagal dnscontrol-apply-docker.coderinserepeat.com-start[3712349]: Done. 1 corrections.
</code></pre></div></div>

<p>Overall, really simple, and like I said, hits a strong Pareto optimal: an all-in-one solution would be cool, but bodging together some existing tools and a few <code class="language-plaintext highlighter-rouge">systemd</code> services provided satisfying short-term relief.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:0">
      <p>A Supermicro 6028U-TR4T+, with dual Xeon E5-2650L v3 processors, 128 GB of RAM, and all the disks I could cram in. I expect it to last me…a damn long time. <a href="#fnref:0" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1">
      <p>Which was in some ways a mistake, and in some ways really speeds things along. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>Manually managing IPAM for Docker containers is not my idea of a good time: see the aforementioned mild regret of using NixOS — I don’t actually want to spend that much sysadmin time. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>Where my domain is hosted, but likely it would have ended up supporting pluggable providers, because I can’t build anything without overbuilding it. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>80% of the desired outcome, 20% of the work. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>Not that we need to — the day-to-day records that I use live on the root domain. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6">
      <p>Just for fun, but I do need to cut this container out of the configuration for good. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bojan Rajkovic</name></author><category term="Infrastructure" /><category term="docker" /><category term="dns" /><summary type="html"><![CDATA[Making progress from an infuriating, half-broken Docker DNS setup, to one that works reliably with some existing tools, and only mild cursing.]]></summary></entry><entry><title type="html">Knowledge management in the modern era</title><link href="https://coderinserepeat.com/2021/03/20/personal-knowledge-management-in-202x.html" rel="alternate" type="text/html" title="Knowledge management in the modern era" /><published>2021-03-20T00:00:00-04:00</published><updated>2021-03-20T00:00:00-04:00</updated><id>https://coderinserepeat.com/2021/03/20/personal-knowledge-management-in-202x</id><content type="html" xml:base="https://coderinserepeat.com/2021/03/20/personal-knowledge-management-in-202x.html"><![CDATA[<p>A few things have happened recently that have gotten me thinking about what the broader scope of knowledge management
(going beyond blogs and personal wikis, casting a wider net into all of the data generated by a person’s life) looks
like in the 2020s:</p>

<ol>
  <li><a href="https://twitter.com/joelmartinez/status/1359629846373048320">This thread</a> with my friend Joel, in which I riff on what I think a modern version of Vannevar Bush’s Memex
would look like, and how it compares to some existing tools out there (Microsoft’s ToDo/OneNote suite + the general
internet). I was originally introduced to the Memex by <a href="https://www.antipope.org/charlie/">Charlie Stross’s</a> Laundry Files series, and was
instantly fascinated by its presentation as an electromechanical contrivance that when imbued with sufficient magic
granted the user’s “data” access/perusal speeds almost equivalent to “main memory.” Since then, Vannevar Bush’s “As
We May Think,” which introduced the Memex (as well as several other interesting ideas about how we capture and access
our life’s data), has been an influence on both my thinking and on this writing.</li>
  <li>I think a lot about knowledge management at work. It’s not a matter of secrecy that the bank I work at is currently
in the midst of a <abbr title="Centralized Online Real-time Exchange">CORE</abbr> transformation<sup id="fnref:0"><a href="#fn:0" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> project, nor is it a matter of secrecy that it constitutes about 80% of my
work product these days. In particular, a project like this requires, as a matter of course, an enormous amount of
documentation to be produced, and knowledge to be transferred: knowledge about the workings of the existing system,
knowledge about how the new system is configured, and documentation about how the business processes translate from
one system to the other. All of this has to be produced, organized, made accessible, and made searchable. More
importantly, though, it needs to be <strong>consistent</strong> (all in one place) and <strong>versioned</strong>. The ability to visualize
business logic change is paramount when dealing with such fundamental systems; every change at the core banking
system level has 2nd, 3rd, and maybe even 4th order ripples. This is especially true when business process revision
takes place at the same time as development work.</li>
  <li>I scanned another batch of paper documents. These currently end up in a folder with hundreds of other scans, only
loosely organized via filenames and some directory structure. The structure is mostly imposed by important events
that generate significant paperwork that needs organization: real estate transactions (2 house purchases, 1 house
sale, 1 refinance), each year’s tax season, etc. The filenames alone are helpful, but my ability to look up a
document is still nowhere near matching my ability to recall metadata about the document. Even when I find a scan
that I’m looking for, I’m still missing the full context at my fingertips: I can’t find related documents easily, I
can’t find related email easily, etc. Email is particularly important, especially with “paperwork generator” type
events—much of the context behind a document still lives in that format: “Why did we have the lawyers draft
this agreement to time out after 90 days versus 60 days?”, for example.</li>
</ol>

<p>The common thread between all of these is actually fundamentally simple: in each occurrence, in each data retrieval
context, the ability of my brain (and other brains…probably) to categorize (“I want all vet bills from 2019,” “I want
all checks paid to my wife from June 2016 through July 2018”) and to conjure up the “metadata” that I want to index on
far outstrips the ability of software systems (and physical systems—it’d be nearly impossible to do
cross-reference to the level I desire with physical documents). Bush’s Memex, and its influence on early hypertext
systems (Engelbart’s <abbr title="Mother Of All Demos">MOAD</abbr>, for example), seemed to predict a future where our memory, with its prodigious capability to
categorize and cross-reference, would be supplemented with computer systems that were organized the same way. Yet here
we are in 2021, and all the systems I’ve tried recently still focus on the sterile familiarity of a filesystem-like
layout, mired in the land of directed acyclic graphs. Why?</p>

<h1 id="was-this-really-predicted-in-the-1940s">Was this really predicted in the 1940s?</h1>

<p>Yes. No, seriously. Go read <a href="http://mnielsen.github.io/notes/kay/assets/bush_1945.pdf">“As We May Think”</a><sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>, and notice that in 1945, Bush says this at the beginning of the
<a href="https://www.ps.uni-saarland.de/~duchier/pub/vbush/vbush6.shtml">6th major section</a><sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>:</p>

<blockquote>
  <p>THE HUMAN BRAIN FILES BY ASSOCIATION-THE MEMEX COULD DO THIS MECHANICALLY</p>
</blockquote>

<p>Notice then that he has condensed my common thread from before into a single, pithy sentence: “The human brain files by
association.” For the rest of the 6th section of “As We May Think,” Bush takes the opportunity to point out that we’ve
created artificial sorting and hierarchy to place data in storage, when in reality, our mind does not work that
way—our ability to binary search sorted, hierarchical organizations of data to find a single document pales in
comparison to the speed at which a person is able to free-associate metadata<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>:</p>

<blockquote>
  <p>Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in
nature.</p>
</blockquote>

<p>It is in this section that Bush lays out the “memex”—predating common use of the word “meme” by 30 years, he
chooses this as the name of his memory-enhancing devices. The memex sounds an awful lot like a “battlestation” setup of
the 202Xs: multiple screens, multiple control planes for various tasks, etc. However, its biggest strength? In addition
to allowing its user to enter an endless stream of un-or-lightly organized data, it offers the ability to
cross-reference, to page and skim through data at amazing rates w/ mechanical assistance (and probably some mental
training, in skim-reading and categorization) and most of all, it supports the sort of serendipitous interaction that is
a key part of searching a pile of documents, putting together threads as you go.</p>

<h1 id="how-do-wikis-fit-into-this">How do wikis fit into this?</h1>

<p>It didn’t take long after Ward Cunningham’s <a href="http://wiki.c2.com">1995 WikiWikiWeb</a> for wikis to emerge as the leading approach to
knowledge management—early third party implementations of Cunningham’s wiki concept emerged in the few years
following (TWiki in 1998, MoinMoin in 2000), and the concept gained ground quickly<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>. Mediawiki’s initial release in
2002 and the success of the Wikipedia project, and the release of Confluence in 2004 into the enterprise space (as well
as its subsequent success), really seemed to solidify the idea of the wiki as something that was here to stay.</p>

<p>Over time, wikis have built up a rich set of features that looks like it corresponds quite neatly with our stated
desires: good abilities to categorize and tag content, a rich markup for capturing notes and snippets of text, an
ability to contain more-or-less any kind of content (with limited degrees of usefulness), and a flexible storage
approach that isn’t explicitly tied up in outdated filesystem notions. However, despite a seemingly solid foundation,
even the latest installments of wikis fall short when it comes to being an augmentation of the human intellect.</p>

<h2 id="failing-1-poor-support-for-non-plaintext-knowledge">Failing #1: Poor support for non-plaintext knowledge</h2>

<p>Wikis are, by nature, built for plaintext knowledge—they’re intended to present textual documents, and serve as a
knowledge repository for data that could be principally described as documentation. Their evolution has led to primary
use as encyclopedia-alikes, whether that’s the public Wikipedia, enterprise/corporate wikis (I’ve seen Mediawiki,
Confluence, home-grown wikis, even GitHub’s Gollum wiki in this space), or personal wikis (most frequently used for
stuff like “The plumber is John Doe, his number is 555-123-4567,” which I like to refer to as “operational knowledge”
for one’s life).</p>

<p>However, by focusing on text, wikis have more-or-less no document pipeline to speak of. Attachments are possible, but
there’s no processing or indexing pipeline. If you want to reference the content of, say, a bill PDF, you need to:</p>

<ol>
  <li>Extract the content into a page</li>
  <li>Design a format for it (bills are generally easy and tabular, but not everything will be)</li>
  <li>Do the transcription</li>
</ol>

<p>This is pretty time consuming, and the UX still isn’t great in cases where the content of the attachment is not
trivially transcribable to plaintext. This falls down just as hard when the data is textual, but not plaintext—for
example, if you want to store and index an email thread (see the first section of this essay, where I talk about
“paperwork-generating events”), you have to at a minimum copy all the contents out. At a maximum, you must separate the
messages, build a hierarchy for the thread, come up with the appropriate categorization, etc. The depth of the process,
however, is directly proportional to its final utility: a single page w/ a copy of all the contents is less broadly
useful than each email as a discrete, indexable, and referenceable document.</p>

<p>A system more attuned to the needs of different document forms would recognize an email thread and categorize
appropriately, recognize a PDF and apply OCR/text extraction as needed, perhaps even recognize images and attempt to
make guesses at their contents, and use those to enrich the search index.</p>
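
<p>As a rough sketch of what that recognition step could look like, the snippet below routes a file to an ingestion step based on its guessed media type. It only uses Python’s standard <code class="language-plaintext highlighter-rouge">mimetypes</code> module; the handler names (<code class="language-plaintext highlighter-rouge">split_email_thread</code>, <code class="language-plaintext highlighter-rouge">extract_pdf_text</code>, and so on) are hypothetical placeholders, not real extractors.</p>

```python
import mimetypes

# Hypothetical handler names: real extractors (OCR, email parsing, image
# labelling) would sit behind each of these. They are assumptions for
# illustration, not part of any existing system.
HANDLERS = {
    "message/rfc822": "split_email_thread",
    "application/pdf": "extract_pdf_text",
}


def pick_handler(filename: str) -> str:
    """Choose an ingestion step from the media type guessed from a file name."""
    media_type, _encoding = mimetypes.guess_type(filename)
    if media_type is None:
        # Unknown types are still stored, just without content extraction.
        return "store_as_attachment"
    if media_type in HANDLERS:
        return HANDLERS[media_type]
    if media_type.startswith("image/"):
        return "describe_image"
    if media_type.startswith("text/"):
        return "index_plain_text"
    return "store_as_attachment"
```

<p>Guessing from the file name is the cheapest possible signal; a real pipeline would probably want to sniff the content as well, since scans and exports are often misnamed.</p>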

<h2 id="failing-2-poor-taxonomyorganizational-ux">Failing #2: Poor taxonomy/organizational UX</h2>

<p>Another place where wikis fall down is in their taxonomy &amp; organization UX. While most of them are able to be expressive
in their taxonomy, the actual presentation of the taxonomy always seems cast by the wayside. Rarely is the presentation
up front—generally speaking, the wiki expects you to enter via a landing page, which itself links to other pages.
Rarely is the organization of pages up-front—you’re expected to maintain category pages, rather than being able to
use the category/tag itself as an entry point.</p>

<p>Consider the following case: I know I have a note/document somewhere that has information on the plumber that I use, and
I want to look up the plumber, and also see how much I’ve paid for various services via invoices. In most wikis, I have
to look up the “Professional Services” page, which then has the list of documents, find the plumber, and look up
the bills from there. But the professional services page has to be linked from the landing, and maybe has to be
maintained manually. A system with a more up-front focus on taxonomy would give me an entry point to explore
categorization, or even give me a direct input into filtering the knowledge set using the predeclared taxonomy.</p>

<p>Organization-wise, wikis also tend to fall down because of their semi-global namespacing—Confluence in particular
has this problem, where if you want to express the same structure multiple times within a single “space,” you are
completely barred from doing so! You have to either do contortions like <code class="language-plaintext highlighter-rouge">[Repeated Name]: Foo</code> and <code class="language-plaintext highlighter-rouge">[Repeated Name]:
Bar</code>, or create multiple spaces. Mediawiki has more or less the same issue—namespacing is possible, but ugly
(<code class="language-plaintext highlighter-rouge">Foo:Bar</code>), and still doesn’t solve the repeating structure issue in a meaningful way.</p>

<h1 id="how-do-document-managemententerprise-information-systems-fit-in">How do document management/enterprise information systems fit in?</h1>

<p>As part of exploring the space, I’ve also looked at more pure document management/enterprise information systems. I’m
passingly familiar with Perkeep (f/k/a Camlistore), which I ran briefly at home, and Hyland’s OnBase product, which my
employer uses. I’ve also looked at others, like DocumentCloud, PaperSave, etc. These systems have an area of strength,
and that’s focusing on archival and document lineage—you get a data lineage, some classification, integration with
business workflow software in the “enterprise” cases. These features have value as an overall component of knowledge
management—nobody actually enjoys losing data, and knowing the history of where things came from is useful.</p>

<p>However, their utility as overall knowledge management systems is lacking. They tend to focus on hierarchical
organization, and have limited cross-referencing/tagging/categorization capabilities. As well, they tend to be focused
on discrete pieces of data—scanned documents, images, etc. While this is useful for probably 80% of what knowledge
management should be (especially in enterprise settings, where executed contracts, etc. are a major generator of
paperwork and need to be available for rapid lookup), it still doesn’t help capture “flotsam” that is generated by
day-to-day activity. More importantly, they don’t capture the dynamic, living notation that a wiki does, and are
minimally helpful in cross-referencing the two. SharePoint is really a shining illustration of this—the document
upload functionality, the associated versioning, and search within those documents are all pretty good. However, it
really falls down because it’s pure hierarchy, and its support for “wikis” is one of the most mediocre pieces of
software I’ve ever seen.</p>

<p>One place enterprise-targeted systems do deliver is search. SharePoint excels in this space (I’ve seen
it extract useful, searchable information from Office format files, PDFs, etc.). A piece of software in this category I
took a test drive of as part of researching what I wanted in this space was Kofax’s <a href="https://www.kofax.com/Products/paperport">PaperPort</a>. PaperPort was
appealing to me because of the promise of scanning to PDF, capturing on-the-go, and “transforming paper documents into
actionable digital information.” It turns out, it is actually quite good at some of these—its scanning integration
is excellent, and it does a good job at OCRing and searching scanned documents (or extracting text from pre-OCRed
documents). I loaded a few documents from Tax Season a few years ago, and was able to quickly find them among the
preloaded demo library by searching for keywords that I knew would be tax specific. However, PaperPort fails in the same
way that most document management systems do: its categorization and organization approach is catastrophic. It remains
mired in hierarchical patterns—this is probably OK for enterprises, but doesn’t work with my approach of
multi-categorization.</p>

<h1 id="what-do-i-actually-want-in-a-knowledge-management-system">What do I actually want in a knowledge management system?</h1>

<p>Naturally, as part of writing this, I’ve thought a lot about what I actually need in a knowledge management system. Much
of this, I think, will be applicable to anyone looking to organize their knowledge. Some of it is probably idiosyncratic
to me, or at least not broadly applicable to everyone needing knowledge management. Even those places, I think, offer
readers a chance to maybe rethink the way they store, think about, and search their extended memory. Of course, I’m also
coming at this from a “maker”’s perspective—I build software for a living, so naturally, I’ve spent a fair amount
of time also thinking about how I would build this.</p>

<h2 id="storage-is-obviously-critical">Storage is obviously critical</h2>

<p>I mentioned above that many document management systems have a principal focus on archival-quality storage—Perkeep
explicitly states in its mission that “Your data should be alive in 80 years, especially if you are.” I don’t know about
80 years, but I definitely don’t like losing things. The good news is, I think storage is largely a solved problem.
There’s probably room to argue on this, and it’s going to be dependent in some part on <strong>where</strong> your data is held.
People running their own storage infrastructure (NAS, backups, etc.) probably will disagree a little bit that reliable
storage is a solved problem.</p>

<p>While I have network-attached storage at home, and make extensive use of it, I would probably choose to build this
system on the public cloud. “Blob” stores are all capable of doing this job quite well, have reasonable costs, and
provide value in the form of some of their default features on top of just storage. For example, most of them have a
low-complexity versioning concept that would be sufficient to satisfy my desire to version knowledge where it makes
sense. AWS S3, Azure Blob Storage, GCP Cloud Storage are all satisfactory. Backblaze’s B2 might be a dark horse
candidate, due to its cost and Backblaze’s relative pedigree.</p>

<p>I’d probably use S3 anyway and eat the extra cost—I don’t have enough data to store for the cost to
make a difference, and I’d probably build the rest of the system on AWS as well. If I were incentivized to make this
distributable, or meaningfully multi-user, I’d either use the filesystem (with some tricks to reduce disk use and
present data closer to the system’s “mental” model<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">6</a></sup>), or automate separate cloud enclaves for each user of the
system. Some sort of password-derived KMS keywrapping, or something. Honestly, I haven’t spent a lot of time needing to
build systems w/ user-generated content that absolutely cannot be commingled, so I’d ask people smarter than me. I’m
willing to hand-wave away some things—at the end of the day, I’m more focused on the taxonomic &amp; UX portions than
on storage.</p>

<h2 id="support-for-all-types-of-media-is-another-must">Support for all types of media is another must</h2>

<p>The system has to support all kinds of knowledge, and support them <strong>naturally</strong>. Free text (basically, wiki pages or
other scratch notes), PDFs, emails, etc. should all garner the same level of text-search support. I should never be in
the position where I’m having to build my own canonical representation of some piece of data, when I already have it on
hand! The email thread case from above I think is especially illustrative, as it’s an area where existing systems
probably fall down quite hard (except for maybe eDiscovery). I would make an exception for images, where I would expect
(and prefer!) to enter descriptive text around the subject of the image—as far as AI/ML systems have come in
identifying the contents of an image, they don’t capture any of the surrounding context.</p>

<p>I’m not sure about how to best deal with audio/video data in this context, to the point where I’m almost tempted to
leave it out entirely, but that feels wrong. I think a first cut has to approach it in the same way we approach
pictures: free tagging and free “description” entry, with the description entry being indexed for search in the same way
all other text is. A possibility is to transcribe the audio track where available, but I’m not sure about the
computational price of that.</p>

<p>Versioning some of this data might be a challenge as well, and I think for the sake of reasonable constraints, I might
choose to <strong>only</strong> version user-generated, wiki-style content. Presenting “Track Changes” style version management of
Word/Excel/etc. documents might also be a possibility, if that data can be easily extracted, but since those are
self-contained within the document, I don’t think they need storage assistance. In general, I find Track Changes on
anything other than Word-style docs hard to understand—versioned spreadsheets should probably be database tables,
or at least Wiki pages where a diff is a little easier to render and understand.</p>

<p>I think there’s really not much of a verdict to give here—all data is beautiful and deserving of our
attention. As to tools, I’d probably reach for Apache Tika for text extraction from various formats, and maybe ffmpeg or
similar tools for basic metadata extraction from audio/video. I’m not aware of any computationally inexpensive
approaches to audio transcription to complement human-entered description for audio/video tracks, so that would be a
good research area.</p>

<h2 id="organization-and-search">Organization and search</h2>

<p>Obviously, this system needs a really good approach to taxonomy, a robust metadata capability, and a search capability
that combines taxonomic searches, content searches, and metadata searches. I think the system needs to distinguish
between a few different types of searchable data:</p>

<ol>
  <li>Well-structured metadata, intrinsic to the piece of knowledge. Things like when was it entered into the system, when
was it created, its original file name (distinct from the “name” within the system!), etc. Depending on the input,
this intrinsic metadata extraction might bleed into what would otherwise be extrinsic. For example, email-type input
would probably generate some taxonomic data automatically.</li>
  <li>Structured, but extrinsic data. This is really what I would call “user-entered” metadata. This includes things
like “when was this piece of knowledge created,”<sup id="fnref:6"><a href="#fn:6" class="footnote" rel="footnote" role="doc-noteref">7</a></sup> the taxonomic classification, user-entered related document
linkage, etc.</li>
  <li>Unstructured data, both intrinsic and extrinsic. This is the actual contents where they can be extracted (text, OCRed
text, transcribed audio, etc.), or the human-entered descriptions where they can’t (photos, videos, audio).</li>
</ol>
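
<p>To make the three kinds of searchable data a little more concrete, here is a minimal sketch of a single record that keeps them apart. The class and field names are my own assumptions for illustration; nothing here is a settled schema.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class KnowledgeItem:
    # 1. Intrinsic, well-structured metadata captured at ingestion time.
    original_filename: str
    entered_at: datetime
    # 2. Structured but extrinsic, user-entered metadata.
    categories: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)
    related_ids: set[str] = field(default_factory=set)
    # 3. Unstructured data: extracted contents where possible,
    #    a human-entered description where not.
    extracted_text: str = ""
    description: str = ""

    def searchable_text(self) -> str:
        """Combine the unstructured parts into one string for full-text indexing."""
        return " ".join(p for p in (self.extracted_text, self.description) if p)
```

<p>Keeping the unstructured text separate from the structured fields is what lets the two be indexed differently later on.</p>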

<p>The taxonomic data itself has additional constraints. I know I’ve largely railed against hierarchies in prior sections,
but I think there is a utility to them at one or two levels of depth, as a complement to other taxonomic approaches. I’m
still firm on my belief that pure hierarchical organization does not work. Two things I would want to see in terms of
taxonomic approaches:</p>

<ol>
  <li>At least one level of high/top-level kindedness. These should be free-form, though. A naive approach would leverage
data “kind” (images, documents, etc.) at the top level, but I think that again, flexible is better here. Especially
in a business setting (and even in a personal one), you likely want top-level categories for things like check
images, invoices, etc. so that you can separate quickly on broad-but-not-overbroad strokes. Nothing stops you from
having multiple breadths of category, either—an intrinsic kindedness category based on the data kind, and an
extrinsic category assigned by the user.</li>
  <li>A free-tagging second layer of taxonomy, that allows for some natural hierarchicalization. I’m not yet sure whether
this should require all layers to be represented, or simply infer them. The way I’m thinking about this is for,
say, checks, you want to represent the <code class="language-plaintext highlighter-rouge">from</code> and the <code class="language-plaintext highlighter-rouge">to</code> as separate tags, so that you can express searches like
“All checks from Pablo to me,” or “All checks from me to my general contractor.” The layers question boils down to
whether <code class="language-plaintext highlighter-rouge">foo:bar</code> implies <code class="language-plaintext highlighter-rouge">foo</code> or not—my lean is that yes, it does.</li>
</ol>
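
<p>If <code class="language-plaintext highlighter-rouge">foo:bar</code> does imply <code class="language-plaintext highlighter-rouge">foo</code>, the inference is cheap to do at indexing time. The sketch below assumes tags are plain strings with <code class="language-plaintext highlighter-rouge">:</code> as the only separator; the function names are hypothetical.</p>

```python
def expand_tag(tag: str) -> list[str]:
    """Return the tag plus every ancestor implied by its colon-separated prefix."""
    parts = tag.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts) + 1)]


def index_tags(tags: list[str]) -> set[str]:
    """Expand every tag on a document into the full set of tags to index."""
    expanded: set[str] = set()
    for tag in tags:
        expanded.update(expand_tag(tag))
    return expanded
```

<p>With this, a check tagged <code class="language-plaintext highlighter-rouge">from:pablo</code> and <code class="language-plaintext highlighter-rouge">to:me</code> is also findable under the bare <code class="language-plaintext highlighter-rouge">from</code> and <code class="language-plaintext highlighter-rouge">to</code> tags, without anyone having to enter those layers by hand.</p>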

<p>Technologically, I’d take a “progressive enhancement” approach to storage. The intrinsic, well-structured
metadata I think could go into a relational database, along with notions of ownership, access control lists, etc.
Structured extrinsic metadata goes here as well, I think—most searches over it will be “whole word”-type searches,
that don’t need the sort of “nearest neighbor” search approaches that unstructured data needs. For the unstructured
data, I think for small data volumes, something like PostgreSQL’s GIN indexes + full-text search will be sufficient. At
larger data volumes, I’d probably reach for tools like Solr, Elastic, or custom work on top of Lucene. At sufficiently
large volumes, I’d look for an exit via Google acquisition, so they can apply their search work.</p>

<h2 id="presentation--ux">Presentation &amp; UX</h2>

<p>Presentation and UX of the system are going to be just as important as the storage and taxonomic layers. There’s a
number of problems with existing systems, but the two key ones for me are the process of loading data, and the “views”
into the data. This will probably be the longest section of “what I want to see,” because it’s by necessity the most
complex—without the benefit of a user experience, the “academic” storage and taxonomy qualities of the underlying
system are largely moot.</p>

<h3 id="loading-data-into-the-system">Loading data into the system</h3>

<p>Probably the most important part of the day to day interaction, actually bringing data into the system is an area that
has to be more fluid than it is today. My biggest issue with the systems I’ve tried so far is that I have to enter each
piece of data individually, and no system I’ve used has ever shown me <strong>what</strong> I’ve just entered. This means that
anytime you need to batch-load data into the system, you have to inspect each piece by hand, upload it, enter the
metadata, and then go back to the next item. Batch upload with previewing is therefore a key UX point, because it makes the
ingestion of data fast and natural.</p>

<p>Otherwise, there’s a few key entry points for new data:</p>

<ol>
  <li>Drag-and-drop/file select + free entry of text from a web UI. This is pretty standard, and probably the primary
mechanism for a lot of entry. Certainly “paperless home/office” scans, DSLR photos, bills, etc. would probably come
in that way.</li>
  <li>Wiki style, free-text entry. Good for notes, research work, documentation, etc. Pretty foundational for both personal
and business use—personal wikis are a great way to store tradespeople’s information, work histories (“plumber
fixed blah on blah, refer to invoice blah”, etc.). Wiki pages need at least basic cross-linking functionality (to
each other, and to other data), but otherwise I think should probably be fairly vanilla Markdown (or other structured
text) documents.</li>
  <li>Mobile apps are a definite nice-to-have. Since all of the “built in” frontend (file upload &amp; wiki editing) should be
API driven, building in mobile support should be possible. The biggest benefit of this would be “share sheet”
integration in the mobile OSes.</li>
  <li>A neat trick for ingesting email would be to support forwarding to an assigned email address. Most, if not all, of
the popular SaaS products for sending email also support this use case for receiving it. Ingesting mail this way
preserves as much of the original metadata as is possible, and is convenient to boot.</li>
</ol>

<h3 id="viewing-the-loaded-data">Viewing the loaded data</h3>

<p>Because most systems are tied into their existing notions of hierarchies, they get this wrong when their first view is
purely hierarchical. In reality, there needs to be a number of different entry points, that represent different elements
of the taxonomy.</p>

<p>Going back to the description I gave above, I suggested two layers of taxonomy: categories, and tags.
In this system, each of these taxonomic layers, along with its associated hierarchy, has its own entry point.
The knowledge/data-type hierarchy also has a distinct entry
point—even though it’s not a principal point of organization for most of our purposes, it can be a useful entry
point, especially combined with the extrinsic taxonomic layers. This means that a “home” view needs to present all of
these different approaches to diving in, as well as a “search” entry point.</p>

<p>The search itself should aim to be natural, combining metadata searches with full text searches. The search language
should, whenever possible, try to make use of natural thinking/speaking patterns to formulate searches. An interesting
idea I had in this space is to utilize the natural spoken differences between <code class="language-plaintext highlighter-rouge">,</code> and <code class="language-plaintext highlighter-rouge">;</code> to determine disjunction vs.
conjunction. Owing to the shorter pause afforded by <code class="language-plaintext highlighter-rouge">,</code>, it becomes conjunctive: “Invoices, 2018, tradespeople” gives
you all invoices, paid in 2018, to tradespeople. Meanwhile, <code class="language-plaintext highlighter-rouge">;</code> becomes a disjunction: “Invoices; Checks”
gives you all documents matching (invoice OR check). There still remains ambiguity on whether <code class="language-plaintext highlighter-rouge">;</code> is inclusive or
exclusive—while a fascinating topic, it would be a major departure from this post, and best saved for other writing.</p>
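
<p>As a rough illustration of the <code class="language-plaintext highlighter-rouge">,</code> versus <code class="language-plaintext highlighter-rouge">;</code> idea, here is a minimal Ruby sketch. It assumes that <code class="language-plaintext highlighter-rouge">,</code> binds tighter than <code class="language-plaintext highlighter-rouge">;</code>, that <code class="language-plaintext highlighter-rouge">;</code> is an inclusive OR, and that a document is described by a flat list of labels. The names <code class="language-plaintext highlighter-rouge">parse_query</code> and <code class="language-plaintext highlighter-rouge">matches?</code> are made up for this example, and matching is exact (no stemming, so “invoice” would not match “invoices”).</p>

```ruby
require "set"

# Parse a query such as "Invoices, 2018; Checks" into a list of alternatives,
# each of which is a list of terms that must all match.
# Assumption: "," binds tighter than ";" and ";" is treated as inclusive OR.
def parse_query(query)
  alternatives = query.split(";").map do |alternative|
    alternative.split(",").map { |term| term.strip.downcase }.reject { |term| term.empty? }
  end
  alternatives.reject { |terms| terms.empty? }
end

# A document matches if, for at least one alternative, every term appears
# among its labels (tags, categories, years, and so on).
def matches?(parsed_query, labels)
  normalized = labels.map { |label| label.to_s.downcase }.to_set
  parsed_query.any? { |terms| terms.all? { |term| normalized.include?(term) } }
end

query = parse_query("Invoices, 2018, tradespeople; Checks")
# query is [["invoices", "2018", "tradespeople"], ["checks"]]

matches?(query, ["Invoices", 2018, "Tradespeople", "Plumbing"]) # true
matches?(query, ["Checks", 2020])                                # true
matches?(query, ["Invoices", 2019])                              # false
```

<p>Treating <code class="language-plaintext highlighter-rouge">;</code> as exclusive instead would mean requiring exactly one alternative to match rather than at least one.</p>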

<p>Another potentially useful view on the loaded data would be a “sync” view to the local system, especially when paired
with primary storage in the cloud. I mentioned this a while ago, and gave it a brief footnote, but I think it deserves a
little bit of extra coverage. The idea is to synchronize to a filesystem, and give each element of the taxonomic
hierarchy its own folder, and use links to reflect the fact that a document lives in multiple places, without
duplicating storage use. This type of synchronization would be especially useful with “media”-type documents, “paperwork
generating events,” or things like pay stubs, where there are times that you want to just get a copy of everything
associated with a particular taxonomic element.</p>
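
<p>A minimal sketch of that sync view in Ruby, assuming hard links are acceptable and that the central store and the sync folder sit on the same filesystem (hard links cannot cross filesystems). The helper name <code class="language-plaintext highlighter-rouge">link_into_taxonomy</code> and the folder layout are invented for illustration.</p>

```ruby
require "fileutils"

# Hypothetical helper: mirror one stored document into every taxonomy folder
# it belongs to, using hard links so the bytes exist on disk only once.
# Assumes the store and the sync root live on the same filesystem.
def link_into_taxonomy(stored_path, sync_root, taxonomy_paths)
  taxonomy_paths.map do |taxonomy_path|
    folder = File.join(sync_root, taxonomy_path)
    FileUtils.mkdir_p(folder)
    target = File.join(folder, File.basename(stored_path))
    # Skip targets that already exist so the sync can be re-run safely.
    FileUtils.ln(stored_path, target) unless File.exist?(target)
    target
  end
end
```

<p>Calling it with something like <code class="language-plaintext highlighter-rouge">["tags/payroll", "categories/finance/2019"]</code> would leave the same pay stub visible in both folders while it occupies disk space only once.</p>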

<h3 id="statistics-and-analysis">Statistics and analysis</h3>

<p>I’m actually not sure it makes much sense to do any sort of statistics or analysis on the stored data. None of the
obvious candidates are particularly interesting—who cares about file size, word count, etc.? Because our data is
massively heterogeneous, it’s hard to do much primary key analysis/subject analysis in coherent ways. Maybe with a
consistent format for certain bills, you could do some analysis, but I don’t believe it’s worth it to do so in this
medium. Another concept I’m going to introduce, “workspaces,” I think would provide a foundation for collecting the data
in order to analyze it in a more suitable suite of programs.</p>

<h3 id="workspaces">“Workspaces”</h3>

<p>The concept of a workspace here is one that’s really useful for research-type activities. Workspaces are intended to
facilitate research threads, like someone might use for research/planning on a novel, or for genealogical research. They
can be populated manually, via search, or via a saved search, like “smart search” folders in many email clients. The goal is
to tie together cross-referencing ability, display, and note-taking in a way that existing tools like OneNote and
Scrivener don’t.</p>

<p>They’re functionally not that different from a single taxonomic point view, but I see them as a potential extension on
top of that view that provides some ability to multi-view, take notes and have them be automatically stored w/ relevant
“related document” links, etc. A workspace is what I would have used while researching this post (for example, I re-read
“As We May Think” and read most of Engelbart’s “Augmenting Human Intellect”), and I would have taken notes on each of
the papers, maybe even kept the blog post draft in the same workspace (remember, “wiki” style pages are just Markdown!).</p>

<h1 id="mental-models-and-techniques-for-this-system">Mental models and techniques for this system</h1>

<p>This is an area that I struggled to write, because there is an element of idiosyncrasy to this. Part of what makes this
system suited to me is that it is inherently compatible with my mental models and approaches to memory. The biggest
thing is that none of this is designed to require <strong>memorization</strong>. Quite to the contrary, it is designed to aid you in
not having to memorize prodigious amounts of information—you shouldn’t feel the need to resort to any memorization
techniques or practices in order to use this system. What you will find helpful is the ability to free-associate, and to
explore drawing “conclusions” quickly. In general, my brain seems to be tuned for rapidly making connections between
various pieces of information, and that would be principally helpful in composing searches in this system, as well as
exploring related documents.</p>

<p>However, even that is probably limited in its utility, as the system is somewhat designed to help you build up that
ability to rapidly associate between information that you already know. After all, the taxonomic systems, intrinsic
metadata, and content extraction are designed to work with the little bits and pieces that you do remember, and help you
find the entire document (and all the related documents). Fragmentary pieces such as “a tax document from 2019” should
be enough to reduce the search space to something that you can brute-force. “A tax document from my employer” should be
sufficient to narrow down to the exact document or handful of documents that you need.</p>

<p>In fact, the most useful technique and mental model for this system? Get yourself in the habit of committing every
potentially useful piece of information/context to the taxonomy. The more richly you describe data when loading it into
the system, the less of it you have to remember.</p>

<h1 id="conclusion">Conclusion</h1>

<p>Whew. We’re a touch over six thousand words in, and I’ve finally reached the end. My broad conclusion here is that most
existing systems for knowledge management fall short, in fairly predictable, consistent ways. Each of these ways is
intrinsic to the “type” of system! You could almost say it’s each category’s nature to fail in one of these ways:</p>

<ol>
  <li>They’re too oriented toward “business process” integration, and don’t support the capture of evolving knowledge,
except when it can be locked into their hierarchical, discretized model. These are your enterprise-y document
management type systems (PaperPort, OnBase, DocumentCloud, etc.). They sometimes deliver on the full-text search
aspect, because businesses tend to have needs for that, and they’re usually strong on storage (OnBase I know has a
lot of flexibility here, that is genuinely useful to businesses), but taxonomic organization is not a concern for
them.</li>
  <li>They’re too oriented toward pure storage, data longevity, and archiving. This is Perkeep and similar “long-term”
archival systems. It’s not that archiving and longevity are unimportant (like I said: nobody likes to lose data), but
to me, it sort of misses the forest for the trees. I rarely want to store data merely for the sake of storing it;
instead I want to derive insights, augment my memory, or have an “auditable”<sup id="fnref:7"><a href="#fn:7" class="footnote" rel="footnote" role="doc-noteref">8</a></sup> record of some event. Maybe there
are systems out there that combine the two, but I haven’t seen one yet.</li>
  <li>They’re too oriented toward plain text, and lack support for bringing non-plain text documents into their sphere.
Wikis fall into this category, and they’re great if you’re able to fit everything you want to record in a wiki.
They’re frequently imperfect in their UX, but for the key cases of capturing evolving data and capturing operational
knowledge (about your business, about your life, etc.), they tend to be the best tool for the job.</li>
</ol>

<p>Everyone who uses existing knowledge management systems suffers for this:</p>

<ol>
  <li>Enterprises end up with a slapdash house of cards, suffused with inconsistent process, and half-used features. I know
I’ve seen enough Confluence pages with attached Excel spreadsheets, or PDFs, or Word documents, to last me a
lifetime. Meanwhile, some older documents live on SharePoint, which does a passable job with search on PowerPoints,
PDFs, etc., but butchers wikis so badly that I’m surprised there hasn’t been a class action lawsuit. They’ll never be
migrated to Confluence, because to do so would actually be a functional regression. Users of these enterprise systems
suffer because finding a canonical reference involves searching at least two, if not more, disparate systems.</li>
  <li>“Regular” users suffer because they never have a system that integrates their operational knowledge with their
snapshotted documents. Finding, say, “all the tradesperson invoices from 2020” ranges from “a few lookups” in
the best case where you’ve already built a hierarchy around these concepts (Invoices → Tradespeople), to
“manually trawl through everything trying to remember names” if you haven’t. And even if you’ve already built the
hierarchy, you’re likely to have issues eventually: even highly organized people slip up
without the benefit of a consistent, computer-aided process. Take commercial aviation, for example: the
process leans heavily on checklists and computers, and as a result, flying is the safest way to travel (per mile
traveled), and it isn’t close<sup id="fnref:8"><a href="#fn:8" class="footnote" rel="footnote" role="doc-noteref">9</a></sup>.</li>
</ol>

<p>Given the advancements in the technology (technological solutions exist for all the functionality I’ve outlined) and the
theory (I don’t believe anything I’ve said here is a completely novel approach/idea), there’s no reason that a modern
system should make so much distinction between operational (evolving) knowledge (in the form of wikis) and snapshotted
or frozen knowledge (in the form of fixed non-plain text documents). Better and easier knowledge organization would
allow people to operate more efficiently in their business lives and their personal lives. <strong>We can, and should, do
better.</strong></p>

<!-- Abbreviations, links, and footnotes after here. -->

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:0">
      <p>A <abbr title="Centralized Online Real-time Exchange">CORE</abbr> (hereafter, just “a core”) is basically a ledger and processing center for bank accounts. In our particular
  case, we’re replacing our “Deposits” core, which houses all our deposit accounts (checking, savings, etc.), and in
  an odd edge case, also houses residential mortgages and certain types of revolving/installment loans (like
  <abbr title="Home Equity Lines of Credit">HELOCs</abbr>). This is, in no uncertain terms, a big fucking deal that requires a lot of business and technology work
  and coordination to execute. The current system has been in service since the 80s, and is predictably deeply
  integrated. <a href="#fnref:0" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1">
      <p>PDF form of scanned original article, with the section titles in place. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>Text link, missing section title, but much more readable. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>The word hadn’t been coined in 1945 yet, and wouldn’t be until 1968 by Phil Bagley. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>The author remembers wikis starting to be used for game guides in 2000/2001, and using Wikipedia for “research” in
  high school, probably around 2004. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>One of these clever tricks involves the gross abuse of hardlinks. I’ve in fact implemented this once before, and
  it works the following way: each file is stored in a single central directory, that’s normally-hidden from the user.
  Directories are created that represent each tag/category as needed—even hierarchy can be represented this way (see
  the section on hierarchy in tagging). Then, each file is hard-linked into all of the hierarchy locations it belongs in
  (because things can be in multiple hierarchies/tags/categories!). From the user’s perspective, everything is in the
  right places, but we reduce the amount of painful disk space use. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6">
      <p>This can, and might frequently be, different from the creation date of the file—think documents that are
  scanned months/years after their creation as paper/physical documents. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7">
      <p>In the sense of able to be trawled through, not any auditing like certificate transparency, blockchains, etc. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8">
      <p>0.2 deaths per 10 billion passenger-miles for air flight. 150 deaths per 10 billion passenger-miles for driving. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bojan Rajkovic</name></author><category term="Essays" /><category term="Knowledge Management" /><category term="Organization" /><category term="Mental Models" /><summary type="html"><![CDATA[In which I ruminate on what knowledge and document management looks like in the modern era.]]></summary></entry><entry><title type="html">Including collections in Jekyll archives</title><link href="https://coderinserepeat.com/2020/04/05/including-collections-in-jekyll-archives.html" rel="alternate" type="text/html" title="Including collections in Jekyll archives" /><published>2020-04-05T11:42:47-04:00</published><updated>2020-04-05T11:42:47-04:00</updated><id>https://coderinserepeat.com/2020/04/05/including-collections-in-jekyll-archives</id><content type="html" xml:base="https://coderinserepeat.com/2020/04/05/including-collections-in-jekyll-archives.html"><![CDATA[<p>Recently, I decided to upload all my recipes onto my blog, as a convenient way
to share them, any modifications I made, and the original source. It also
allows me to have a “backup” of them, as the <a href="https://github.com/bojanrajkovic/paprika-exporter">tool that I wrote</a>
also exports <a href="https://paprikaapp.com">Paprika’s</a> “importable” format (really just gzip +
JSON).</p>

<p>The best way to do this was as a Jekyll collection, which lets me neatly keep
them separate from the actual blog posts. However, I wanted my recipe
categories to be used by <code class="language-plaintext highlighter-rouge">jekyll-archives</code> as part of tag
generation. Normally, this is not supported, but Ruby monkey-patching allows
me to commit the following crime in a Jekyll plugin (appropriately called
<code class="language-plaintext highlighter-rouge">fixup-recipe-tags.rb</code>):</p>

<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="nb">require</span> <span class="s2">"jekyll-archives"</span>
<span class="nb">require</span> <span class="s2">"jekyll"</span>

<span class="k">module</span> <span class="nn">Jekyll</span>
    <span class="k">module</span> <span class="nn">Archives</span>
        <span class="k">class</span> <span class="nc">Archives</span>
            <span class="kp">alias_method</span> <span class="ss">:old_tags</span><span class="p">,</span> <span class="ss">:tags</span>

            <span class="k">def</span> <span class="nf">collection_tags</span><span class="p">(</span><span class="n">collection_name</span><span class="p">)</span>
                <span class="nb">hash</span> <span class="o">=</span> <span class="no">Hash</span><span class="p">.</span><span class="nf">new</span> <span class="p">{</span> <span class="o">|</span><span class="n">h</span><span class="p">,</span> <span class="n">key</span><span class="o">|</span> <span class="n">h</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span> <span class="p">}</span>
                <span class="vi">@site</span><span class="p">.</span><span class="nf">collections</span><span class="p">[</span><span class="n">collection_name</span><span class="p">].</span><span class="nf">docs</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="nb">p</span><span class="o">|</span>
                    <span class="nb">p</span><span class="p">.</span><span class="nf">data</span><span class="p">[</span><span class="s2">"tags"</span><span class="p">]</span><span class="o">&amp;</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span> <span class="nb">hash</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">&lt;&lt;</span> <span class="nb">p</span> <span class="p">}</span>
                <span class="k">end</span>
                <span class="nb">hash</span><span class="p">.</span><span class="nf">each_value</span> <span class="p">{</span> <span class="o">|</span><span class="n">posts</span><span class="o">|</span> <span class="n">posts</span><span class="p">.</span><span class="nf">sort!</span> <span class="p">}</span>
                <span class="nb">hash</span>
            <span class="k">end</span>

            <span class="k">def</span> <span class="nf">tags</span>
                <span class="n">collections_to_tag</span> <span class="o">=</span> <span class="vi">@config</span><span class="p">[</span><span class="s1">'collections'</span><span class="p">]</span>

                <span class="n">merged_tags</span> <span class="o">=</span> <span class="vi">@site</span><span class="p">.</span><span class="nf">post_attr_hash</span><span class="p">(</span><span class="s2">"tags"</span><span class="p">)</span>
                <span class="n">collections_to_tag</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">collection</span><span class="o">|</span>
                    <span class="n">merged_tags</span> <span class="o">=</span> <span class="n">merged_tags</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="n">collection_tags</span><span class="p">(</span><span class="n">collection</span><span class="p">))</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">v1</span><span class="p">,</span> <span class="n">v2</span><span class="o">|</span> <span class="p">[</span><span class="n">v1</span><span class="p">,</span><span class="n">v2</span><span class="p">].</span><span class="nf">flatten</span> <span class="p">}</span>
                <span class="p">}</span>
                <span class="n">merged_tags</span>
            <span class="k">end</span>
        <span class="k">end</span>
    <span class="k">end</span>
<span class="k">end</span></code></pre></figure>
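
<p>The part of the plugin doing the real work is <code class="language-plaintext highlighter-rouge">Hash#merge</code> with a block. Here is a small standalone illustration, using plain strings as stand-ins for Jekyll’s post and document objects, and <code class="language-plaintext highlighter-rouge">+</code> in place of the plugin’s <code class="language-plaintext highlighter-rouge">[v1, v2].flatten</code>:</p>

```ruby
# Stand-in data: plain strings in place of Jekyll post and document objects.
post_tags   = { "ruby" => ["post-a"], "jekyll" => ["post-b"] }
recipe_tags = { "jekyll" => ["recipe-x"], "dinner" => ["recipe-y"] }

# The block only runs for keys present in both hashes;
# keys found in just one hash are copied over unchanged.
merged = post_tags.merge(recipe_tags) { |_tag, posts, recipes| posts + recipes }

merged["jekyll"] # => ["post-b", "recipe-x"]
merged["dinner"] # => ["recipe-y"]
merged["ruby"]   # => ["post-a"]
```

<p>Because <code class="language-plaintext highlighter-rouge">merge</code> returns a new hash, the original post tag hash is left untouched.</p>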

<p>It’s probably not idiomatic Ruby, but it allows me to add the following to the
<code class="language-plaintext highlighter-rouge">jekyll-archives</code> section of <code class="language-plaintext highlighter-rouge">_config.yml</code>:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">collections</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">recipes</span></code></pre></figure>

<p>With that, the recipes are used for <em>tags</em>, but they’re not emitted into the
category pages or date-based archives. They’re also not emitted into my
“tagged” page, because that only works with the posts collection.</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Software Engineering" /><category term="Programming" /><category term="Jekyll" /><category term="Ruby" /><summary type="html"><![CDATA[Recently, I decided to upload all my recipes onto my blog, as a convenient way to share them, any modifications I made, and the original source. It also allows me to have a “backup” of them, as the tool that I wrote also exports Paprika’s “importable” format (really just gzip + JSON).]]></summary></entry><entry><title type="html">Making money via sloppy record keeping</title><link href="https://coderinserepeat.com/2019/10/08/making-money-via-sloppy-record-keeping.html" rel="alternate" type="text/html" title="Making money via sloppy record keeping" /><published>2019-10-08T21:27:58-04:00</published><updated>2019-10-08T21:27:58-04:00</updated><id>https://coderinserepeat.com/2019/10/08/making-money-via-sloppy-record-keeping</id><content type="html" xml:base="https://coderinserepeat.com/2019/10/08/making-money-via-sloppy-record-keeping.html"><![CDATA[<p><strong>TL;DR</strong>: We leased a car, transferred the lease to someone else within the
allowed period, and GM Financial, due to their sloppy record keeping, continued to
think that we still held the lease and sent us a $1200 bill that I had to call
them to get cancelled. My thesis: this sloppy record keeping probably
generates a little bit of revenue—if the bill is small enough, some folks
will just pay it instead of contesting it.</p>

<p>Back in August 2016, having plunked down $1000 earlier that year to hold our
spot in line for a Tesla Model 3, we figured we could drive our stoic old Camry
until it hit the ground. We expected that would be at least a few years,
since it had been running without any issues up until then.</p>

<p>Then, <a href="/assets/images/camry_broadside.jpg">this happened</a>. I was driving to work, with a trunkful of monitors
and Hue doodads that a friend was buying off me, and got broadsided by an
unlicensed driver who was driving a rental car they did not have permission to
drive. The scene was sufficiently confusing to the responding officers (in a
stroke of luck, this was down the street from Boston police headquarters) that
they asked me to stick around just in case they needed to arrest the other
driver (they did not).</p>

<p>Our venerable Blueberry unfortunately did not make it out—the damage was
so severe that a repair would have cost nearly $12,000, and that was using the
cheapest parts that the adjuster could find on the market (scrapyard parts,
etc.). The true cost likely would have been closer to $15,000, and our
insurance, rightfully, was not going to pay that. They paid us out the ~$7,000
value of the car, and went off to recover from the other involved parties. Part
of me wishes we were still with them so I could ask how that all went—I
imagine the attorneys earned their hourlies on that one.</p>

<p>We needed a new car PDQ—we were able to borrow from my parents for a few
weeks, but we needed a long term ride. We tested a few things, did some
research, and ended up with a Chevy Volt. We were big fans of most things about
the car and ended up signing up for a 3 year lease, with a great rate thanks to
some shrewd negotiating (read: we walked out on the original bullshit rate). We
figured this would be a great way to ease into electric vehicles while having
the ready backup of gas if we needed it, and
figured we would cross the bridge of getting out of the lease when that time came.</p>

<p>That time came in July of 2018 when we priced out and paid for our Model 3.
Luckily, we found someone to take over the lease, got everything done in record
time, and got the lease assumption in under two wires:</p>

<ol>
  <li>We were leaving the country for a 3 week vacation</li>
  <li>We were approaching &lt;12 months left on the lease, at which point it could
not be transferred anymore.</li>
</ol>

<p>I continued to get mail from GM Financial/the dealer we leased through, but
chalked it up to marketing. The bills were getting paid by the new lessee, and
everyone was happy.</p>

<p>The lease ended about a month ago, and the new lessee returned the
car—they were told that there would not be any additional fees for excess
wear and tear/mileage, etc., and relayed that information to me as a courtesy.
Imagine my surprise when, last week, I received a letter from GM thanking me for
returning my vehicle, and asking me to pay $1200 in disposition fees, excess
wear and tear fees, property taxes, and “other fees.” After a quick
back-and-forth with the new lessee, I decided to call up GM Financial.</p>

<p>The good news is, once I got them on the phone, they were very pleasant! I was
put on hold briefly while the Lease End specialist talked to the Lease
Assumption specialist, and they came back with a “please disregard that bill, we
see here that the lease was assumed.”</p>

<p>This got me thinking, though: <strong>how often does this happen for smaller dollar
amounts, and the person receiving the bill does not contest it?</strong></p>

<p>I suspect it’s not a lot—after all, the lease disposition fee was around
$350, and the only <em>small</em> amount on the listing was the “other fees and taxes,”
which dialed in around $50.</p>

<p>However, if this sort of sloppy record keeping is the standard, they must be
sending out a ton of these letters. How many need to “hit” and be paid without
question for sloppy record keeping to become a pathological target for the
department, because it makes money? Otherwise, what’s the incentive to
not fix this? It certainly costs money when I call in to support (GM Financial’s
help line has always been staffed by American staff, so they’re not paying low
offshore costs), so why not fix the process so that they don’t ever have to deal
with this again?</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Diary" /><category term="Money" /><category term="Sloppiness" /><summary type="html"><![CDATA[TL;DR: We leased a car, transferred the lease to someone else within the allowed period, GM Financial due to their sloppy record keeping, continued to think that we still held the lease and sent us a $1200 bill that I had to call them to get it cancelled. My thesis: this sloppy record keeping probably generates a little bit of revenue—if the bill is small enough, some folks will just pay it instead of contesting it.]]></summary></entry><entry><title type="html">Mounting old Synology volumes in new hardware</title><link href="https://coderinserepeat.com/2017/05/13/mounting-existing-synology-volumes.html" rel="alternate" type="text/html" title="Mounting old Synology volumes in new hardware" /><published>2017-05-13T00:00:00-04:00</published><updated>2017-05-13T00:00:00-04:00</updated><id>https://coderinserepeat.com/2017/05/13/mounting-existing-synology-volumes</id><content type="html" xml:base="https://coderinserepeat.com/2017/05/13/mounting-existing-synology-volumes.html"><![CDATA[<p>I recently upgraded my Synology NAS from a DS 214play that I’ve had for a few
years to a DS 1515+, and bought two additional drives to go along with it. I
also wanted to do a fresh start of the configuration and metadata, as I had been
having some issues with my existing NAS (in addition to the performance issues
that drove me to upgrade), so I did not want to just move the existing drives
over and add the new ones to the same array.</p>

<p>I set up the 1515+, and started copying files over using SFTP mounts–however, I
was getting abysmal speeds, 10-11 MB/s at best. Both pieces of hardware were
connected to a gigabit network, neither was doing anything else at the time, but
transfers were incredibly slow. Not wanting to wait a few days to transfer 3 TB
of data, I set out to find a better way to transfer.</p>

<p>I knew Synology’s “Hybrid RAID” was just a Linux software RAID, which meant I
should be able to mount it in the new Synology as well. I started by doing some
exploring with mdadm, checking that the array was not degraded for some reason,
etc. However, I couldn’t simply assemble it and mount it—under the
software RAID is an LVM volume group. I started by dumping some state about the
volume groups:</p>

<noscript><pre>root@Hagal:/mnt# vgdisplay
  WARNING: Duplicate VG name vg1000: Existing zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  WARNING: Duplicate VG name vg1000: Existing zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  WARNING: Duplicate VG name vg1000: zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  --- Volume group ---
  VG Name               vg1000
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               3.63 TiB
  PE Size               4.00 MiB
  Total PE              952682
  Alloc PE / Size       952682 / 3.63 TiB
  Free  PE / Size       0 / 0
  VG UUID               ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT

  WARNING: Duplicate VG name vg1000: zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  --- Volume group ---
  VG Name               vg1000
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               3.63 TiB
  PE Size               4.00 MiB
  Total PE              952682
  Alloc PE / Size       952682 / 3.63 TiB
  Free  PE / Size       0 / 0
  VG UUID               zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7</pre></noscript>
<script src="https://gist.github.com/bojanrajkovic/a03f947781b809b5848feb1171f51b7c.js?file=vgdisplay.log"> </script>

<p>Aha! <code class="language-plaintext highlighter-rouge">vgdisplay</code> is telling me what I want to know already: I have a duplicate
volume group, and the existing one that was created here (the new NAS’s VG) is
taking precedence over the old one. Armed with the UUID there, I can rename the
old VG:</p>

<noscript><pre>root@Hagal:/mnt# lvm vgrename ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT vg1001
  WARNING: Duplicate VG name vg1000: Existing zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  WARNING: Duplicate VG name vg1000: Existing zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  WARNING: Duplicate VG name vg1000: zkpaaW-zNCs-xB1u-E0nW-afok-Up0N-uAhtB7 (created here) takes precedence over ijfPFm-3l2P-55UC-YzAt-Ps3h-K45B-TdTLHT
  Volume group &quot;vg1000&quot; successfully renamed to &quot;vg1001&quot;</pre></noscript>
<script src="https://gist.github.com/bojanrajkovic/a03f947781b809b5848feb1171f51b7c.js?file=vgrename.log"> </script>

<p>Once it’s been renamed, the next step is to activate the VG so that it gets a
/dev entry and becomes mountable:</p>

<noscript><pre>root@Hagal:/mnt# lvm vgchange -a y vg1001
  1 logical volume(s) in volume group &quot;vg1001&quot; now active</pre></noscript>
<script src="https://gist.github.com/bojanrajkovic/a03f947781b809b5848feb1171f51b7c.js?file=vgchange.log"> </script>

<p>Once we’ve activated it, we can mount it via its /dev entry, and we can see our
entire main storage volume there:</p>

<noscript><pre>root@Hagal:/mnt# mount /dev/vg1001/lv test/
root@Hagal:/mnt# ls test/
@appstore     @autoupdate  camera-upload  comix      downloads  homes           logs        music   Plex          @tmp   videos
aquota.group  backups      @cloudstation  @database  @eaDir     @img_bkp_cache  lost+found  photo   @S2S          tv     web
aquota.user   books        @cloudsync     @download  games      lightroom       movies      photos  synoquota.db  video
</pre></noscript>
<script src="https://gist.github.com/bojanrajkovic/a03f947781b809b5848feb1171f51b7c.js?file=mount.log"> </script>

<p>Once the logical volume was mounted, I could copy files much faster than
over the network: 100+ MB/s vs. 10-11 MB/s.</p>

<p>Important notes:</p>

<ul>
  <li>None of these operations should cause data loss, but I am not responsible for
any data loss that may occur if you follow my instructions!</li>
  <li>Be careful when copying the UUID for a rename: both volume groups share a
name, so the UUID is the only thing that selects the old one.</li>
  <li>More complex Synology setups may not work this easily—I did everything
assuming you set up a single volume group, all the drives are in the same RAID
array, etc.
    <ul>
      <li>That said, you should be able to use these same tools on more complex
setups, just with more care taken to find the right volume groups.</li>
    </ul>
  </li>
  <li>I had to reboot to get the drives back into a state where I could erase them
and add them to an existing volume.</li>
  <li>I would suggest <strong>not</strong> rebooting your Synology with the old drives plugged
in—it is likely to pick up the old volume as a new volume and re-shuffle
your volumes and shared folders.</li>
  <li>These instructions should work on anything that supports LVM/mdraid and the
filesystem on the drives (ext4 or btrfs).</li>
</ul>

<p>Good luck!</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Home Lab" /><category term="Synology" /><category term="Linux" /><summary type="html"><![CDATA[I recently upgraded my Synology NAS from a DS 214play that I’ve had for a few years to a DS 1515+, and bought two additional drives to go along with it. I also wanted to do a fresh start of the configuration and metadata, as I had been having some issues with my existing NAS (in addition to the performance issues that drove me to upgrade), so I did not want to just move the existing drives over and add the new ones to the same array.]]></summary></entry><entry><title type="html">Bringing Rust to C#: Oxide and Oxide.Http</title><link href="https://coderinserepeat.com/2017/05/05/oxide-and-oxide-http.html" rel="alternate" type="text/html" title="Bringing Rust to C#: Oxide and Oxide.Http" /><published>2017-05-05T00:00:00-04:00</published><updated>2017-05-05T00:00:00-04:00</updated><id>https://coderinserepeat.com/2017/05/05/oxide-and-oxide-http</id><content type="html" xml:base="https://coderinserepeat.com/2017/05/05/oxide-and-oxide-http.html"><![CDATA[<p>Rust is a language I’ve admired for a long time now, from a slight distance.
I’ve read about the borrow checker, perused the standard crates, and read up on
Cargo and the way that Rust applications and libraries are built, tested, and
shipped. I appreciate its striving to be a systems-level language that also
cares about safety and developer productivity.</p>

<p>I haven’t <em>written</em> as much Rust as I’d like to (though I did start a few small
projects here and there), but that didn’t stop me from thinking that maybe some
of its standard library features have a place in the C# world. I found myself
particularly fond of the <a href="https://doc.rust-lang.org/std/option/enum.Option.html">Option</a> and <a href="https://doc.rust-lang.org/std/result/enum.Result.html">Result</a> types, and
their ability to better the flow of my code. Option’s API is <code class="language-plaintext highlighter-rouge">Nullable&lt;T&gt;</code> on
steroids, and Result provides an elegant way to express an error that doesn’t
require using out parameters or custom exceptions, while at the same time
providing a delightful API that lets you build processing pipelines that
preserve errors and lazily evaluate steps.</p>

<p>The adventure started one afternoon about a month ago, when I decided I
wanted to see if I could implement <code class="language-plaintext highlighter-rouge">Option</code> in C#. I knew I wanted to preserve
as much of the Rust API as made sense, including the simple construction of
<code class="language-plaintext highlighter-rouge">Some</code> and <code class="language-plaintext highlighter-rouge">None</code> as function calls: <code class="language-plaintext highlighter-rouge">Some(5)</code>, <code class="language-plaintext highlighter-rouge">None&lt;int&gt;()</code>, etc. I opened
up <a href="https://developer.xamarin.com/guides/cross-platform/workbooks">Workbooks</a> (use what you know, right?) and started hacking away.
After a little while, I had my first pass at the <code class="language-plaintext highlighter-rouge">Option</code> API surface—I
stuffed it in a <a href="https://gist.github.com/bojanrajkovic/b1ff4d52fccffcf7e6e98aa041b52ee7">Gist</a>. Within a few hours, I decided to make it into a
library called <a href="https://github.com/bojanrajkovic/Oxide">Oxide</a>—after all, what else is Rust?</p>

<p>My <a href="https://gist.github.com/bojanrajkovic/b1ff4d52fccffcf7e6e98aa041b52ee7">initial commit</a> brought in the API almost exactly as it was in the
Gist. Over the rest of the day, I refined the API slightly, added a ton of tests
(inspired by the Rust documentation’s example assertions), and wrapped up. A
week later, I decided to add <code class="language-plaintext highlighter-rouge">Result</code>, which I implemented largely the same way
(a base class, with derived <code class="language-plaintext highlighter-rouge">Ok&lt;T, E&gt;</code> and <code class="language-plaintext highlighter-rouge">Err&lt;T, E&gt;</code> classes).</p>
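<p>To make that shape concrete, here is a minimal sketch of the same idea, written in JavaScript purely for illustration. It is not Oxide's actual API: the method names <code class="language-plaintext highlighter-rouge">map</code>, <code class="language-plaintext highlighter-rouge">andThen</code>, and <code class="language-plaintext highlighter-rouge">unwrapOr</code> are assumptions borrowed from Rust's naming, <code class="language-plaintext highlighter-rouge">parsePort</code> is a made-up example step, and only the base-class-plus-two-derived-classes structure comes from the description above.</p>

```javascript
"use strict";

// Hypothetical sketch: a base class with two derived classes, mirroring the
// structure described above. None of these names are taken from Oxide itself.
class Result {
  isOk() {
    return this instanceof Ok;
  }

  // Transform the success value; an Err passes through untouched.
  map(fn) {
    return this.isOk() ? new Ok(fn(this.value)) : this;
  }

  // Chain a step that itself returns a Result; the first Err short-circuits.
  andThen(fn) {
    return this.isOk() ? fn(this.value) : this;
  }

  // Leave the pipeline, supplying a fallback for the error case.
  unwrapOr(fallback) {
    return this.isOk() ? this.value : fallback;
  }
}

class Ok extends Result {
  constructor(value) {
    super();
    this.value = value;
  }
}

class Err extends Result {
  constructor(error) {
    super();
    this.error = error;
  }
}

// Made-up example step: parse a TCP port number without throwing.
function parsePort(text) {
  const port = Number.parseInt(text, 10);
  if (Number.isNaN(port)) {
    return new Err(`not a number: ${text}`);
  }
  if (1 > port || port > 65535) {
    return new Err(`out of range: ${port}`);
  }
  return new Ok(port);
}

// The error from parsePort survives the later map() call unchanged.
const good = new Ok("8080").andThen(parsePort).map((port) => port + 1);
const bad = new Ok("http").andThen(parsePort).map((port) => port + 1);
```

<p>The point of the pipeline style is that <code class="language-plaintext highlighter-rouge">bad</code> still carries the original error after the <code class="language-plaintext highlighter-rouge">map</code> step, and no step after the failure has run.</p>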

<p>Since then I’ve refined the API for both, added a priority queue implementation,
added a small library of HTTP helper methods (Oxide.Http), received my first
external contribution from <a href="https://twitter.com/jeremie_laval">Jérémie Laval</a> who contributed a very nice
set of convenience methods to enable async/await with Option, and finally
published a NuGet package (forced by wanting to use Oxide in another
project without resorting to submodules).</p>

<p>I hope to keep working on Oxide—there will probably be more APIs that I
would like to borrow from Rust, or more functional/Rust-inspired APIs that would
be useful for C# developers. Contributions of all kinds are welcome: bug
reports, feature requests, documentation, etc. You can find
Oxide <a href="https://github.com/bojanrajkovic/Oxide">on GitHub</a>—please use the issue tracker there for
everything. :)</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Software Engineering" /><category term="Rust" /><category term="Functional Programming" /><category term="C#" /><category term="Oxide" /><summary type="html"><![CDATA[Rust is a language I’ve admired for a long time now, from a slight distance. I’ve read about the borrow checker, perused the standard crates, and read up on Cargo and the way that Rust applications and libraries are built, tested, and shipped. I appreciate its striving to be a systems-level language that also cares about safety and developer productivity.]]></summary></entry><entry><title type="html">iOS 10, CPBitmap, and you</title><link href="https://coderinserepeat.com/2016/12/22/ios-10-cpbitmap-and-you.html" rel="alternate" type="text/html" title="iOS 10, CPBitmap, and you" /><published>2016-12-22T00:00:00-05:00</published><updated>2016-12-22T00:00:00-05:00</updated><id>https://coderinserepeat.com/2016/12/22/ios-10-cpbitmap-and-you</id><content type="html" xml:base="https://coderinserepeat.com/2016/12/22/ios-10-cpbitmap-and-you.html"><![CDATA[<blockquote>
  <p>Editor’s note: this is an old post that I’ve published now. I’ve since
found that CPBitmap files do contain a binary plist at the end, but it
was not in the exact location described by most blog posts. I’ve got a
bit of code written, but I’m not 100% happy with it yet, so it hasn’t
been published!</p>
</blockquote>

<p>For a long while, I’ve been using a <a href="https://coderinserepeat-my.sharepoint.com/personal/brajkovic_coderinserepeat_com/_layouts/15/guestaccess.aspx?guestaccesstoken=xCUeqwPCt%2fqm%2fBRMdOCqLI2O2KkeQIugtq713H7oth0%3d&amp;docid=052323597d7204437a620ee5157f93392&amp;rev=1">photo of my cat Zooey</a> as my
iPhone’s background image. Recently, I wanted to replace it with a different
one, but the picture of Zooey wasn’t anywhere on my phone. iOS doesn’t come with
a way to save the background picture, but I figured it couldn’t be <strong>that</strong>
difficult. It had to be somewhere on the phone, or in a backup, in some
reasonable format—after all, my phone has to display it!</p>

<p>My first step was to start looking for where the file is on iOS—turns out
it’s in <code class="language-plaintext highlighter-rouge">/var/mobile/Library/Springboard/LockBackground.cpbitmap</code> by default. If
your phone is jailbroken, there are tools you can use to access the file, but my
phone is not, so that was right out.</p>

<p>Luckily, with an unencrypted iTunes backup (iTunes backups preserve background
images!), and a handy tool called <a href="https://www.macroplant.com/iexplorer/">iExplorer</a>, I was able to find and
extract <code class="language-plaintext highlighter-rouge">LockBackground.cpbitmap</code>. With this in hand, I set out to find what the
format was, so that I could retrieve my image of Zooey.</p>

<p>The first thing I ran into was a reference to a converter service that someone
had published years ago at http://cpbitmap.cleverbyte.com.au/. This is no longer
up, but the same person had published the code to a <a href="http://www.codeproject.com/script/Articles/ArticleVersion.aspx?aid=265333&amp;av=393837">CodeProject article</a>. I
downloaded the code, fired up Visual Studio, and attempted to run it on
my file. It crashed, and the file format didn’t seem to match at all.</p>

<p>The next thing I found was many variations on a Python script that used
the <a href="http://www.pythonware.com/products/pil/">Python Image Library</a> to extract the image, after skipping what the
script claimed to be a binary plist header. None of these worked
either—they almost all crashed after producing nonsensical image sizes
(they were reporting image sizes of 40-60k pixels per side, while iPhone 7
background images are 750x1334).</p>

<p>After this, I started to look at the raw data itself, hoping to divine some
patterns. The first thing I saw was that the file was not any container format.
There was no magic number at the beginning—it was not a BMP, PNG, JPG,
TIFF, binary plist, or anything that I or <code class="language-plaintext highlighter-rouge">file(1)</code> recognized. I started
wondering if maybe this was simply raw RGB data. In retrospect, I should have
thought of this earlier: iOS would prefer to blit this file to GPU memory as
fast as possible, and decoding a graphics format would just be a waste of time.</p>

<p>After some playing around with our <a href="https://developer.xamarin.com/guides/cross-platform/workbooks/">Workbooks</a> product, I discovered that
what I had on my hands <strong>was</strong> RGB data—BGRA32 data, to be precise. Yet
when I created images from it, they were…wrong. You can see the broken
image <a href="https://coderinserepeat-my.sharepoint.com/personal/brajkovic_coderinserepeat_com/_layouts/15/guestaccess.aspx?guestaccesstoken=NkoyPyzYJhBi4NB1NYa0MbM059rQtQbNcBKMUWk8uRk%3d&amp;docid=0e64ced5c4ed1462ea5565ba78d88d17c&amp;rev=1">here</a>—it’s immediately obvious there’s some sort of
“misread” in the pixels.</p>

<p>I’m not a seasoned graphics pro, so it wasn’t immediately obvious to me, but
I <a href="https://twitter.com/bojanrajkovic/status/800420346180542466">tweeted about it</a> and almost immediately got a message
from <a href="https://twitter.com/lewing">Larry</a> that my issue was likely a row stride mismatch somewhere
(shortly followed by another Larry delivering the same message
via <a href="https://twitter.com/lobrien/status/800422504279871488">tweet</a>). After some discussion, Larry Ewing suggested that the
image might be 8-byte aligned with some zero-padding for easy blitting to GPU/SIMD. I
had been using a stride of 3000 (4*750)—I adjusted it to 3008 (the next
multiple of 8), and got the correct image!</p>

<p>A sharp observer might point out now that the image was already 8-byte aligned
before—after all, 750*4 is 375*8. My guess is that they’re padded not only
for alignment purposes, but also because iOS may not always be storing 750-pixel
wide images here. There may be a case where Apple is using the padding to both
indicate the end of a row and to pad it for easy manipulation, with no visible
changes (the extra 2 pixels won’t show up on screen).</p>
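<p>As a rough illustration of the stride fix (a sketch in JavaScript for Node, not the original Workbooks code), the decoding step comes down to walking the file by the padded row stride instead of by width times four. The function name <code class="language-plaintext highlighter-rouge">bgraToRgba</code> and its parameters are hypothetical, and the sketch assumes the pixel data starts at offset 0, which this post does not confirm.</p>

```javascript
"use strict";

// Hypothetical helper: copy padded BGRA32 rows into a tightly packed RGBA buffer.
// `stride` is the number of bytes each stored row occupies, padding included.
// Assumes pixel data starts at offset 0 of `raw` (an unverified assumption).
function bgraToRgba(raw, width, height, stride) {
  const bytesPerPixel = 4;
  const out = Buffer.alloc(width * height * bytesPerPixel);

  for (let y = 0; y < height; y++) {
    const rowStart = y * stride; // jump by the padded stride, not width * 4
    for (let x = 0; x < width; x++) {
      const src = rowStart + x * bytesPerPixel;
      const dst = (y * width + x) * bytesPerPixel;
      out[dst] = raw[src + 2];     // red
      out[dst + 1] = raw[src + 1]; // green
      out[dst + 2] = raw[src];     // blue
      out[dst + 3] = raw[src + 3]; // alpha
    }
  }
  return out;
}
```

<p>For the file in this post, the call would presumably be <code class="language-plaintext highlighter-rouge">bgraToRgba(data, 750, 1334, 3008)</code>, with <code class="language-plaintext highlighter-rouge">data</code> read via <code class="language-plaintext highlighter-rouge">fs.readFileSync</code>. Only the first <code class="language-plaintext highlighter-rouge">height * stride</code> bytes are read, so anything stored after the pixel rows is ignored.</p>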

<p>I’m hoping to throw together a little bit of publishable code to decode known
CPBitmap formats into something useful, so I would love to get my hands on more
samples of CPBitmap files—it would be interesting to see if/how the format
has changed with iOS version. If you happen to have an older iOS version
installed and can dump the files, please upload them somewhere and send me the
link!</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Random Hacks" /><category term="iOS" /><category term="CPBitmap" /><category term="Workbooks" /><summary type="html"><![CDATA[Editor’s note: this is an old post that I’ve published now. I’ve since found that CPBitmap files do contain a binary plist at the end, but it was not in the exact location described by most blog posts. I’ve got a bit of code written, but I’m not 100% happy with it yet, so it hasn’t been published!]]></summary></entry><entry><title type="html">Security Recipes in X</title><link href="https://coderinserepeat.com/2016/07/04/security-recipes-in-x.html" rel="alternate" type="text/html" title="Security Recipes in X" /><published>2016-07-04T20:11:52-04:00</published><updated>2016-07-04T20:11:52-04:00</updated><id>https://coderinserepeat.com/2016/07/04/security-recipes-in-x</id><content type="html" xml:base="https://coderinserepeat.com/2016/07/04/security-recipes-in-x.html"><![CDATA[<p>A while ago, <a href="https://twitter.com/blowdart">Barry Dorans</a> (who works on ASP.NET security at
Microsoft) tweeted that he was working with the Roslyn team to build security
analyzers. In particular, security analyzers based on commonly seen mistakes on
Stack Overflow, for example:</p>

<ul>
  <li>ServerCertificateValidationCallback always returning true</li>
  <li>Cut-and-paste AES crypto with a deterministic IV</li>
</ul>

<p>This
<a href="https://twitter.com/bojanrajkovic/status/738147338657484800">got me thinking that what would be great is a collection of articles/code/blog posts</a> that
helps developers of all backgrounds make good choices when implementing
security primitives. Starting from the basics (teaching how to hash, etc.), to
symmetric cryptography, to asymmetric cryptography, TLS, etc. This could be a
resource that could be linked to from Stack Overflow, referenced on Twitter, or
used as a teaching tool.</p>

<p>To that end, I started exactly such a thing today! I’m structuring it roughly as
a book right now, with chapters covering broad concepts (for example, chapter 1
is “Hashing” right now), and sections within that chapter covering subtopics (so
far, I’ve only written one section, on using hashes to verify file content
integrity).</p>
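<p>To give a flavor of what such a section covers, here is a minimal sketch of checking content against a published digest. It is written in JavaScript with Node's built-in <code class="language-plaintext highlighter-rouge">crypto</code> module rather than the C# used in the repository, the helper names <code class="language-plaintext highlighter-rouge">sha256Hex</code> and <code class="language-plaintext highlighter-rouge">matchesDigest</code> are hypothetical, and the choice of SHA-256 is an assumption for the example.</p>

```javascript
"use strict";

const crypto = require("crypto");

// Hypothetical helper: hex-encoded SHA-256 digest of a Buffer or string.
function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Hypothetical helper: does `data` hash to the digest its publisher advertised?
// Published digests vary in letter case, so normalize before comparing.
function matchesDigest(data, expectedHex) {
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}
```

<p>For a file on disk, <code class="language-plaintext highlighter-rouge">data</code> could be the result of <code class="language-plaintext highlighter-rouge">fs.readFileSync</code>; large files would be better served by streaming chunks into <code class="language-plaintext highlighter-rouge">update</code>.</p>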

<p>The idea is to write in a conversational, approachable style, and provide each
topic in digestible chunks, without going into excruciating detail about
implementations. Developers <strong>do</strong> need to know which algorithms are
recommended, but do <strong>not</strong> need to know about S-boxes, hash rounds, XOR shifts,
and other details of how the algorithms are implemented.</p>

<p>I would like to eventually provide implementations in multiple languages. I
started with C# because it’s what I’m most comfortable with, but eventually it would
be great to have Ruby, Rust, C, Go, JavaScript, and others represented.</p>

<p>I’ve created a <a href="https://github.com/bojanrajkovic/security-recipes-in-x">GitHub repository</a> that has what I’ve done so far. The
code is licensed under the MIT license, and non-code pieces (i.e., the prose that
constitutes each chapter) are CC-BY-NC-SA 4.0. Any contributions are
welcome—new languages, mistakes I’ve made (either in code or in prose),
etc.</p>

<p>I’m still refining how the prose and code are structured. I like what I’ve done
so far, with each section having code inline and a separate file containing all
the code without the prose for easy digestibility, but it may not make as much
sense to do that for languages that have better inline Markdown/code features
(Jupyter notebooks, etc.).</p>

<p>Comments? Suggestions? Complaints? Find me on Twitter, or open up a GitHub issue
on the repo!</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Security" /><category term="Security" /><category term="C#" /><category term="Open Source" /><summary type="html"><![CDATA[A while ago, Barry Dorans (who works on ASP.NET security at Microsoft) tweeted that he was working with the Roslyn team to build security analyzers. In particular, security analyzers based on commonly seen mistakes on Stack Overflow, for example:]]></summary></entry><entry><title type="html">Writing Hubot scripts using ES6+</title><link href="https://coderinserepeat.com/2016/02/15/writing-hubot-scripts-using-es2015.html" rel="alternate" type="text/html" title="Writing Hubot scripts using ES6+" /><published>2016-02-15T03:22:22-05:00</published><updated>2016-02-15T03:22:22-05:00</updated><id>https://coderinserepeat.com/2016/02/15/writing-hubot-scripts-using-es2015</id><content type="html" xml:base="https://coderinserepeat.com/2016/02/15/writing-hubot-scripts-using-es2015.html"><![CDATA[<p>Since I discovered it shortly after moving into the world of Hipchat (and later
Slack) from the world of IRC, <a href="https://hubot.github.com">Hubot</a> has been one of my favorite tools
to make my life better. I’ve always enjoyed ChatOps, from the early days when we
simply called it “writing eggdrop scripts,” and Hubot brought ChatOps into the
modern age with its infinite flexibility and common platform that everyone could
build on top of.</p>

<p>Hubot itself is written in CoffeeScript, and traditionally, most scripts have
also been written in CoffeeScript. Unfortunately, I don’t like CoffeeScript
much—I’ve always found it to be an ill-fitting crutch for Ruby developers
who didn’t want to learn JavaScript, lest they cut themselves on the sharp edges
of braces. Meanwhile, ES6 (ES2015, really, but I’m set in my ways…) has
brought some really nice things to JavaScript development. I’m not going to list
any here, but take a look at kangax’s <a href="https://kangax.github.io/compat-table/es6/">ES6 compatibility table</a>
for an exhaustive list of everything ES6 brings to the table.</p>

<p>I’ve been slowly converting the scripts we use internally at <a href="https://xamarin.com">Xamarin</a>
to at least be written in JavaScript, if not ES6—there hasn’t really been
any good guidance on how to plug ES6 scripts into Hubot until recently, and what
there was seemed like it was only half of the story. Today I sat down and
figured out what needed to be done to make ES6 scripts automatically work.</p>

<h3 id="step-1-install-a-few-packages">Step 1: Install a few packages</h3>

<p>Install <a href="https://babeljs.io/docs/usage/require/"><code class="language-plaintext highlighter-rouge">babel-register</code></a>, <a href="https://babeljs.io/docs/plugins/preset-es2015/"><code class="language-plaintext highlighter-rouge">babel-preset-es2015</code></a>,
and <a href="https://github.com/59naga/babel-plugin-add-module-exports"><code class="language-plaintext highlighter-rouge">babel-plugin-add-module-exports</code></a>. Visit the respective module
sites to learn more about them, but the short story is that the first hooks
<code class="language-plaintext highlighter-rouge">require</code> so that scripts are compiled as they are loaded, the second
supplies the ES2015 transforms, and the last makes sure Babel exports
CommonJS-style defaults so that simple <code class="language-plaintext highlighter-rouge">require</code> calls will work.</p>

<h3 id="step-2-create-a-babelrc">Step 2: Create a .babelrc</h3>

<p>At the top-level of your Hubot repo, create a <code class="language-plaintext highlighter-rouge">.babelrc</code> file with the following contents:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"presets"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"es2015"</span><span class="w"> </span><span class="p">],</span><span class="w">
  </span><span class="nl">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"add-module-exports"</span><span class="w"> </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This will enable the ES6 preset and the <code class="language-plaintext highlighter-rouge">module.exports</code> plugin you installed
earlier.</p>

<h3 id="step-3-make-sure-babel-gets-loaded-early">Step 3: Make sure Babel gets loaded early</h3>

<p>Create a script that will always be loaded first; I chose to name mine
<code class="language-plaintext highlighter-rouge">000-import-es6.js</code>. You can also make this a CoffeeScript script if you’d like,
but I stuck with plain old JavaScript. The contents should look like this:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">babel-register</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span> <span class="o">=</span> <span class="kd">function</span> <span class="nf">es6</span><span class="p">(</span><span class="nx">robot</span><span class="p">)</span> <span class="p">{};</span>
</code></pre></div></div>

<p>The function export is not, in fact, required—it just makes Hubot shut up
about expecting a function but receiving an object when checking
<code class="language-plaintext highlighter-rouge">module.exports</code>.</p>

<h3 id="step-4-profit">Step 4: Profit!</h3>

<p>You can now write scripts using ES6—all of the features are available to
you to use. Put your scripts in the standard location for Hubot and they will
happily be loaded and compiled at runtime—you’ll still get correct line
numbers in stack traces though, for which I am infinitely thankful.</p>

<h3 id="for-module-authors">For Module Authors</h3>

<p>If you’re authoring a Hubot module <em>outside</em> of your Hubot source tree, the
process is almost exactly the same—at step 3, instead of creating a
<code class="language-plaintext highlighter-rouge">000-import-es6.js</code> file, you can create an <code class="language-plaintext highlighter-rouge">index.js</code> in the root of your
package, with contents similar to this:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">babel-register</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">realDefault</span> <span class="o">=</span> <span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./src/foo</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span> <span class="o">=</span> <span class="nx">realDefault</span><span class="p">;</span>
</code></pre></div></div>

<p>A possible alternate solution is to require <code class="language-plaintext highlighter-rouge">babel-register</code>, then export a
function that uses Hubot’s <code class="language-plaintext highlighter-rouge">robot.loadFile</code> method to load your actual script
entry point—I haven’t tried this, so I don’t know how well it would work,
but I suspect it would be just fine.</p>]]></content><author><name>Bojan Rajkovic</name></author><category term="Software Engineering" /><category term="Hubot" /><category term="ES6" /><category term="ES2015" /><category term="Babel" /><category term="JavaScript" /><summary type="html"><![CDATA[Since I discovered it shortly after moving into the world of Hipchat (and later Slack) from the world of IRC, Hubot has been one of my favorite tools to make my life better. I’ve always enjoyed ChatOps, from the early days when we simply called it “writing eggdrop scripts,” and Hubot brought ChatOps into the modern age with its infinite flexibility and common platform that everyone could build on top of.]]></summary></entry></feed>