<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Have you tried restarting?]]></title><description><![CDATA[My name is Nicola, i'm an Old School Developer and passionate architect, focused on Cloud Computing and Serverless solutions.]]></description><link>https://haveyoutriedrestarting.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1749471276129/b98d27f7-40d6-453f-8a99-0c9f158557b5.png</url><title>Have you tried restarting?</title><link>https://haveyoutriedrestarting.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sun, 12 Apr 2026 12:58:30 GMT</lastBuildDate><atom:link href="https://haveyoutriedrestarting.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Spec-Driven Prototyping with Amazon Q and Q-Vibes Memory Banking framework]]></title><description><![CDATA[From ideas to prototypes
We all love when an idea hits — sharp, exciting, half-formed. But getting from that spark to something tangible often involves friction: scaffolding, repetition, boilerplate.
That overhead can kill momentum.
Prototyping is ho...]]></description><link>https://haveyoutriedrestarting.com/spec-driven-prototyping-with-amazon-q-and-q-vibes-memory-banking-framework</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/spec-driven-prototyping-with-amazon-q-and-q-vibes-memory-banking-framework</guid><category><![CDATA[amazon Q developer CLI ]]></category><category><![CDATA[Amazon Web Services]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[coding]]></category><category><![CDATA[prototyping]]></category><category><![CDATA[aitools]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Mon, 21 Jul 2025 06:00:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752502402311/43ef8f9d-aec1-49a9-aa64-669e714c40ff.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-from-ideas-to-prototypes">From ideas to prototypes</h1>
<p>We all love when an idea hits — sharp, exciting, half-formed. But getting from that spark to something tangible often involves friction: scaffolding, repetition, boilerplate.</p>
<p>That overhead can kill <strong>momentum</strong>.</p>
<p>Prototyping is how we protect the idea. It’s the creative phase where we validate assumptions, test viability, and explore possibilities — quickly.</p>
<p>When done well, prototyping accelerates <strong>innovation</strong>.</p>
<p>Prototyping is a preliminary phase of product development, with objectives and constraints that differ from those of the later phases.</p>
<p>In prototyping, you don't yet have a clear idea of the end product. You don't even know if the idea is really good enough to become a product.</p>
<p>For this reason, we can define these attributes/constraints for prototypes:</p>
<ul>
<li><p>They should be <strong>cheap</strong>. Building them should take little money and time, so they should be developed by one or two people rather than a whole product team. Don't invest a lot of time in requirements gathering, design, development, and so on.</p>
</li>
<li><p>You don't have to share it with a global audience: you can present it to investors, but only while you're driving the presentation. A prototype is neither a demo nor a product preview, so you don't have to make it available to others. This means you <strong>don't have to deploy it,</strong> just run it in a closed environment (maybe your local machine).</p>
</li>
<li><p>Perhaps neither <strong>real data nor real integrations are needed</strong> if the idea can be explored without them.</p>
</li>
<li><p>It's a <strong>throwaway</strong>: you won't extend it into production development. Once the idea has been explored and validated, throw your prototype away and start defining and developing your product.</p>
</li>
</ul>
<p>With generative AI tools such as Amazon Q, Claude or Cursor, prototyping is faster than ever before. This is where vibe coding comes into play.</p>
<p>Some might think that vibe coding just means the AI generates random code that you don't understand, and that it's not real engineering: that may be true, but a drum kit can also be used just to make noise (as my neighbour says).</p>
<p>Nowadays, vibe coding is often presented as being in contrast with spec-driven development. In my humble opinion, we can take advantage of vibe coding by directing agents, providing them with <strong>just enough specs for the goal.</strong></p>
<p>As an engineer, I don't need perfect code when prototyping, but I don't want random code either.</p>
<h1 id="heading-context-matters">Context matters</h1>
<p>Generative AI is deeply contextual — every response depends on what came before. Even small shifts in input can produce wildly different output. That’s both a strength and a weakness.</p>
<p>When you're coding by vibe, the AI doesn’t truly "know" your intent — it infers it. Without clear, consistent context, things go off-track fast:</p>
<ul>
<li><p>You change a prompt slightly, and the AI drops half the logic.</p>
</li>
<li><p>A missed constraint (like region or tech stack) leads to subtle regressions.</p>
</li>
<li><p>A well-intentioned refactor undoes previous alignment.</p>
</li>
</ul>
<p>Context matters not just in what you say, but <em>what the AI sees every time</em>. That’s why a reusable, declarative context is so powerful: it's about creating a shared space where your intent lives.</p>
<p>Structure it, store it, and tell the AI how to handle it — and you’ve got <strong>spec-driven development.</strong></p>
<h1 id="heading-the-memory-problem">The memory problem</h1>
<p>LLMs are brilliant, but forgetful. Especially across sessions or as prompts grow. When prototyping, this causes friction:</p>
<ul>
<li><p>Goals get redefined without you noticing</p>
</li>
<li><p>Tech stack or constraints subtly change</p>
</li>
<li><p>Errors repeat because nothing was "remembered"</p>
</li>
</ul>
<p>The deeper the session, the worse the drift. At some point, you’re no longer prototyping — you’re re-aligning.</p>
<p>This is where <strong>spec-driven prototyping</strong> enters.</p>
<p>Writing a simple spec (just a lightweight set of goals, guardrails, and preferred stack) helps anchor the AI's responses. It makes collaboration reproducible.</p>
<p>Combine that with a way to update and feed this spec consistently to your assistant and you’ve got <strong>memory banking</strong>.</p>
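<p>For illustration, such a lightweight spec might look like this (a hypothetical example, not one of the framework's actual templates):</p>

```markdown
# Prototype spec: team link shortener

## Goal
Validate that a link-shortening CLI is useful for my team.

## Guardrails
- Runs locally only, no deployment
- No real user data; mocked storage is fine

## Preferred stack
- Node.js + TypeScript
```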
<h1 id="heading-introducing-q-vibes-memory-banking-framework">Introducing Q-Vibes memory banking framework</h1>
<p>In my <a target="_blank" href="https://haveyoutriedrestarting.com/building-think-o-matic-a-vibe-coding-journey-with-amazon-q">last article</a>, I reported on my direct experience of building my Think-O-Matic prototype with Amazon Q, gave a definition of vibe coding and briefly introduced the concept of memory banking.</p>
<p>After this experience, I developed a memory banking framework specifically for rapid prototyping with Amazon Q: it's open-source (contributions are welcome!) and <a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking">you can find it on GitHub.</a></p>
<p>To use the framework you need:</p>
<ul>
<li><p>an idea to explore</p>
</li>
<li><p>Amazon Q (either the CLI or an IDE plugin)</p>
</li>
</ul>
<p>The framework consists of specifications (provided to the agent via .md files) and prompts (provided by you via chat).</p>
<h2 id="heading-specifications">Specifications</h2>
<p>The specs consist of five Markdown files:</p>
<ul>
<li><p><a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/q-vibes-memory-banking.md">q-vibes-memory-banking.md</a> - the AI Contract. This contains the complete framework instructions that tell the AI <strong>how</strong> to work with memory banking when initiating a new session, resuming a session, and updating docs at the end of an iteration. This file is provided; there is no need to edit it.</p>
</li>
<li><p><a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/idea.md">idea.md</a>: Captures the core concept and success criteria for your prototype. This is your north star - created once and rarely changes. The AI creates this from your initial description, but may ask clarifying questions to complete all sections using the template structure.</p>
</li>
<li><p><a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/vibe.md">vibe.md</a>: Defines how you want to collaborate with the AI assistant. Specifies your interaction style, tech stack preferences, decision-making approach, git workflow, security practices, documentation requirements, and speed vs quality trade-offs. You have to create and maintain it.</p>
</li>
<li><p><a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/state.md">state.md</a>: The living technical snapshot of your prototype. Updated frequently by the AI as you build. Contains current stack, architecture overview, file structure, what's working/broken, immediate next steps, and current focus.</p>
</li>
<li><p><a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/decisions.md">decisions.md</a>: Log of key choices made during development. Prevents re-discussing the same decisions. The AI creates and maintains this file as architectural and technical decisions are made, following the template structure.</p>
</li>
</ul>
<p>This makes the framework complete and self-contained. The AI gets both:</p>
<ul>
<li><p><strong>How to work</strong> (from the framework instructions in <code>q-vibes-memory-banking.md</code>)</p>
</li>
<li><p><strong>What to work on</strong> (from the four context files: <code>idea.md</code>, <code>vibe.md</code>, <code>state.md</code>, <code>decisions.md</code>)</p>
</li>
</ul>
<h2 id="heading-usage">Usage</h2>
<p>Let's say you have a brilliant idea and you want to explore it.</p>
<p>All you need is to:</p>
<ul>
<li><p>create a project folder.</p>
</li>
<li><p>create a <em>.amazonq/vibes</em> sub-folder and copy the templates into it.</p>
</li>
<li><p>create your <em>vibe.md</em>; you can start from the template or the provided example.</p>
</li>
<li><p>prompt your idea and clarify it with the agent.</p>
</li>
</ul>
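<p>On the command line, the setup boils down to something like this (paths are illustrative and assume a local clone of the framework repository):</p>

```shell
# Create the project folder and the vibes sub-folder the framework expects
mkdir -p my-prototype/.amazonq/vibes

# Copy the AI contract and the templates into it, e.g. from a local clone:
#   cp ../amazon-q-vibes-memory-banking/q-vibes-memory-banking.md my-prototype/.amazonq/vibes/
#   cp ../amazon-q-vibes-memory-banking/templates/*.md my-prototype/.amazonq/vibes/

ls -a my-prototype/.amazonq/vibes
```

Then create your <em>vibe.md</em> in that folder and start prompting.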
<p>Your prompt should be something like:</p>
<pre><code class="lang-markdown">Hi! I want to start a new prototype using Q-Vibes Memory Banking. 

Please read the framework instructions in .amazonq/vibes/q-vibes-memory-banking.md first to understand how to work with this system.

My prototype idea: [Describe your idea here - can be brief, just the core concept]
</code></pre>
<p>The agent will ask you for further clarification and to confirm its assumptions, with the aim of narrowing and clarifying the scope.</p>
<p>Resuming a session is even easier. Just prompt the agent with a simple request to pick up where you left off. Something like this:</p>
<pre><code class="lang-markdown">Hi! I'm resuming work on my prototype using Q-Vibes Memory Banking.

Please read the framework instructions in .amazonq/vibes/q-vibes-memory-banking.md first, then read all the context files in .amazonq/vibes/ folder to understand the current state.

Once you've reviewed everything, please confirm what we're building, where we left off, and what the next steps should be.
</code></pre>
<p>Please read the <a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/README.md">README.md</a> of the project for a quick setup, complete instructions and a running example.</p>
<p>Note that you and the agent are <strong>jointly responsible</strong> for ensuring that the specifications are clear and match. There is no magic here: the better the input, the better the output.</p>
<h2 id="heading-benefits">Benefits</h2>
<p>The key benefits are:</p>
<ul>
<li><p>It is very fast: I created the <a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/tree/main/examples/builder-tracker">provided example</a> in less than an hour, while also testing session resuming.</p>
</li>
<li><p>You provide guardrails: not arbitrary code, but code that suits your needs and style.</p>
</li>
<li><p>The AI helps you explore your idea: in my experience, the agent's questions helped me narrow down my ideas.</p>
</li>
<li><p>No loss of context: you don't have to re-provide context to the agent at every session.</p>
</li>
</ul>
<h1 id="heading-prototype-memory-product-memory">Prototype Memory ≠ Product Memory</h1>
<p>You might be asking yourself:</p>
<p><strong>“Why do we need a specific framework? Why not just use a full spec-driven development framework?”</strong></p>
<p>Because <strong>prototyping has different goals</strong>.</p>
<p>It’s not just an early phase of development — it’s a different mode entirely.</p>
<p>In the prototyping phase, you’re optimizing for <strong>speed, creativity, and cost-efficiency</strong> — not for durability, scalability, or perfect accuracy. You want to explore, validate, and iterate quickly. That means you can (and should) tolerate some messiness and manual steps, as long as they accelerate learning.</p>
<p>That’s why the memory needs during prototyping are also different.</p>
<p>You don’t need a persistent, multi-session memory graph. You need just enough structure to help your AI collaborator stay aligned through a rapid, idea-driven loop.</p>
<p>This framework isn’t built for production agents or end-user memory systems. It’s not meant to manage complexity across months or teams.</p>
<p>Instead, it’s designed for that <strong>middle space between a blank prompt and full-stack dev</strong> — where ideas are still forming, and flow matters more than polish.</p>
<p>If you’re in that zone, a lightweight memory bank gives you:</p>
<ul>
<li><p>Direction without rigidity</p>
</li>
<li><p>Consistency without ceremony</p>
</li>
<li><p>Momentum without drift</p>
</li>
</ul>
<h1 id="heading-conclusions">Conclusions</h1>
<p>Vibe-coding is here to stay — and it’s magical when it works. But even vibes need a spine.</p>
<p>This lightweight memory banking approach gives you structure <em>just enough</em> to stay aligned, while keeping the creative momentum alive.</p>
<p>If you’re prototyping with Amazon Q or any other Agent / LLM, give it a try.</p>
<p>The framework is tailored to Amazon Q, but not bound to it.</p>
<p><a class="user-mention" href="https://hashnode.com/@darshitpandya">Darshit Pandya</a> (you can find him also on <a target="_blank" href="https://www.linkedin.com/in/darshitpandya/">LinkedIn</a>) created a version tailored to <a target="_blank" href="https://github.com/Darshitpandya/github-copilot-context-keeper">GitHub Copilot</a>, and we are going to benchmark the framework with both agents to measure its performance and collaborate on improving it.</p>
<p><a target="_blank" href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking">Check out the framework, fork it</a>, remix it, build something weird, share it.</p>
<p>Because vibes are better when they remember what they’re building.</p>
<p>Specs, just enough.</p>
]]></content:encoded></item><item><title><![CDATA[Building Think-o-matic: A Vibe-Coding Journey with Amazon Q]]></title><description><![CDATA[We’ve all had those moments where inspiration strikes, but the traditional coding workflow — planning, scaffolding, testing, debugging — feels like too much friction.
What if instead, you could prototype with a different mindset? One that prioritizes...]]></description><link>https://haveyoutriedrestarting.com/building-think-o-matic-a-vibe-coding-journey-with-amazon-q</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/building-think-o-matic-a-vibe-coding-journey-with-amazon-q</guid><category><![CDATA[vibe coding]]></category><category><![CDATA[Amazon Q]]></category><category><![CDATA[prototyping]]></category><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[generative ai]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Sun, 22 Jun 2025 22:39:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750631336144/dfd3c406-6ff4-4578-9412-c639db2f58f4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We’ve all had those moments where inspiration strikes, but the traditional coding workflow — planning, scaffolding, testing, debugging — feels like too much friction.</p>
<p>What if instead, you could <strong>prototype</strong> with a different mindset? One that prioritizes momentum, creativity, and <em>just enough</em> structure to explore an idea?</p>
<p>That’s where <strong>vibe-coding</strong> comes in.</p>
<p>This unconventional approach flips the traditional dev cycle on its head. You don’t manually write every line or carefully craft a layered architecture — instead, you <strong>describe what you want</strong>, and let AI tools do the heavy lifting.</p>
<p>Vibe-coding isn’t about writing perfect code. It’s about describing intent, trusting the process, and shipping fast.</p>
<h2 id="heading-vibe-coding-what-it-is-and-why-it-matters">Vibe-coding: what it is and why it matters</h2>
<p>The term comes from <a target="_blank" href="https://x.com/karpathy/status/1886192184808149383">this X post by Andrej Karpathy</a> just a few weeks ago, and the industry is still trying to converge on a definition and standardize it.</p>
<p>A few weeks later, distinguished authors were already writing and publishing books about this technique, just to mention some:</p>
<ul>
<li><p><a target="_blank" href="https://a.co/d/fvC54LH">Vibe-Coding by Gene Kim and Steve Yegge</a></p>
</li>
<li><p><a target="_blank" href="https://a.co/d/iqwhB5u">Beyond Vibe-Coding by Addy Osmani</a></p>
</li>
</ul>
<p>I tried to summarize the key points of vibe-coding from Karpathy’s post.</p>
<p><strong>Vibe-coding</strong> is:</p>
<ul>
<li><p>Letting AI tools write, fix, and modify the code.</p>
</li>
<li><p>Embracing <em>feel</em> and <em>flow</em> over full code comprehension.</p>
</li>
<li><p>Typing as little as possible — mostly just <em>describe, accept, and run</em>.</p>
</li>
<li><p>Skipping diffs, skimming errors, and trusting AI suggestions.</p>
</li>
<li><p>Supervising the AI rather than driving every keystroke.</p>
</li>
</ul>
<p>This is not traditional coding. It’s prototyping for the AI-native era — perfect for weekend projects, experiments, or validating ideas before investing in full-scale development.</p>
<p>If you've ever had an idea about building something on your own, it's very clear why it matters: it makes prototyping fast, cheap, and easy.</p>
<h3 id="heading-a-quick-reminder-whats-prototyping"><strong>A Quick Reminder: What’s Prototyping?</strong></h3>
<p>In software engineering, prototyping is about:</p>
<blockquote>
<p>Creating a preliminary version of a system to explore ideas, validate functionality, and gather user feedback before full-scale development.</p>
</blockquote>
<p>It’s low-commitment, fast-paced, and feedback-driven — which makes it the perfect playground for AI-powered workflows.</p>
<h2 id="heading-fast-prototyping-steps">Fast-prototyping steps:</h2>
<p>I have tried to define some steps to structure my vibe coding sessions:</p>
<ol>
<li><p>Have an idea: this seems obvious, but it's not. You can create something if you have an idea that is clear enough to be built and executed, but also leaves some room for exploration.</p>
</li>
<li><p>Set up tools: you need a toolbox that is easy to set up, quick, cheap, and that you trust. In these times, I don't think it's worth spending too much time finding the perfect tools or optimizing them: your perfect tool could be obsolete tomorrow.</p>
</li>
<li><p>Describe your idea: This means you tell your tools what you want to build together.</p>
</li>
<li><p>Follow the vibes: This step is actually an inner loop consisting of three sub-steps:</p>
<ol>
<li><p>Describe, accept, execute: you ask the agent to build something, then accept and run what it built.</p>
</li>
<li><p>Check the results: confirm what's going on and keep your hands on the wheel.</p>
</li>
<li><p>Determine the cooperation style: from time to time, you'll need to refine the way you want to collaborate with your toolbox.</p>
</li>
</ol>
</li>
<li><p>Enough is enough: you are building a prototype, not a product. This means you are looking for neither perfection nor a complete system.</p>
</li>
</ol>
<p>I have schematized these steps like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750625014783/d3420fb7-030a-499b-b25e-3bb21eaa1f52.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-a-real-example-building-think-o-matic"><strong>A Real Example: Building Think-o-matic</strong></h2>
<p>To make this more tangible, here’s how I prototyped a tool called <strong>Think-o-matic</strong> — an AI-powered copilot to help structure workshops, generate agendas, create Miro boards, and summarize outcomes into actionable Trello tasks.</p>
<h3 id="heading-having-an-idea">Having an idea</h3>
<p>My ideas mostly come from real-life problems I can't fix. I often wonder if I could build something to make my life easier, and this helps me in three ways:</p>
<p>First, I love building; I find it fun.</p>
<p>Second, I go deeper in understanding my problem: if you want a solution, you have to target your problem.</p>
<p>Third, I get the problem solved! One less…</p>
<p>This is exactly what happened with Think-o-matic.</p>
<p>In my current role I need to get stakeholders around a table, often a virtual one, and have them work together to target problems, find solutions, and explore ideas.</p>
<p>In other words, I need to extract information from them, and one effective way to do this is to run workshops.</p>
<p>Running a workshop involves the following steps, and that is exactly what I need help with:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750626286578/1554bb00-c5b5-4e72-a940-d436af48db3a.png" alt class="image--center mx-auto" /></p>
<p>Between steps 3 and 4 there is the workshop run itself.</p>
<p>I also drafted a high-level architecture of the prototype:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750628788112/939e9b4b-d636-44cb-8e74-7811f3ebc638.png" alt class="image--center mx-auto" /></p>
<p>A front-end web app backed by an Express.js server running on Node.js, which integrates with Miro and Trello, and with Amazon Bedrock to provide “intelligence” to the system: Amazon Nova would generate the workshop agenda and summarize the Miro board.</p>
<p>No deployment needed; the frontend and backend would run locally.</p>
<h3 id="heading-set-up-tools">Set up tools</h3>
<p>My toolbox is very simple: a terminal and the Amazon Q CLI (agentic) backed by Claude Sonnet 4.0. That's it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750626505492/a5b765a3-b96e-439c-83ae-425ad6778059.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-describe-your-idea">Describe your idea</h3>
<p>I used a greatly simplified version of a technique called memory banking that I learned from <a target="_blank" href="https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets">this blog post from Cline.</a></p>
<p>In a few words, it's a way to remember what's going on in the project and between you and the agent; otherwise the agent easily forgets, and you have to recreate its context, leaving the vibes.</p>
<p>Creating a memory bank helps your AI:</p>
<ul>
<li><p>Stay aligned with goals across sessions.</p>
</li>
<li><p>Remember decisions already made.</p>
</li>
<li><p>Reduce repetition and confusion.</p>
</li>
</ul>
<p>In vibe-coding, memory banking becomes your anchor — keeping prototypes from drifting too far off course.</p>
<p>I have provided two files:</p>
<p><a target="_blank" href="https://github.com/ncremaschini/think-o-matic-q/blob/main/.amazonq/specs/prototypes_general_guidelines.md">Prototype Guidelines</a>: Instructions on what a prototype is, what it is not, and how I want to build prototypes. This file does not refer to a specific prototype and is reusable.</p>
<p><a target="_blank" href="https://github.com/ncremaschini/think-o-matic-q/blob/main/.amazonq/specs/thinkomatic_specific_guidelines.md">Think-o-matic specific guidelines</a>: Instructions about this specific idea.</p>
<p>The prompt style is a mix of the RISEN framework (role, input, steps, expectation, narrowing) and the RODES framework (role, objectives, details, examples, sense check).</p>
<p>Again, I don't want to spend too much time on prompting, and of course I used LLMs to write my prompts.</p>
<p>My session started with these two files in a folder and a little prompt that went something like this:</p>
<blockquote>
<p><em>before doing anything, read these two files and tell me what you think about. Please use the same folder to create your checkpoint files.</em></p>
</blockquote>
<p>In this way I also instructed the agent to <em>update</em> the memory bank as the work went on.</p>
<h3 id="heading-follow-the-vibes-check-results-make-friends">Follow the vibes + Check Results + Make friends</h3>
<p>After this little prompt, the agent read the specs I provided and proposed an action plan.</p>
<p>We agreed on the steps, and that it would check in with me after every one.</p>
<p>The first iteration result was this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750628722431/6088c4f8-5a94-4850-b4ac-c3ae7696cd0e.png" alt class="image--center mx-auto" /></p>
<p>Basically, the first iteration delivered the working app with all integrations mocked, in about 15 minutes.</p>
<h3 id="heading-what-went-well-vs-what-went-weird"><strong>What Went Well vs. What Went Weird</strong></h3>
<p><strong>✅ Good:</strong></p>
<ul>
<li><p>AI nailed the folder structure and scaffolding.</p>
</li>
<li><p>The prototype ran with minimal setup.</p>
</li>
<li><p>I stayed in the creative zone.</p>
</li>
</ul>
<p><strong>❌ Bad:</strong></p>
<ul>
<li><p>Wrong AWS region.</p>
</li>
<li><p>No documentation, even though I asked for it.</p>
</li>
<li><p>Laughably bad UX.</p>
</li>
<li><p>A few silly bugs.</p>
</li>
</ul>
<p>But that’s okay — vibe-coding isn’t about perfection. It’s about fast feedback and learning by doing.</p>
<h3 id="heading-making-friends-tune-your-cooperation-style"><strong>Making Friends: Tune Your Cooperation Style</strong></h3>
<p>Vibe-coding isn’t autopilot. You’re not giving up control — you’re adjusting how you cooperate.</p>
<p>I think of this part as <strong>“making friends”</strong> with the AI. Like any relationship, it needs clear communication and trust — but also healthy boundaries.</p>
<p>Here’s a real example: I forgot to specify the AWS Region in a prompt. The AI defaulted to us-east-1 (no idea why). I needed a <strong>Bedrock model</strong> that was only enabled in eu-west-1. Instead of asking me, the agent silently changed the model to something available in us-east-1.</p>
<p>That’s when I stepped in. I told the agent:</p>
<blockquote>
<p>“For small things, go ahead. But for big architectural decisions — <strong>ask me.</strong>”</p>
</blockquote>
<p>That balance is key. You want the AI to be proactive, but aligned. Let it move fast — just not in the wrong direction.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750629409349/d976eaf7-5b77-4e36-935c-9cdbec81d7c4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-final-result-think-o-matic">The final result: think-o-matic</h2>
<p>I worked with the agent for a few hours, adding one feature after another: agenda creation, Miro board creation, Miro board summary, Trello integration.</p>
<p>For each feature implemented, Q updated the memory bank.</p>
<p>The result? You can try it out for yourself by running it locally.</p>
<p><a target="_blank" href="https://github.com/ncremaschini/think-o-matic-q">Here is the github repo with the code and instructions to run it.</a></p>
<p>If you look at the repo, you might wonder why there is only one branch and one commit: I created it as a private repo and, before making it public, I scanned it for secrets and found that Q had written my secrets to the memory bank. Trust, but verify.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>These are just the notes I took after those few hours:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>✅ Do</strong></td><td><strong>❌ Don’t</strong></td></tr>
</thead>
<tbody>
<tr>
<td>State clear goals</td><td>Over-engineer</td></tr>
<tr>
<td>Define “won’t do”</td><td>Forget the code exists</td></tr>
<tr>
<td>Use memory banks</td><td>Ask for endless validation</td></tr>
<tr>
<td>Work in small chunks</td><td>Force AI to stick to one approach</td></tr>
<tr>
<td>Create checkpoints</td><td>Ignore drift — it happens!</td></tr>
<tr>
<td>Tune your cooperation style</td><td>Expect the AI to guess your intent</td></tr>
</tbody>
</table>
</div><p>Pro tip: Let the AI drift <em>a bit</em>. Sometimes, the best ideas emerge sideways.</p>
<h2 id="heading-waiting-for-the-doom-moment">Waiting for the Doom Moment</h2>
<p>A few weeks ago, I had the pleasure of leading a <a target="_blank" href="https://www.meetup.com/the-cloud-house/events/306876067/">roundtable discussion with Jeff Barr</a>, Chief Evangelist for AWS and one of the most influential engineers in software engineering and cloud computing, and I asked him about the future of Gen AI: what's next?</p>
<p>He responded with a story from 1992 - 1993.</p>
<p>In 1992, we all loved <a target="_blank" href="https://en.wikipedia.org/wiki/Wolfenstein_3D">Wolfenstein 3D.</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750630293075/99a42c1b-cd2c-4a17-85b6-0f11f0f4eaa8.png" alt class="image--center mx-auto" /></p>
<p>Despite the name, it was not real 3D, but it was the first first-person shooter game.</p>
<p>A year later, John Carmack developed the <a target="_blank" href="https://en.wikipedia.org/wiki/Doom_engine">Doom Engine</a> using basically the same technology, and we were all shocked by Doom.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750630490814/964f19e6-8a99-4143-87d9-80face49c041.png" alt class="image--center mx-auto" /></p>
<p>That’s where we are with AI and prototyping right now.</p>
<p>We’re still building Wolfensteins.</p>
<p>But Doom is coming.</p>
]]></content:encoded></item><item><title><![CDATA[Building Atomic Counters with Amazon DocumentDB]]></title><description><![CDATA[Introduction
This is the final installment in my atomic counters series where I explore different distributed databases and how they implement atomic counters.
This time, we’re looking at Amazon DocumentDB a managed NoSQL document database, MongoDB-c...]]></description><link>https://haveyoutriedrestarting.com/building-atomic-counters-with-amazon-documentdb</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/building-atomic-counters-with-amazon-documentdb</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[MongoDB]]></category><category><![CDATA[aws-documentdb]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Databases]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Mon, 17 Mar 2025 10:34:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/nsQeVhtnyFc/upload/1139784b30fc09822eddc2a56176210d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>This is the final installment in my <a target="_blank" href="https://haveyoutriedrestarting.com/series/atomic-counter">atomic counters series</a> where I explore different distributed databases and how they implement atomic counters.</p>
<p>This time, we’re looking at <a target="_blank" href="https://aws.amazon.com/documentdb/">Amazon DocumentDB</a>, a managed, <a target="_blank" href="https://www.mongodb.com/"><strong>MongoDB-compatible</strong></a> NoSQL document database optimized for AWS.</p>
<p>Atomic counters are a common requirement in distributed applications, whether for tracking views, managing inventory, or implementing rate limiting.</p>
<p>In this article, we’ll discuss how <strong>DocumentDB handles atomic updates</strong> and explore a <strong>working example</strong> from my <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">GitHub repository.</a></p>
<h2 id="heading-serializability-and-linearizability-in-documentdb"><strong>Serializability and Linearizability in DocumentDB</strong></h2>
<p>Before we dive deep into the code, we need to recall a few concepts (see <a target="_blank" href="https://haveyoutriedrestarting.com/atomic-counter-framing-the-problem-space"><strong>the first article of this series</strong></a> for a detailed explanation):</p>
<ul>
<li><p><strong>Serializability</strong>: Operations appear in a consistent sequential order, ensuring correctness.</p>
</li>
<li><p><strong>Linearizability</strong>: Writes are immediately visible for subsequent reads, ensuring real-time consistency.</p>
</li>
</ul>
<p>DocumentDB achieves linearizable writes through its <strong>single-primary, multi-replica architecture</strong>:</p>
<p>• <strong>Write operations are directed to the primary instance</strong>, and changes are asynchronously replicated to secondaries.</p>
<p>• <strong>Read operations from the primary always return the latest committed value</strong>, ensuring linearizability.</p>
<p>• <strong>Replica reads may return stale data</strong> due to replication lag, meaning they are eventually consistent.</p>
<p>This guarantees that <strong>atomic updates within a single document, like counters using the $inc operator, remain correct and isolated</strong>.</p>
<p>While not required in this specific scenario, it is worth mentioning that DocumentDB supports:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/documentdb/latest/developerguide/how-it-works.html#durability-consistency-isolation">read isolation level configuration</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/documentdb/latest/developerguide/transactions.html">transactions and their isolation level, read and write concerns configuration.</a></p>
</li>
</ul>
<h2 id="heading-replication-and-leader-election-in-documentdb">Replication and Leader Election in DocumentDB</h2>
<p>DocumentDB automatically replicates data across multiple availability zones to ensure durability and availability.</p>
<p>Key mechanisms include:</p>
<p>• <strong>Single-primary replication</strong>: A single primary instance handles writes, while replicas asynchronously replicate data and serve read requests.</p>
<p>• <strong>Leader election</strong>: If the primary instance fails, DocumentDB automatically promotes a replica to primary, minimizing downtime and maintaining availability.</p>
<p>This replication strategy allows DocumentDB to scale reads across replicas while ensuring that writes remain <strong>strongly consistent</strong> on the primary.</p>
<p>However, applications must account for <strong>eventual consistency</strong> when reading from replicas due to asynchronous replication.</p>
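<p>This trade-off surfaces in client configuration as the read preference. The sketch below builds a connection string using the MongoDB driver’s <em>readPreference</em> option, which DocumentDB accepts; the host name is a placeholder, and real clusters also need TLS and credential options that are omitted here:</p>

```typescript
// Pick a read preference for counter reads: 'primary' guarantees the
// latest committed value, 'secondaryPreferred' scales reads across
// replicas but may observe stale counters due to replication lag.
type ReadPreference = "primary" | "secondaryPreferred";

function counterConnectionString(host: string, needFreshReads: boolean): string {
  const pref: ReadPreference = needFreshReads ? "primary" : "secondaryPreferred";
  // Placeholder host; real DocumentDB clusters also require TLS and credentials.
  return `mongodb://${host}:27017/?replicaSet=rs0&readPreference=${pref}`;
}
```

<p>Reading the counter right after an increment is exactly the case where fresh reads from the primary are needed.</p>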
<h1 id="heading-the-atomic-counter-pattern">The Atomic Counter Pattern</h1>
<p>The atomic counter pattern enables precise increment operations, even in distributed environments.</p>
<p>With DocumentDB, you use the <strong>$inc operator</strong>, which atomically increments a numeric field within a document.</p>
<p>This ensures that <strong>concurrent increments are safely serialized</strong> without race conditions.</p>
<p><strong>DocumentDB supports conditional increments natively</strong>: you can pair <strong>$inc</strong> with a conditional update filter so the counter is incremented only when certain conditions are met, <strong>all in a single atomic operation</strong>.</p>
<p>This makes DocumentDB a good choice when you need <strong>both unconditional and conditional increments</strong>, ensuring correctness without requiring complex client-side logic.</p>
<h1 id="heading-hands-on-walkthrough-of-the-deployable-example"><strong>Hands-on! Walkthrough of the Deployable Example</strong></h1>
<p>Let’s examine the deployable example in <a target="_blank" href="https://github.com/ncremaschini/atomic-counter"><strong>this GitHub repository</strong></a>.</p>
<p>This example demonstrates how to implement an atomic counter using <strong>AWS Lambda</strong>, <strong>API Gateway</strong>, and <strong>DocumentDB</strong>.</p>
<p>1. <strong>API Gateway</strong>: Provides HTTP endpoints for interacting with the counter.</p>
<p>2. <strong>Lambda Functions</strong>: Implements the business logic for incrementing the counter.</p>
<p>3. <strong>DocumentDB</strong>: Stores the counters.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742054210547/4ff5e661-e4ff-4b02-8b75-c85ebe5df724.png" alt class="image--center mx-auto" /></p>
<p>In my example project you can decide whether to enforce a maximum value for the counter: this determines whether conditional writes are used.</p>
<p>Let’s focus on the Lambda business logic, <strong>from the</strong> <a target="_blank" href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/documentDB/counterLambda/index.ts">docDbCounterLambda</a> code:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> documentDBClient = <span class="hljs-keyword">await</span> buildDocumentDbClient();

<span class="hljs-keyword">await</span> documentDBClient.connect();

<span class="hljs-keyword">const</span> countersCollection = documentDBClient.db(<span class="hljs-string">"atomic_counter"</span>).collection(<span class="hljs-string">'counters'</span>);

<span class="hljs-keyword">const</span> updateFilter = getUpdateFilter(useConditionalWrites, id, maxCounterValue);

<span class="hljs-keyword">const</span> updateResult = <span class="hljs-keyword">await</span> countersCollection.updateOne(
    updateFilter,
    {
        $inc: { atomic_counter: <span class="hljs-number">1</span> }
    },
    {
        upsert: <span class="hljs-literal">true</span>, 
    }
);
</code></pre>
<p>Here I use the <strong>$inc</strong> operator with the <em>upsert</em> flag set to true: this makes the method work both on the first call, when the counter does not exist and <strong>$inc</strong> treats the missing field as zero, and on every subsequent increment.</p>
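<p>To make the upsert + <strong>$inc</strong> semantics concrete, here is a toy in-memory simulation in plain TypeScript (not the MongoDB driver; in reality all of this happens server-side inside DocumentDB): the first call creates the document with the counter at 1, and every later call increments it.</p>

```typescript
// Toy in-memory model of:
//   updateOne({ counter_id }, { $inc: { atomic_counter: 1 } }, { upsert: true })
// This only illustrates the semantics; the real work is a single atomic
// server-side operation in DocumentDB.
type CounterDoc = { counter_id: string; atomic_counter: number };

const store = new Map<string, CounterDoc>();

function incrementWithUpsert(id: string): number {
  const existing = store.get(id);
  if (existing === undefined) {
    // Upsert path: the document is created, and $inc is applied to the
    // missing field as if it were 0, so the counter starts at 1.
    const doc = { counter_id: id, atomic_counter: 1 };
    store.set(id, doc);
    return doc.atomic_counter;
  }
  existing.atomic_counter += 1;
  return existing.atomic_counter;
}

console.log(incrementWithUpsert("page-views")); // 1 (document created)
console.log(incrementWithUpsert("page-views")); // 2 (document incremented)
```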
<p>What changes between conditional and unconditional write operations is the <em>updateFilter</em> returned by the <em>getUpdateFilter</em> method.</p>
<p>Let’s have a look at it:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> getUpdateFilter = <span class="hljs-function">(<span class="hljs-params">useConditionalWrites: <span class="hljs-built_in">boolean</span>, id: <span class="hljs-built_in">number</span>, maxCounterValue: <span class="hljs-built_in">string</span></span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> unconditionalWriteParams = {
    counter_id: id
  }

  <span class="hljs-keyword">const</span> conditionalWriteParams = {
    counter_id: id,
    $and: [
      { atomic_counter: { $lt: <span class="hljs-built_in">Number</span>(maxCounterValue) } }
    ],
  }

  <span class="hljs-keyword">return</span> useConditionalWrites ? conditionalWriteParams : unconditionalWriteParams;
}
</code></pre>
<p>For unconditional writes, the only filter is the <em>counter_id</em> attribute.</p>
<p>For conditional writes, the <strong>$lt</strong> (less than) operator is added as an additional condition to check whether the value is below the maximum.</p>
<p>Since the update targets a single document and the increment is performed on the server side, atomicity is guaranteed and the counter value cannot exceed the maximum.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742121822865/5599647f-2f8e-4ef2-a920-d55fa2de15aa.png" alt="if two concurrrent increments are requested and the second one would exceed the maximum value, the first is accepted while the second is rejected" class="image--center mx-auto" /></p>
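<p>How does the client notice that an increment was rejected? The sketch below is an assumption about one possible handling strategy, not code from the repository: with <em>upsert: true</em> and a unique index on <em>counter_id</em>, a conditional increment whose <strong>$lt</strong> filter matches nothing falls through to an insert attempt, which fails with the driver’s duplicate key error (code 11000), and the client can treat that error as “max reached”.</p>

```typescript
// Hypothetical helper: classify the error thrown by the conditional
// updateOne call. 11000 is the MongoDB/DocumentDB duplicate key error code,
// raised here because the non-matching filter triggered an upsert insert
// that collided with the existing counter document.
const DUPLICATE_KEY_ERROR = 11000;

function isMaxReachedError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: number }).code === DUPLICATE_KEY_ERROR
  );
}

// Usage sketch (assumed flow, not the repo's exact logic):
// try {
//   await countersCollection.updateOne(updateFilter, { $inc: { atomic_counter: 1 } }, { upsert: true });
// } catch (err) {
//   if (isMaxReachedError(err)) {
//     // reject the request: the counter is already at its maximum
//   } else {
//     throw err;
//   }
// }
```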
<h1 id="heading-trade-offs-and-conclusion"><strong>Trade-Offs and Conclusion</strong></h1>
<p>Like other databases in this series, DocumentDB comes with <strong>trade-offs</strong> when used for atomic counters:</p>
<h2 id="heading-strenghts">Strengths:</h2>
<ul>
<li><p><strong>MongoDB Compatibility</strong>: Developers familiar with MongoDB can reuse existing knowledge.</p>
</li>
<li><p><strong>Managed Scaling</strong>: AWS handles <strong>replication, backups, and failover</strong>.</p>
</li>
<li><p><strong>Atomic Updates on a Single Document</strong>: <strong>$inc</strong> ensures updates are atomic.</p>
</li>
</ul>
<h2 id="heading-limitations">Limitations:</h2>
<ul>
<li><p><strong>Eventual Consistency for Replicas</strong>: Secondary reads may return stale data.</p>
</li>
<li><p><strong>Higher Latency for Stronger Consistency</strong>: To ensure <strong>fresh data</strong>, queries must be sent to the primary instance.</p>
</li>
</ul>
<h2 id="heading-key-takeaways"><strong>Key Takeaways:</strong></h2>
<ul>
<li><p><strong>Atomic counters in DocumentDB</strong> can be implemented using the <strong>$inc</strong> operator, ensuring <strong>atomic updates at the document level</strong>.</p>
</li>
<li><p><strong>Conditional increments are fully supported</strong> by combining <strong>$inc</strong> with a conditional update filter, allowing for server-side enforcement of constraints.</p>
</li>
<li><p><strong>DocumentDB follows a single-primary, multi-replica model</strong>, meaning writes are <strong>strongly consistent</strong>, but replica reads may be <strong>eventually consistent</strong>.</p>
</li>
<li><p><strong>Automatic leader election</strong> ensures high availability by promoting a replica to primary in case of failure.</p>
</li>
</ul>
<p>You can find the <strong>full runnable example</strong> in my GitHub repository: <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">atomic-counter</a>.</p>
<p>This marks the end of the <strong>atomic counter series</strong>! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Building Atomic Counters with TiDB]]></title><description><![CDATA[Distributed SQL databases have become a cornerstone for applications that require global scalability and strong consistency, and this problem has existed since the very first deployment of a database on two distinct servers: how to achieve strong con...]]></description><link>https://haveyoutriedrestarting.com/building-atomic-counters-with-tidb</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/building-atomic-counters-with-tidb</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[SQL]]></category><category><![CDATA[tidb]]></category><category><![CDATA[newSQL]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Sun, 16 Feb 2025 18:00:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/f7YQo-eYHdM/upload/4652898d884ef17c74e41f6fc4cfbdb7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Distributed SQL databases have become a cornerstone for applications that require <strong>global scalability and strong consistency</strong>, and this problem has existed since the very first deployment of a database on two distinct servers: how to achieve strong consistency and scalability without compromising availability?</p>
<p>Is it possible?</p>
<p>The <a target="_blank" href="https://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a> states that it isn't: of consistency, availability and partition tolerance, a distributed data store can only guarantee two of the three.</p>
<p>In this fourth part of the series on atomic counters, we'll explore how the pattern can be implemented using <a target="_blank" href="https://www.pingcap.com/tidb-cloud-serverless/">TiDB</a>, a database in the <a target="_blank" href="https://en.wikipedia.org/wiki/NewSQL">NewSQL</a> class, as an example, focusing on global partitioning, strong consistency and high availability.</p>
<p>This article will provide a closer look at TiDB’s unique architecture, discuss trade-offs, and refer to a practical implementation found in <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">this GitHub repository</a>.</p>
<h1 id="heading-serializability-linearizability-and-tidb"><strong>Serializability, Linearizability, and TiDB</strong></h1>
<p>TiDB guarantees <strong>strong consistency</strong> across its distributed nodes by adopting a <strong>two-phase commit (2PC)</strong> protocol. This ensures that all transactions, including atomic increments, are serialized and linearizable.</p>
<p>To provide high availability, TiDB replicates data using <a target="_blank" href="https://en.wikipedia.org/wiki/Raft_\(algorithm\)"><strong>Raft</strong></a>, a consensus algorithm that ensures data consistency across regions. This makes TiDB well-suited for use cases requiring globally consistent counters.</p>
<h1 id="heading-replication-and-leader-election-in-tidb"><strong>Replication and Leader Election in TiDB</strong></h1>
<p>TiDB’s replication model is built on <strong>Raft</strong>, where each region has a leader and multiple followers.</p>
<p>• The <strong>Raft leader</strong> handles writes and ensures consistency through consensus.</p>
<p>• Followers replicate data for high availability and enable failover in case of leader failure.</p>
<p>This replication mechanism ensures that even in multi-region deployments, TiDB maintains consistency and availability.</p>
<h1 id="heading-the-atomic-counter-pattern"><strong>The Atomic Counter Pattern</strong></h1>
<p>The <strong>atomic counter pattern</strong> ensures precise, consistent counter increments even in distributed environments.</p>
<p>With TiDB, you can achieve this using <strong>SQL transactions</strong> and <strong>atomic operations</strong> like UPDATE ... SET.</p>
<h1 id="heading-hands-on-walkthrough-of-the-deployable-example"><strong>Hands-On! Walkthrough of the Deployable Example</strong></h1>
<p>Let’s examine how the deployable example in <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">this GitHub repository</a> implements an atomic counter using <strong>AWS Lambda</strong>, <strong>API Gateway</strong>, and <strong>TiDB</strong>.</p>
<ul>
<li><p><strong>API Gateway:</strong> Provides HTTP endpoints for interacting with the counter.</p>
</li>
<li><p><strong>Lambda function:</strong> Implements the business logic for incrementing the counter.</p>
</li>
<li><p><strong>TiDB:</strong> Stores the counters.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739721641453/db024e84-f5c0-4d2e-8d1b-d09cb8ae9474.png" alt class="image--center mx-auto" /></p>
<p>In my example project you can decide whether to enforce a maximum value for the counter: this determines whether conditional writes are used.</p>
<p>Let’s focus on the Lambda business logic, <strong>from the</strong> <a target="_blank" href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/tiDB/counterLambda/index.ts">tiDBAtomicCounter Lambda code:</a></p>
<pre><code class="lang-typescript">    connection = <span class="hljs-keyword">await</span> createDbConnection(DB);

    <span class="hljs-keyword">const</span> updateFilter = getUpdateFilter(useConditionalWrites);

    <span class="hljs-keyword">const</span> params = {
      id: id,
      max_value: maxCounterValue
    }

    <span class="hljs-keyword">const</span> [rows] = <span class="hljs-keyword">await</span> connection.query&lt;RowDataPacket[]&gt;(updateFilter, params);
</code></pre>
<p>The <em>getUpdateFilter(useConditionalWrites)</em> method returns a specific SQL statement based on the <em>useConditionalWrites</em> boolean flag.</p>
<p>The method is very simple, and I kept it a little more verbose than necessary for better comprehension.</p>
<p>It basically returns one of two static SQL statements, very similar to each other, with a small but really important difference:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> getUpdateFilter = (useConditionalWrites: <span class="hljs-built_in">boolean</span>): <span class="hljs-function"><span class="hljs-params">string</span> =&gt;</span> {

  <span class="hljs-keyword">const</span> unconditionalWriteParams = <span class="hljs-string">'SELECT counter_value FROM counters WHERE counter_id = :id FOR UPDATE;  \
                                    INSERT INTO counters (counter_id, counter_value) VALUES (:id, 1) \
                                    ON DUPLICATE KEY UPDATE counter_value = counter_value + 1; \
                                    SELECT counter_value FROM counters WHERE counter_id = :id; \
                                    COMMIT;'</span>;

  <span class="hljs-keyword">const</span> conditionalWriteParams = <span class="hljs-string">'SELECT counter_value FROM counters WHERE counter_id = :id FOR UPDATE;  \
                                  INSERT INTO counters (counter_id, counter_value) VALUES (:id, 1) \
                                  ON DUPLICATE KEY UPDATE counter_value = IF(counter_value &lt; :max_value, counter_value + 1, counter_value);\
                                  SELECT counter_value FROM counters WHERE counter_id = :id; \
                                  COMMIT;'</span>;

  <span class="hljs-keyword">return</span> useConditionalWrites ? conditionalWriteParams : unconditionalWriteParams;
}
</code></pre>
<p>Let’s break down each SQL statement:</p>
<h2 id="heading-first-statement">First statement</h2>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> counter_value <span class="hljs-keyword">FROM</span> counters <span class="hljs-keyword">WHERE</span> counter_id = :<span class="hljs-keyword">id</span> <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">UPDATE</span>;
</code></pre>
<p>This statement tells the DB engine that you are selecting the specific table row for update.</p>
<p>Depending on the DB engine’s locking mechanism, it locks the row for the duration of the transaction:</p>
<ul>
<li><p>Pessimistic locking: the lock is acquired immediately when the statement executes, preventing other concurrent transactions from modifying the row.</p>
</li>
<li><p>Optimistic locking: the lock is not acquired immediately; instead, the engine checks for conflicts at commit time and retries if conflicts occur.</p>
</li>
</ul>
<p>TiDB’s default has changed over time, and it is <a target="_blank" href="https://docs.pingcap.com/tidb/stable/pessimistic-transaction">configurable</a>.</p>
<p>I suggest carefully considering the trade-offs between the two modes: the right one really depends on your specific use case.</p>
<p><a target="_blank" href="https://docs.pingcap.com/tidb/stable/transaction-overview">Knowledge is free at the library. Just bring your own container.</a></p>
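<p>If you want to pin the mode explicitly rather than rely on the cluster default, TiDB exposes it through the <em>tidb_txn_mode</em> session variable (see the linked docs). A minimal sketch, assuming the same mysql2-style connection used in the Lambda:</p>

```typescript
// Sketch: explicitly select TiDB's locking mode for the current session.
// tidb_txn_mode accepts 'pessimistic' or 'optimistic'.
type TxnMode = "pessimistic" | "optimistic";

function txnModeStatement(mode: TxnMode): string {
  return `SET SESSION tidb_txn_mode = '${mode}';`;
}

// e.g. before running the counter transaction:
// await connection.query(txnModeStatement("pessimistic"));
console.log(txnModeStatement("pessimistic")); // SET SESSION tidb_txn_mode = 'pessimistic';
```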
<h2 id="heading-second-statement">Second statement</h2>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> counters (counter_id, counter_value) <span class="hljs-keyword">VALUES</span> (:<span class="hljs-keyword">id</span>, <span class="hljs-number">1</span>)
</code></pre>
<p>Nothing special: it just tells the DB to insert the new row.</p>
<p>But wait: we were supposed to talk about incrementing counters, not about inserting new rows!</p>
<p>The third statement is where the magic happens.</p>
<h2 id="heading-third-statement-unconditional-writes">Third statement (unconditional writes):</h2>
<pre><code class="lang-sql">ON DUPLICATE KEY <span class="hljs-keyword">UPDATE</span> counter_value = counter_value + <span class="hljs-number">1</span>;
</code></pre>
<p>This statement tells the DB engine what to do if the previous statement fails with a duplicate key error, because we are trying to insert two rows with the same <em>counter_id</em>, which is the table’s primary key.</p>
<p>We are basically asking:</p>
<blockquote>
<p>please increment by one the counter_value field of the row</p>
</blockquote>
<p>With the second and third statements together, we are telling the DB engine:</p>
<blockquote>
<p>please insert this new counter, but if the counter is already present, don’t panic and just increment it</p>
</blockquote>
<h2 id="heading-third-statement-with-conditional-writes">Third statement, with conditional writes:</h2>
<pre><code class="lang-sql">ON DUPLICATE KEY <span class="hljs-keyword">UPDATE</span> counter_value = <span class="hljs-keyword">IF</span>(counter_value &lt; :max_value, counter_value + <span class="hljs-number">1</span>, counter_value);
</code></pre>
<p>Just as in the unconditional version, we are telling the DB engine to increment the existing row, but only if the current counter value is below the <em>max_value</em> parameter.</p>
<blockquote>
<p>please insert this new counter, but if the counter is already present, don’t panic and just increment it if it is below the max value.</p>
</blockquote>
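<p>The <strong>IF</strong> expression is just a server-side guard; its semantics can be re-stated in a few lines of TypeScript (for illustration only, the real evaluation happens inside TiDB):</p>

```typescript
// IF(counter_value < max_value, counter_value + 1, counter_value),
// re-stated in TypeScript. Note the counter saturates at max_value
// instead of failing: the row is "updated" to its current value.
function nextCounterValue(current: number, maxValue: number): number {
  return current < maxValue ? current + 1 : current;
}

console.log(nextCounterValue(4, 5)); // 5 (incremented)
console.log(nextCounterValue(5, 5)); // 5 (capped: no further increments)
```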
<h2 id="heading-fourth-statement">Fourth statement</h2>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> counter_value <span class="hljs-keyword">FROM</span> counters <span class="hljs-keyword">WHERE</span> counter_id = :<span class="hljs-keyword">id</span>
</code></pre>
<p>This simply retrieves the value after the insert/update.</p>
<h2 id="heading-final-statement">Final statement</h2>
<pre><code class="lang-sql"><span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>This seems like the simplest statement, but this is where all the magic happens: depending on your DB engine configuration, this is where our five-statement transaction is executed atomically on the server, conflicts are resolved, and the data is then replicated if the commit succeeds.</p>
<p>Since transactions are executed on the server and are all-or-nothing statements, they fit the atomic counter pattern perfectly.</p>
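<p>One practical detail when sending all the statements in a single query string: with mysql2’s <em>multipleStatements</em> option the driver returns one result set per statement, so the final counter value has to be picked out of the result of the second SELECT. A sketch of that extraction (the exact result shape is an assumption; check it against your driver version):</p>

```typescript
// With multiple statements, mysql2 returns an array with one entry per
// statement. Counting the INSERT and its ON DUPLICATE clause as one
// statement, the order is: SELECT ... FOR UPDATE (0), INSERT (1),
// SELECT counter_value (2), COMMIT (3).
type Row = { counter_value: number };

function extractCounter(resultSets: unknown[]): number {
  const rows = resultSets[2] as Row[]; // result of "SELECT counter_value ..."
  if (!rows || rows.length === 0) {
    throw new Error("counter row not found");
  }
  return rows[0].counter_value;
}

// Simulated result sets for one increment of an existing counter:
const simulated: unknown[] = [
  [{ counter_value: 6 }], // SELECT ... FOR UPDATE (value before the update)
  { affectedRows: 2 },    // INSERT ... ON DUPLICATE KEY UPDATE header
  [{ counter_value: 7 }], // SELECT counter_value (value after the update)
  {},                     // COMMIT header
];
console.log(extractCounter(simulated)); // 7
```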
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739727388573/be4abe2a-3608-4366-bcf5-77015e695556.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-trade-offs-and-conclusion"><strong>Trade-Offs and Conclusion</strong></h1>
<h2 id="heading-strengths"><strong>Strengths</strong>:</h2>
<p>• <strong>Strong Consistency</strong>: TiDB’s Raft-based replication and 2PC protocol ensure consistent increments even in globally distributed environments.</p>
<p>• <strong>SQL Familiarity</strong>: Developers can use familiar SQL syntax, reducing the learning curve.</p>
<h2 id="heading-limitations"><strong>Limitations</strong>:</h2>
<p>• <strong>Latency</strong>: Cross-region communication for strong consistency may increase latency, especially with pessimistic locking configuration.</p>
<p>• <strong>Operational Complexity</strong>: While TiDB Cloud simplifies management, understanding distributed SQL concepts is necessary for effective use.</p>
<h2 id="heading-key-takeaway"><strong>Key Takeaway</strong>:</h2>
<p>TiDB is an excellent choice for globally distributed applications requiring strong consistency. Its support for SQL transactions and automatic scaling makes it a powerful tool for implementing atomic counters in multi-region setups.</p>
]]></content:encoded></item><item><title><![CDATA[Building Atomic Counters with Momento]]></title><description><![CDATA[In the world of distributed systems, serverless caching is gaining traction for its simplicity and scalability. Momento, a fully managed serverless cache, builds on the core concepts of caching while eliminating infrastructure management.
In this thi...]]></description><link>https://haveyoutriedrestarting.com/building-atomic-counters-with-momento</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/building-atomic-counters-with-momento</guid><category><![CDATA[AWS]]></category><category><![CDATA[Databases]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[caching]]></category><category><![CDATA[consistency]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Thu, 02 Jan 2025 16:20:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9Njoam3Vesc/upload/3f2ce5b080d59a789893686f19cd547f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of distributed systems, <strong>serverless caching</strong> is gaining traction for its simplicity and scalability. <a target="_blank" href="https://www.gomomento.com/"><strong>Momento</strong></a>, a fully managed serverless cache, builds on the core concepts of caching while eliminating infrastructure management.</p>
<p>In this third installment of the <a target="_blank" href="https://haveyoutriedrestarting.com/series/atomic-counter">atomic counter series</a>, we’ll explore how to implement the pattern using <a target="_blank" href="https://www.gomomento.com/"><strong>Momento</strong></a>. By comparing it to <a target="_blank" href="https://redis.io/"><strong>Redis</strong></a>, we’ll highlight how Momento simplifies caching for developers, discuss its trade-offs, and guide you through a practical implementation using the code in <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">this GitHub repository</a>.</p>
<h1 id="heading-serializability-linearizability-and-momento"><strong>Serializability, Linearizability, and Momento</strong></h1>
<p>Unlike traditional caching systems, Momento operates as a <strong>serverless service</strong>, meaning you don’t manage nodes, replicas, or clusters.</p>
<p>However, like Redis, it provides atomic operations such as increment.</p>
<p>Momento’s atomicity ensures that counter updates are serialized within its storage layer. However, consistency across distributed systems can vary based on use cases, which aligns with the <strong>eventual consistency</strong> model in serverless architectures.</p>
<h1 id="heading-replication-and-leader-election-in-momento"><strong>Replication and Leader Election in Momento</strong></h1>
<p>As a managed service, <strong>Momento abstracts replication and failover</strong>.</p>
<p>You don’t have visibility into specific replicas or leaders, but the platform ensures high availability by handling replication and redundancy under the hood.</p>
<p>This is a notable difference from Redis, where you control and configure replication explicitly.</p>
<p>Momento offers simplicity at the cost of operational transparency and fine-grained control.</p>
<h1 id="heading-the-atomic-counter-pattern">The Atomic Counter pattern</h1>
<p>The <strong>atomic counter pattern</strong> enables precise increment operations, even in distributed environments.</p>
<p>With Momento, you use its increment operation, which automatically initializes the counter if it doesn’t exist, similar to Redis.</p>
<p>This approach works very well if you need to increment your counter unconditionally, regardless of its current value.</p>
<p>If you need to implement a conditional increment, Momento doesn’t provide methods to do so on the server side: you have to handle it on the client side, and this can lead to race conditions.</p>
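<p>A common way to shrink (though not eliminate) that race window is a bounded compare-and-swap loop: read, compute, write conditionally, and retry if another writer got there first. The sketch below is generic over an abstract store interface and is an illustration of the idea, not Momento SDK code; Momento’s conditional-write methods could back such an interface, but check the SDK for the exact primitives available.</p>

```typescript
// Generic bounded CAS increment over an abstract conditional-write store.
// `retries` caps the number of attempts so a hot key cannot spin forever.
interface CasStore {
  get(key: string): Promise<string | undefined>;
  // Sets `value` only if the stored value still equals `expected`
  // (undefined meaning "key is absent"). Returns true when applied.
  setIfEqual(key: string, value: string, expected: string | undefined): Promise<boolean>;
}

async function casIncrement(store: CasStore, key: string, max: number, retries = 5): Promise<number> {
  for (let i = 0; i < retries; i++) {
    const raw = await store.get(key);
    const current = raw === undefined ? 0 : Number(raw);
    if (current >= max) throw new Error("max value reached");
    // Write succeeds only if nobody changed the value since we read it.
    if (await store.setIfEqual(key, String(current + 1), raw)) {
      return current + 1;
    }
    // Another writer won the race; re-read and try again.
  }
  throw new Error("too much contention");
}
```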
<h1 id="heading-hands-on-walkthrough-of-the-deployable-example"><strong>Hands-on! Walkthrough of the Deployable Example</strong></h1>
<p>Let’s examine the deployable example in <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">this GitHub repository</a>.</p>
<p>This example demonstrates how to implement an atomic counter using <strong>AWS Lambda</strong>, <strong>API Gateway</strong>, and <strong>Momento</strong>.</p>
<p>1. <strong>API Gateway</strong>: Provides HTTP endpoints for interacting with the counter.</p>
<p>2. <strong>Lambda Functions</strong>: Implements the business logic for incrementing the counter.</p>
<p>3. <strong>Momento</strong>: Stores the counters.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736244695494/c95ab48e-12fd-4bf0-84e1-87b2e284fbae.png" alt class="image--center mx-auto" /></p>
<p>In my example project you can decide whether to enforce a maximum value for the counter: this determines whether conditional writes are used.</p>
<p>Let’s focus on the Lambda business logic, <strong>from the</strong> <a target="_blank" href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/momento/index.ts"><strong>momentoAtomicCounter</strong></a> <strong>Lambda</strong> code:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> momentoCacheClient = <span class="hljs-keyword">await</span> buildMomentoClient();

<span class="hljs-keyword">let</span> counter = <span class="hljs-number">0</span>;

<span class="hljs-keyword">if</span> (useConditionalWrites) {
    counter = <span class="hljs-keyword">await</span> handleConditionalWrites(momentoCacheClient,cacheName, id, maxCounterValue);
 } <span class="hljs-keyword">else</span> {
    counter = <span class="hljs-keyword">await</span> handleUnconditionalWrites(momentoCacheClient,cacheName, id);
 }
</code></pre>
<p>As you can see, I wrote two different methods to handle conditional and unconditional writes.</p>
<p>Let’s dive into the simpler one, <em>handleUnconditionalWrites</em>:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleUnconditionalWrites</span>(<span class="hljs-params">momentoClient: CacheClient,cacheName: <span class="hljs-built_in">string</span>, id: <span class="hljs-built_in">string</span></span>) </span>{
  <span class="hljs-keyword">let</span> counter = <span class="hljs-number">0</span>;

  <span class="hljs-keyword">const</span> cacheIncrementResponse = <span class="hljs-keyword">await</span> momentoClient.increment(cacheName, id, <span class="hljs-number">1</span>);
  <span class="hljs-keyword">switch</span> (cacheIncrementResponse.type) {
    <span class="hljs-keyword">case</span> CacheIncrementResponse.Success:
      counter = cacheIncrementResponse.value();
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> CacheIncrementResponse.Error:
      <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(cacheIncrementResponse.message());
  }

  <span class="hljs-keyword">return</span> counter
}
</code></pre>
<p>The method simply leverages the <em>increment</em> method from the Momento SDK.</p>
<p>It increments a key’s value by an integer (one, in this example) regardless of whether the key exists or what its current value is.</p>
<p>Things get more interesting when it comes to handling conditional writes, incrementing the counter only if it is below a specified threshold.</p>
<p>Momento does not provide any conditional increment method, but it does provide a few useful conditional write methods such as <em>setIfPresentAndNotEqual</em> and <em>setIfAbsent</em>.</p>
<p>Let’s dive into the <em>handleConditionalWrites</em> implementation (response handling logic is removed for better readability):</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleConditionalWrites</span>(<span class="hljs-params">momentoClient: CacheClient, cacheName: <span class="hljs-built_in">string</span>, id: <span class="hljs-built_in">string</span>, maxCounterValue: <span class="hljs-built_in">string</span></span>)</span>{

  <span class="hljs-keyword">let</span> counter = <span class="hljs-number">0</span>;

  <span class="hljs-keyword">const</span> cacheGetResponse = <span class="hljs-keyword">await</span> momentoClient.get(cacheName, id);

  <span class="hljs-keyword">switch</span> (cacheGetResponse.type) {
    <span class="hljs-keyword">case</span> CacheGetResponse.Hit:
      <span class="hljs-keyword">const</span> currentCounter = <span class="hljs-built_in">Number</span>(cacheGetResponse.value());
      <span class="hljs-keyword">const</span> nextCounter =  currentCounter + <span class="hljs-number">1</span>;
      <span class="hljs-keyword">const</span> strNextCounter = nextCounter.toString();

      counter = <span class="hljs-keyword">await</span> handleSetIfPresentAndNotEqual(momentoClient,cacheName, id, strNextCounter, maxCounterValue);
      <span class="hljs-keyword">break</span>;  
    <span class="hljs-keyword">case</span> CacheGetResponse.Miss:
      counter = <span class="hljs-keyword">await</span> handleSetIfAbsent(momentoClient, cacheName, id, <span class="hljs-string">'1'</span>);
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> CacheGetResponse.Error:
      <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(cacheGetResponse.toString());
  }

  <span class="hljs-keyword">return</span> counter
}

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleSetIfPresentAndNotEqual</span>(<span class="hljs-params">momentoClient: CacheClient,cacheName: <span class="hljs-built_in">string</span>, id: <span class="hljs-built_in">string</span>, nextCounter: <span class="hljs-built_in">string</span>, maxCounterValue: <span class="hljs-built_in">string</span></span>) </span>{

  <span class="hljs-keyword">const</span> cacheSetIfPresentAndNotEqualResponse = <span class="hljs-keyword">await</span> momentoClient.setIfPresentAndNotEqual(cacheName, id, nextCounter, maxCounterValue);
  ...
}

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleSetIfAbsent</span>(<span class="hljs-params">momentoClient: CacheClient,cacheName: <span class="hljs-built_in">string</span>, id: <span class="hljs-built_in">string</span>, value: <span class="hljs-built_in">string</span></span>) </span>{
  <span class="hljs-keyword">let</span> counter = <span class="hljs-number">0</span>;
  <span class="hljs-keyword">const</span> setIfAbsentResponse = <span class="hljs-keyword">await</span> momentoClient.setIfAbsent(cacheName, id, value);
  ...
}
</code></pre>
<p>These methods perform the following logic:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736257451379/02b2860e-893d-459e-9f31-776163cec648.png" alt class="image--center mx-auto" /></p>
<p>and this ensures consistency, since race conditions are resolved server-side by the two check-and-set methods.</p>
<p>This is how key initialization works (<em>set if not present</em> branch):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736245870347/f55ef434-e011-4de1-85fd-e9022fa5bb3e.png" alt class="image--center mx-auto" /></p>
<p>and this is how an existing key increment works (<em>set if present and not equals</em> branch):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736245944886/ddf6c0c1-fd30-4134-aa68-36600c849d0a.png" alt class="image--center mx-auto" /></p>
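<p>To make the flow above concrete, here is a small, self-contained model of the two check-and-set primitives and the conditional increment built on top of them. This is an illustration only, not the Momento SDK: the class below just mimics the server-side semantics (each operation is atomic) in memory.</p>

```typescript
// In-memory stand-ins for the two server-side check-and-set primitives.
// Each method is atomic, like the real operations it models.
class FakeCache {
  private store = new Map();

  // Store value only if the key does not exist yet; true if stored.
  setIfAbsent(key: string, value: string): boolean {
    if (this.store.has(key)) return false;
    this.store.set(key, value);
    return true;
  }

  // Store value only if the key exists and its current value differs
  // from notEqual; true if stored.
  setIfPresentAndNotEqual(key: string, value: string, notEqual: string): boolean {
    if (!this.store.has(key) || this.store.get(key) === notEqual) return false;
    this.store.set(key, value);
    return true;
  }

  get(key: string): string | undefined {
    return this.store.get(key);
  }
}

// Conditional increment mirroring the two branches in the diagrams above.
function conditionalIncrement(cache: FakeCache, key: string, max: number): number {
  const current = cache.get(key);
  if (current === undefined) {
    // "set if not present" branch: initialize the counter to 1.
    cache.setIfAbsent(key, '1');
  } else {
    // "set if present and not equals" branch: increment unless the
    // current value already equals the maximum.
    const next = Number(current) + 1;
    cache.setIfPresentAndNotEqual(key, next.toString(), max.toString());
  }
  return Number(cache.get(key));
}
```

<p>Calling <em>conditionalIncrement</em> repeatedly with a maximum of 3 yields 1, 2, 3, and then stays at 3: once the stored value equals the maximum, the check-and-set refuses further writes.</p>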
<h1 id="heading-trade-offs-and-conclusion">Trade-Offs and conclusion</h1>
<h2 id="heading-strenghts">Strengths:</h2>
<ul>
<li><p><strong>Serverless Simplicity</strong>: No infrastructure to manage, reducing operational overhead.</p>
</li>
<li><p><strong>Built-In Scalability</strong>: Automatically scales to meet demand without manual intervention.</p>
</li>
</ul>
<h2 id="heading-limitations">Limitations:</h2>
<ul>
<li><p><strong>Reduced Control</strong>: Lack of visibility into replication and cluster configurations.</p>
</li>
<li><p><strong>Eventual Consistency</strong>: While atomic operations are supported, consistency guarantees may differ in highly distributed setups.</p>
</li>
<li><p><strong>Performance</strong>: since an initial GET is required, more network round trips are needed compared to other solutions that implement conditional increments.</p>
</li>
</ul>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Momento showcases how a <strong>serverless-first approach</strong> simplifies distributed caching.</p>
<p>By eliminating the need to manage infrastructure, it allows developers to focus on building applications rather than worrying about operational overhead.</p>
<p>For atomic counters, Momento’s increment operation makes implementation straightforward and reliable. However, this convenience comes with trade-offs: you lose the granular control over replication and failover configurations that traditional systems like Redis offer.</p>
<p>If you’re exploring distributed counters for your application, I highly recommend trying out the example provided in the <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">GitHub repository</a>.</p>
<p>Stay tuned for the next installment in the series, where we’ll delve into <strong>DocumentDB</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Building Atomic Counters with Elasticache Redis]]></title><description><![CDATA[When working with high-throughput, low-latency applications, Redis—an in-memory data store—stands out as an excellent choice for implementing the atomic counter pattern.
With its atomic operations and simple APIs, Redis offers a straightforward appro...]]></description><link>https://haveyoutriedrestarting.com/building-atomic-counters-with-elasticache-redis</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/building-atomic-counters-with-elasticache-redis</guid><category><![CDATA[serverless]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Redis]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Sun, 22 Dec 2024 16:51:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/gdL-UZfnD3I/upload/1ab757f37fb175ede5dfb1e78f04fb41.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When working with high-throughput, low-latency applications, <strong>Redis</strong>—an in-memory data store—stands out as an excellent choice for implementing the <strong>atomic counter pattern</strong>.</p>
<p>With its atomic operations and simple APIs, Redis offers a straightforward approach to incrementing counters while ensuring high performance.</p>
<p>In this article, we’ll explore how to build an atomic counter using <strong>AWS ElastiCache Redis</strong>.</p>
<p>You’ll gain a practical understanding of Redis concepts like <strong>atomic operations</strong>, its <strong>replication model</strong>, and how to implement counters with the code from <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">this GitHub repository</a>.</p>
<h2 id="heading-serializability-linearizability-and-redis"><strong>Serializability, Linearizability, and Redis</strong></h2>
<p>Before we dive deep into the code, we need to recall a few concepts (please refer to <a target="_blank" href="https://hashnode.com/post/cm3syajxr000009mk7pwz56if"><strong>the first article of this series for a detailed explanation</strong></a>):</p>
<ul>
<li><p><strong>Serializability</strong>: Operations appear in a consistent sequential order, ensuring correctness.</p>
</li>
<li><p><strong>Linearizability</strong>: Writes are immediately visible for subsequent reads, ensuring real-time consistency.</p>
</li>
</ul>
<p>Redis processes commands in a <strong>single-threaded event loop</strong>, ensuring that each command is executed in the order it’s received. This guarantees atomicity at the command level for operations like <em>INCR</em>.</p>
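<p>The guarantee can be pictured with a toy model (an illustration, not real Redis internals): commands from many clients land in a single queue and are applied strictly one at a time, so concurrent <em>INCR</em>s never interleave and no increment is lost.</p>

```typescript
// Toy model of a single-threaded command loop: submitted commands are
// queued and then executed sequentially, never concurrently.
type Command = () => void;

class SingleThreadedLoop {
  private queue: Command[] = [];

  submit(cmd: Command): void {
    this.queue.push(cmd);
  }

  // Drain the queue one command at a time, like the Redis event loop.
  run(): void {
    while (this.queue.length > 0) {
      const cmd = this.queue.shift()!;
      cmd();
    }
  }
}

const store = new Map();
const loop = new SingleThreadedLoop();

// 1000 "clients" all submit an INCR for the same key.
for (let i = 0; i < 1000; i++) {
  loop.submit(() => store.set('counter', (store.get('counter') ?? 0) + 1));
}
loop.run();
// Because the loop serializes execution, all 1000 increments are applied.
```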
<p>While Redis operations on a single node can be considered <strong>linearizable</strong>, in a distributed Redis setup (e.g., with clustering or replicas), this strict ordering can break. Writes to replicas are propagated asynchronously, so they may lag behind the primary node.</p>
<h2 id="heading-replication-and-leader-election-in-redis"><strong>Replication and Leader election in Redis</strong></h2>
<p>Redis employs a <strong>primary-replica architecture</strong>, where:</p>
<ul>
<li><p>The <strong>primary node</strong> handles all writes and propagates updates to replicas asynchronously.</p>
</li>
<li><p><strong>Redis Sentinel</strong> handles failover, promoting a replica to primary in case of failure.</p>
</li>
</ul>
<p>For atomic counters, a single primary node is typically sufficient. If clustering is used, counter keys should be kept on a single shard to maintain atomicity.</p>
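<p>In Redis Cluster, keeping related keys on the same shard is done with <strong>hash tags</strong>: when a key contains a non-empty <em>{...}</em> section, only that section is hashed to pick the slot. The sketch below (not from the article’s repository) reproduces the tag-extraction rule from the Redis Cluster specification:</p>

```typescript
// Extract the hash tag from a key, following the Redis Cluster rule:
// hash only the substring between the first '{' and the first '}' after
// it, provided it is non-empty; otherwise hash the whole key.
function hashTag(key: string): string {
  const open = key.indexOf('{');
  if (open === -1) return key;
  const close = key.indexOf('}', open + 1);
  // An empty "{}" or a missing "}" means the whole key is hashed.
  if (close === -1 || close === open + 1) return key;
  return key.substring(open + 1, close);
}

// Both keys hash on "counters", so they map to the same slot and shard.
hashTag('{counters}:page-views'); // "counters"
hashTag('{counters}:api-calls');  // "counters"
```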
<h2 id="heading-the-atomic-counter-pattern">The Atomic Counter Pattern</h2>
<p>The <strong>atomic counter pattern</strong> allows you to increment a value reliably, even in distributed systems, by ensuring operations are conflict-free and consistent.</p>
<p>Redis supports this pattern natively through the <em>INCR</em> command, which atomically increments a key’s value by 1.</p>
<p>However, if the increment must depend on the counter’s current value, race conditions become possible.</p>
<h2 id="heading-hands-on-walkthrough-of-the-deployable-example"><strong>Hands-on! Walkthrough of the Deployable Example</strong></h2>
<p>Let’s dive into the example provided in the <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">GitHub repository</a>.</p>
<p>This example demonstrates how to implement an atomic counter using <strong>AWS Lambda</strong>, <strong>API Gateway</strong>, and <strong>ElastiCache Redis</strong>.</p>
<p>1. <strong>API Gateway</strong>: Provides HTTP endpoints for interacting with the counter.</p>
<p>2. <strong>Lambda Functions</strong>: Implements the business logic for incrementing the counter.</p>
<p>3. <strong>ElastiCache Redis</strong> Cluster: Stores the counters with atomicity guarantees.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736242960240/fdb98bce-e910-49d4-95d8-ee214a63342f.png" alt class="image--center mx-auto" /></p>
<p>In my example project you can decide whether to use a maximum value for the counter or not: this determines whether conditional writes are used.</p>
<p>Let’s focus on lambda business logic, <a target="_blank" href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/redis/index.ts">from the redisAtomicCounter Lambda</a> code:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> redisClient = <span class="hljs-keyword">await</span> buildRedisClient();

<span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> redisClient.eval(getLuaScript(useConditionalWrites), <span class="hljs-number">1</span>, id, maxCounterValue);
</code></pre>
<p>This code snippet simply sends an <em>eval</em> command and gets the new updated counter value, using the <a target="_blank" href="https://github.com/redis/ioredis">ioredis client</a>.</p>
<p>Let’s see how <em>getLuaScript()</em> works:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> getLuaScript = <span class="hljs-function">(<span class="hljs-params">useConditionalWrites: <span class="hljs-built_in">boolean</span></span>) =&gt;</span> {

  <span class="hljs-keyword">const</span> unconditionalIncrementScript = <span class="hljs-string">`
    redis.call('INCR', KEYS[1])
    local counter = redis.call('GET', KEYS[1])
    return counter
   `</span>;

  <span class="hljs-keyword">const</span> conditionalIncrementScript = <span class="hljs-string">`

    local counter = redis.call('GET', KEYS[1])
    local maxValue = tonumber(ARGV[1])

    if not counter then
      counter = 0
    end

    counter = tonumber(counter)

    if counter &lt; maxValue then
      redis.call('INCR', KEYS[1])
      counter = redis.call('GET', KEYS[1])
      return counter
    else
      return 'Counter has reached its maximum value of: ' .. maxValue
    end
  `</span>;

  <span class="hljs-keyword">return</span> useConditionalWrites ? conditionalIncrementScript : unconditionalIncrementScript;
}
</code></pre>
<p>Unconditional writes don't actually need to be executed inside a LUA Script: we could simply use the <em>incr()</em> method provided by the <em>ioredis</em> client.</p>
<p>But for conditional writes, the counter value must be checked before incrementing it to avoid race conditions: if the check is performed on the client side, another client might increment the counter between the first client's <em>get()</em> and <em>incr()</em> calls.</p>
<p>Let's see an example: assuming the maximum value for the counter is 10, Alice and Bob perform a <em>GET</em> for the same key when the counter value is 9.</p>
<p>They check that the current value is below the maximum, and then they both send an <em>INCR</em> command to Redis.</p>
<p>Since the <em>INCR</em> command is executed unconditionally on the server side, the counter is incremented twice and exceeds the maximum value.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736243855622/65addd38-7704-4592-89b5-a40275a9183c.png" alt class="image--center mx-auto" /></p>
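<p>The same race can be reproduced in a few lines of plain code (a pure simulation, not actual Redis calls): both clients read 9, both pass the client-side check, and both increments are applied.</p>

```typescript
const MAX = 10;
// The "server": a counter currently at 9.
const server = new Map([['key1', 9]]);

// Step 1: Alice and Bob each GET the current value (both see 9).
const aliceView = server.get('key1')!;
const bobView = server.get('key1')!;

// Step 2: both client-side checks pass, because each stale view is below MAX.
const aliceSendsIncr = aliceView < MAX; // true
const bobSendsIncr = bobView < MAX;     // true

// Step 3: the server applies both INCRs unconditionally.
if (aliceSendsIncr) server.set('key1', server.get('key1')! + 1);
if (bobSendsIncr) server.set('key1', server.get('key1')! + 1);

// The counter is now 11: the client-side check did not prevent the overshoot.
```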
<h2 id="heading-redis-secret-sauce-lua-script">Redis secret sauce: LUA Script</h2>
<p>The solution is to check the counter value on the server side, by executing a LUA Script.</p>
<p>It gets the counter value, checks whether it is below the maximum, and increments it only if so:</p>
<pre><code class="lang-lua">
<span class="hljs-keyword">local</span> counter = redis.call('GET', KEYS[<span class="hljs-number">1</span>])
<span class="hljs-keyword">local</span> maxValue = tonumber(ARGV[<span class="hljs-number">1</span>])

<span class="hljs-keyword">if</span> not counter then
    counter = <span class="hljs-number">0</span>
<span class="hljs-keyword">end</span>

counter = tonumber(counter)

<span class="hljs-keyword">if</span> counter &lt; maxValue then
    redis.call('INCR', KEYS[<span class="hljs-number">1</span>])
    counter = redis.call('GET', KEYS[<span class="hljs-number">1</span>])
    <span class="hljs-keyword">return</span> counter
<span class="hljs-keyword">else</span>
    <span class="hljs-keyword">return</span> 'Counter has reached its maximum value of: ' .. maxValue
<span class="hljs-keyword">end</span>
</code></pre>
<p>Looking at the code, you might think it is not safe either: another script could increment the counter between the <em>GET</em> and the <em>INCR</em> command execution.</p>
<p>The magic of LUA Scripts is that only one script can be executed at a time, <a target="_blank" href="https://redis.io/docs/latest/develop/interact/programmability/eval-intro/">as reported in the documentation</a>:</p>
<blockquote>
<p>Redis guarantees the script's atomic execution. While executing the script, all server activities are blocked during its entire runtime. These semantics mean that all of the script's effects either have yet to happen or had already happened.</p>
</blockquote>
<p>and that is exactly what we need when dealing with atomic counters: since only one script executes at a time, there is no concurrency and there are no race conditions, as <strong>serializability</strong> is guaranteed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736244214122/afba5863-5292-4d2f-8fc3-6627324a060d.png" alt class="image--center mx-auto" /></p>
<p>Moreover, we have better performance, which is good:</p>
<blockquote>
<p>Because scripts execute in the server, reading and writing data from scripts is very efficient.</p>
</blockquote>
<h2 id="heading-trade-offs-and-conclusion"><strong>Trade-Offs and Conclusion</strong></h2>
<h3 id="heading-strengths"><strong>Strengths</strong>:</h3>
<ul>
<li><p>Redis provides <strong>low-latency</strong> atomic operations.</p>
</li>
<li><p>The <em>INCR</em> command is inherently atomic, simplifying counter implementation.</p>
</li>
<li><p>LUA Script execution prevents race conditions on conditional writes, achieving <strong>serializability</strong>.</p>
</li>
</ul>
<h3 id="heading-limitations"><strong>Limitations</strong>:</h3>
<ul>
<li><p><strong>Asynchronous Replication</strong>: Updates may not immediately reflect on replicas.</p>
</li>
<li><p><strong>Durability Risks</strong>: Without persistence, counters may reset after a failure or restart.</p>
</li>
</ul>
<h3 id="heading-conclusion"><strong>Conclusion:</strong></h3>
<p>Redis is ideal for high-performance, in-memory atomic counters where latency is a top priority, and simplifies building atomic counters with its native support for atomic operations and low-latency access.</p>
<p>However, consider its replication and durability trade-offs for production use.</p>
<p>Check out the <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">deployable example here</a>, and stay tuned for the next article in the series, where we’ll explore <strong>Momento</strong> as an alternative for serverless caching.</p>
]]></content:encoded></item><item><title><![CDATA[Building Atomic Counters with DynamoDB]]></title><description><![CDATA[DynamoDB, a serverless NoSQL database, is a go-to choice for implementing atomic counters due to its built-in support for atomic operations and managed scalability. This article will guide you through how DynamoDB ensures consistency and replication,...]]></description><link>https://haveyoutriedrestarting.com/building-atomic-counters-with-dynamodb</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/building-atomic-counters-with-dynamodb</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Databases]]></category><category><![CDATA[serverless]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Mon, 09 Dec 2024 06:00:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/OgvqXGL7XO4/upload/d763713e77e8ef537877c4d43f62c09b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DynamoDB, a serverless NoSQL database, is a go-to choice for implementing atomic counters due to its built-in support for atomic operations and managed scalability. This article will guide you through how DynamoDB ensures consistency and replication, a refresher on the atomic counter pattern, and a hands-on walkthrough of a deployable example from <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">this GitHub repository</a>.</p>
<p>By the end, you’ll understand how to leverage DynamoDB for atomic counters and know the trade-offs involved.</p>
<h2 id="heading-serializability-and-linearizability-in-dynamodb"><strong>Serializability and Linearizability in DynamoDB</strong></h2>
<p>Before we dive deep into the code, we need to recall a few concepts (please refer to <a target="_blank" href="https://hashnode.com/post/cm3syajxr000009mk7pwz56if">the first article of this series for a detailed explanation</a>):</p>
<ul>
<li><p><strong>Serializability</strong>: Operations appear in a consistent sequential order, ensuring correctness.</p>
</li>
<li><p><strong>Linearizability</strong>: Writes are immediately visible for subsequent reads, ensuring real-time consistency.</p>
</li>
</ul>
<p>DynamoDB achieves linearizable writes through its <strong>single-leader replication model</strong>:</p>
<ul>
<li><p>Write operations are directed to the leader node for the partition key, and changes are propagated to replicas.</p>
</li>
<li><p>Strongly consistent reads (optional) ensure the latest value is returned immediately after a write.</p>
</li>
</ul>
<p>This guarantees DynamoDB can safely implement atomic operations like counters.</p>
<h2 id="heading-replication-and-leader-election-in-dynamodb">Replication and Leader election in DynamoDB</h2>
<p>DynamoDB automatically manages replication across multiple availability zones to ensure durability and availability. Key mechanisms include:</p>
<p>• <strong>Single-leader replication</strong>: A leader node handles writes, maintaining consistency while replicas handle reads.</p>
<p>• <strong>Leader election</strong>: If a leader fails, DynamoDB promotes another replica seamlessly, ensuring high availability without manual intervention.</p>
<p>This replication strategy enables DynamoDB to handle distributed workloads while maintaining data consistency.</p>
<h3 id="heading-a-note-on-synchronized-timestamps-in-distributed-databases"><strong>A Note on Synchronized Timestamps in Distributed Databases</strong></h3>
<p>Synchronized timestamps play a critical role in distributed databases, especially for ensuring consistency across geographically dispersed replicas.</p>
<p>Without synchronized clocks, it becomes challenging to determine the order of operations accurately, leading to potential consistency issues in global-scale applications.</p>
<p>In the AWS ecosystem, the <strong>AWS Time Sync Service</strong> provides a highly accurate and reliable time source synchronized across all AWS Regions.</p>
<p><a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-time-sync-service-microsecond-accurate-time/">Announced last year</a>, this service offers nanosecond-level precision and a consistent view of time, serving as a foundational piece for distributed systems.</p>
<p>Recently, AWS built upon this foundation to announce <a target="_blank" href="https://press.aboutamazon.com/2024/12/aws-announces-new-database-capabilities-including-amazon-aurora-dsql-the-fastest-distributed-sql-database#:~:text=To%20ensure%20each%20Region%20sees,provide%20microseconds%20level%20accurate%20time"><strong>strong consistency for DynamoDB global tables</strong></a>. This new feature allows applications to perform strongly consistent reads and writes across multiple regions, ensuring the same data is visible no matter where the query originates.</p>
<p><strong>Why is this important?</strong> Strong consistency in global tables depends on synchronized timestamps to ensure that write propagation across regions respects causal ordering. This prevents race conditions and ensures data correctness even in high-latency or failure scenarios.</p>
<p><strong>Impact on Atomic Counters</strong>: If your atomic counter spans multiple regions via global tables, synchronized timestamps enable accurate propagation of updates, preserving the order and integrity of increments.</p>
<p>This synergy of the AWS Time Sync Service and DynamoDB advancements showcases how synchronized time is more than just an infrastructure detail—it’s a cornerstone of achieving robust distributed consistency.</p>
<h2 id="heading-dynamodb-conditional-writes">DynamoDB Conditional Writes</h2>
<p>DynamoDB’s <strong>conditional write</strong> feature allows you to execute write operations (<em>PutItem, UpdateItem, DeleteItem</em>) only if specific conditions are met.</p>
<p>This capability is crucial for enforcing business rules, ensuring data integrity, and preventing race conditions in distributed systems.</p>
<p>When you perform a conditional write, you include a <em>ConditionExpression</em> in the request.</p>
<p>DynamoDB evaluates this condition against the item’s existing attributes before executing the operation:</p>
<ul>
<li><p>If the condition evaluates to <strong>true</strong>, the write operation proceeds.</p>
</li>
<li><p>If the condition evaluates to <strong>false</strong>, the operation fails with a <em>ConditionalCheckFailedException</em>.</p>
</li>
</ul>
<p>A few use cases for conditional writes include:</p>
<h3 id="heading-enforcing-constraints">Enforcing constraints</h3>
<ol>
<li>Ensure unique records in a table by verifying an attribute does not exist:</li>
</ol>
<pre><code class="lang-plaintext">ConditionExpression: "attribute_not_exists(partitionKey)"
</code></pre>
<ol start="2">
<li>Prevent counter increments beyond a maximum value:</li>
</ol>
<pre><code class="lang-plaintext">ConditionExpression: "counterValue &lt; :maxValue"
</code></pre>
<h3 id="heading-concurrent-updates-without-conflicts"><strong>Concurrent Updates without conflicts</strong></h3>
<p>Safely update an item only if its version matches a known value (optimistic locking):</p>
<pre><code class="lang-plaintext">ConditionExpression: "version = :expectedVersion"
</code></pre>
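<p>As a sketch, the optimistic-locking condition above can be paired with an update that bumps the version in the same write. The parameter object below uses hypothetical table and attribute names; the write succeeds only if the stored version still matches the one the client read, otherwise DynamoDB rejects it with a <em>ConditionalCheckFailedException</em> and the client can re-read and retry.</p>

```typescript
// Build UpdateItem parameters for an optimistic-lock write (sketch with
// hypothetical names): the condition checks the version the client read,
// and the update bumps it so concurrent writers conflict deliberately.
const buildOptimisticLockParams = (id: string, expectedVersion: number) => ({
  TableName: 'my-table', // hypothetical table name
  Key: {
    id: { S: id },
  },
  UpdateExpression: 'SET payload = :newPayload, #v = :newVersion',
  ConditionExpression: '#v = :expectedVersion',
  // "version" is a DynamoDB reserved word, so it must be aliased.
  ExpressionAttributeNames: { '#v': 'version' },
  ExpressionAttributeValues: {
    ':newPayload': { S: 'updated' },
    ':expectedVersion': { N: expectedVersion.toString() },
    ':newVersion': { N: (expectedVersion + 1).toString() },
  },
  ReturnValues: 'UPDATED_NEW' as const,
});
```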
<h3 id="heading-transactional-integrity"><strong>Transactional Integrity</strong></h3>
<p>Enforce rules like “only update if another attribute matches a specific state.”</p>
<h3 id="heading-key-benefits">Key Benefits:</h3>
<ol>
<li><p><strong>Atomicity:</strong></p>
<p> Conditional writes ensure atomic operations by evaluating and writing in a single step, reducing the need for complex locking mechanisms.</p>
</li>
<li><p><strong>Data Integrity:</strong></p>
<p> Prevent unintended overwrites or updates by applying conditions based on the item’s current state.</p>
</li>
<li><p><strong>Performance:</strong></p>
<p> Conditional expressions are evaluated directly on the DynamoDB service, minimizing latency and avoiding additional queries to check conditions beforehand.</p>
</li>
</ol>
<h2 id="heading-the-atomic-counter-pattern">The atomic counter pattern</h2>
<p>The atomic counter pattern ensures safe, concurrent updates to a counter without losing increments due to race conditions. In distributed systems:</p>
<p>• Operations must be atomic (all or nothing).</p>
<p>• DynamoDB achieves this with the <em>UpdateItem</em> operation and the ADD attribute update expression, which ensures the counter is incremented atomically.</p>
<h2 id="heading-hands-on-walkthrough-of-the-deployable-example">Hands-on! <strong>Walkthrough of the Deployable Example</strong></h2>
<p>Let’s explore the inner workings of the example in the <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">GitHub repository</a>, focusing on how DynamoDB is used for atomic counters.</p>
<p>The implementation includes:</p>
<p>1. <strong>API Gateway</strong>: Provides HTTP endpoints for interacting with the counter.</p>
<p>2. <strong>Lambda Functions</strong>: Implements the business logic for incrementing the counter and enforcing optional constraints like a maximum value.</p>
<p>3. <strong>DynamoDB Table</strong>: Stores the counters with atomicity guarantees.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736242686190/1d1219a2-940e-4956-a0b2-d595e0156e60.png" alt class="image--center mx-auto" /></p>
<p>In my example project you can decide whether to use a maximum value for the counter or not: this determines whether conditional writes are used.</p>
<p>Let’s focus on this logic, <a target="_blank" href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/dynamo/index.ts">from the dynamoDbAtomicCounter Lambda</a> code:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> id = event.pathParameters?.id;

<span class="hljs-keyword">const</span> writeParams = getWriteParams(useConditionalWrites, id, maxCounterValue);

<span class="hljs-keyword">const</span> dynamoDBClient = <span class="hljs-keyword">await</span> buildDynamoDbClient();

<span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> dynamoDBClient.send(<span class="hljs-keyword">new</span> UpdateItemCommand(writeParams));

<span class="hljs-keyword">const</span> counter = <span class="hljs-built_in">Number</span>(result.Attributes?.atomic_counter.N);
</code></pre>
<p>This code snippet simply sends an <em>UpdateItemCommand</em> and gets the new updated counter value, using the AWS SDK DynamoDBClient.</p>
<p>Let’s see how <em>getWriteParams</em> works:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> getWriteParams = <span class="hljs-function">(<span class="hljs-params">useConditionalWrites: <span class="hljs-built_in">boolean</span>, id: <span class="hljs-built_in">string</span>, maxCounterValue: <span class="hljs-built_in">string</span></span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> TABLE_NAME = process.env.TABLE_NAME || <span class="hljs-string">''</span>;

  <span class="hljs-keyword">const</span> unconditionalWriteParams = {
    TableName: TABLE_NAME,
    Key: {
      id: { S: id },
    },
    UpdateExpression: <span class="hljs-string">'ADD atomic_counter :inc'</span>,
    ExpressionAttributeValues: {
      <span class="hljs-string">':inc'</span>: { N: <span class="hljs-string">'1'</span> }
    },
    ReturnValues: <span class="hljs-string">'UPDATED_NEW'</span> <span class="hljs-keyword">as</span> <span class="hljs-keyword">const</span>,
  };

  <span class="hljs-keyword">const</span> conditionalWriteParams = {
    TableName: TABLE_NAME,
    Key: {
      id: { S: id },
    },
    UpdateExpression: <span class="hljs-string">'ADD atomic_counter :inc'</span>,
    ConditionExpression: <span class="hljs-string">'attribute_not_exists(atomic_counter) or atomic_counter &lt; :max'</span>,
    ExpressionAttributeValues: {
      <span class="hljs-string">':inc'</span>: { N: <span class="hljs-string">'1'</span> },
      <span class="hljs-string">':max'</span>: { N: maxCounterValue },
    },
    ReturnValues: <span class="hljs-string">'UPDATED_NEW'</span> <span class="hljs-keyword">as</span> <span class="hljs-keyword">const</span>,
  };

  <span class="hljs-keyword">return</span> useConditionalWrites ? conditionalWriteParams : unconditionalWriteParams;
}
</code></pre>
<p>This method checks if conditional writes are required: the <em>update expression</em> is the same in both cases, and it leverages the <em>ADD</em> command.</p>
<pre><code class="lang-typescript">UpdateExpression: <span class="hljs-string">'ADD atomic_counter :inc'</span>
</code></pre>
<p>If conditional writes are required, a <em>Condition Expression</em> is used:</p>
<pre><code class="lang-typescript">ConditionExpression: <span class="hljs-string">'attribute_not_exists(atomic_counter) or atomic_counter &lt; :max'</span>
</code></pre>
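<p>When the condition evaluates to false, the SDK rejects the command with an error named <em>ConditionalCheckFailedException</em>. In this counter scenario that is an expected outcome (the counter hit its maximum), not a fault, so one reasonable way to handle it is a small classifier like the sketch below (status codes and messages are my choice, not from the repository):</p>

```typescript
// Distinguish an expected conditional-write rejection from a real failure.
function classifyDynamoError(err: Error): { status: number; message: string } {
  if (err.name === 'ConditionalCheckFailedException') {
    // The counter has reached its maximum value: reject the increment.
    return { status: 409, message: 'Counter has reached its maximum value' };
  }
  // Anything else is an unexpected failure worth surfacing as a 500.
  return { status: 500, message: 'Internal error' };
}

// Example: simulate the error the SDK would throw on a failed condition.
const conditionError = new Error('The conditional request failed');
conditionError.name = 'ConditionalCheckFailedException';
classifyDynamoError(conditionError); // status 409: expected rejection
```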
<h2 id="heading-trade-offs-and-conclusion">Trade-Offs and Conclusion</h2>
<h3 id="heading-advantages">Advantages</h3>
<ul>
<li><p>DynamoDB’s managed infrastructure handles replication and scaling.</p>
</li>
<li><p>Simple and efficient atomic updates using <em>UpdateItem</em>.</p>
</li>
</ul>
<h3 id="heading-limitations"><strong>Limitations</strong></h3>
<ul>
<li><p><strong>Hot partitions</strong>: High traffic to a single counter may cause throttling.</p>
</li>
<li><p><strong>Throughput limits</strong>: Monitor RCUs/WCUs to avoid performance degradation.</p>
</li>
<li><p><strong>Eventual consistency</strong>: Use strongly consistent reads when precise counter values are critical.</p>
</li>
</ul>
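<p>The last point deserves a concrete note: <em>GetItem</em> is eventually consistent by default and may return a slightly stale counter, while setting <em>ConsistentRead</em> to true returns the latest committed value at double the read cost. A sketch of the read parameters (the table name is hypothetical; <em>atomic_counter</em> matches the attribute used earlier, but this helper is not part of the repository):</p>

```typescript
// Build GetItem parameters for a strongly consistent read of the counter.
const readCounterParams = (id: string) => ({
  TableName: 'my-table', // hypothetical table name
  Key: {
    id: { S: id },
  },
  ConsistentRead: true, // opt in to a strongly consistent read
  ProjectionExpression: 'atomic_counter',
});
```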
<h3 id="heading-conclusion">Conclusion</h3>
<p>DynamoDB provides a simple and scalable solution for implementing atomic counters in distributed systems.</p>
<p>Its built-in atomicity and managed replication make it a strong candidate for this pattern.</p>
<p>Explore the <a target="_blank" href="https://github.com/ncremaschini/atomic-counter">GitHub repository to deploy the example</a> and experiment with atomic counters in DynamoDB, and stay tuned for the next article, <a target="_blank" href="https://haveyoutriedrestarting.com/building-atomic-counters-with-elasticache-redis">where we’ll explore the <strong>ElastiCache Redis implementation</strong> of the atomic counter pattern.</a></p>
]]></content:encoded></item><item><title><![CDATA[Atomic counter: framing the Problem Space]]></title><description><![CDATA[Why Atomic Counters Matter in Distributed Systems

In distributed systems, ensuring accuracy and consistency in concurrent operations is a core challenge. Atomic counters—a mechanism for maintaining precise, incrementing counts—are a common requireme...]]></description><link>https://haveyoutriedrestarting.com/atomic-counter-framing-the-problem-space</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/atomic-counter-framing-the-problem-space</guid><category><![CDATA[serverless]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Distributed Database]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Fri, 22 Nov 2024 16:23:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/jrKKj9nJMxM/upload/7ec4b67587ff34d3e5a636d165164e97.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Why Atomic Counters Matter in Distributed Systems</p>
<hr />
<p>In distributed systems, ensuring accuracy and consistency in concurrent operations is a core challenge. Atomic counters—a mechanism for maintaining precise, incrementing counts—are a common requirement in applications like:</p>
<ul>
<li><p><strong>Rate Limiting</strong>: Tracking API usage to enforce quotas.</p>
</li>
<li><p><strong>Inventory Management</strong>: Keeping stock levels accurate in real time.</p>
</li>
<li><p><strong>Leaderboards</strong>: Recording scores and ranks in games or applications.</p>
</li>
<li><p><strong>Analytics</strong>: Counting events such as clicks or views for reporting.</p>
</li>
</ul>
<h2 id="heading-the-challenge-scaling-atomicity-in-distributed-systems"><strong>The Challenge: Scaling Atomicity in Distributed Systems</strong></h2>
<p>When multiple processes update a shared counter, ensuring accuracy without conflicts is difficult. Challenges include:</p>
<ul>
<li><p><strong>Race Conditions</strong>: Concurrent updates may result in incorrect counts.</p>
</li>
<li><p><strong>Data Integrity</strong>: Systems must ensure updates are not lost, even in failure scenarios.</p>
</li>
<li><p><strong>Scalability vs. Consistency</strong>: Distributed systems trade off latency, fault tolerance, and strict consistency.</p>
</li>
</ul>
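<p>The race-condition bullet is easy to reproduce with a naive read-modify-write cycle. Here is a minimal in-memory simulation (plain TypeScript, no database involved) where two concurrent writers both read the old value and one increment is lost:</p>

```typescript
// Naive read-modify-write: both writers read the counter before either
// writes back, so one increment is silently lost.
let counter = 0;

async function readModifyWrite(): Promise<void> {
  const read = counter;                        // 1. read the current value
  await new Promise((r) => setTimeout(r, 10)); // 2. simulate network latency
  counter = read + 1;                          // 3. write back, clobbering concurrent writes
}

async function demo(): Promise<number> {
  await Promise.all([readModifyWrite(), readModifyWrite()]);
  return counter; // 1, not the expected 2
}
```

<p>An atomic <em>ADD</em> (or an equivalent conditional update) performs the read-modify-write as a single server-side operation, which is exactly what the implementations in this series rely on.</p>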
<p>This trade-off is encapsulated in the <strong>CAP theorem</strong>, which states that a distributed database can only guarantee two of the following three properties:</p>
<ul>
<li><p><strong>Consistency</strong>: Every read reflects the most recent write.</p>
</li>
<li><p><strong>Availability</strong>: Every request receives a response, even if some nodes are down.</p>
</li>
<li><p><strong>Partition Tolerance</strong>: The system operates even when network partitions occur.</p>
</li>
</ul>
<p>Atomic counters live at the intersection of these challenges. For example:</p>
<ul>
<li><p>Choosing <strong>consistency and partition tolerance</strong> ensures correctness but may sacrifice availability during failures.</p>
</li>
<li><p>Prioritizing <strong>availability and partition tolerance</strong> may allow stale or conflicting updates.</p>
</li>
</ul>
<h2 id="heading-serializability-and-linearizability"><strong>Serializability and Linearizability</strong></h2>
<p>Atomic counters require precise semantics to maintain correctness:</p>
<ul>
<li><p><strong>Serializability</strong> ensures that concurrent operations are executed in a sequence that could occur in a single-threaded system. It’s the gold standard for consistency in databases but can be computationally expensive.</p>
</li>
<li><p><strong>Linearizability</strong> ensures that operations on a single object appear instantaneous and take effect in real-time order. This is the guarantee that matters for atomic counters, where every increment must observe the most recent value.</p>
</li>
</ul>
<h2 id="heading-why-these-databases"><strong>Why These Databases?</strong></h2>
<p>For this series, I’ve chosen <a target="_blank" href="https://aws.amazon.com/dynamodb/">DynamoDB</a>, <a target="_blank" href="https://aws.amazon.com/it/documentdb/">DocumentDB</a>, <a target="_blank" href="https://aws.amazon.com/redis/">ElastiCache Redis</a>, <a target="_blank" href="https://www.gomomento.com/">Momento</a>, and <a target="_blank" href="https://pingcap.com/products/tidb/">TiDB</a> for several key reasons:</p>
<ol>
<li><p><strong>Serverless and SaaS Models</strong>: DynamoDB, DocumentDB, and the SaaS version of TiDB handle infrastructure and scaling for you. Similarly, ElastiCache and Momento offer managed caching solutions, focusing on simplicity and performance.</p>
</li>
<li><p><strong>Diverse Strategies</strong>: These systems represent a variety of approaches to critical aspects of distributed systems:</p>
<ul>
<li><p><strong>Replication</strong>: How they replicate data across nodes to ensure fault tolerance.</p>
</li>
<li><p><strong>Leader Election</strong>: How they coordinate updates and ensure consistency in distributed setups.</p>
</li>
<li><p><strong>Consistency Models</strong>: The balance each system strikes between strict consistency and eventual consistency.</p>
</li>
</ul>
</li>
<li><p><strong>Specialized Solutions</strong>:</p>
<ul>
<li><p><strong>DynamoDB and DocumentDB</strong> excel as databases for durable, consistent storage.</p>
</li>
<li><p><strong>ElastiCache Redis and Momento</strong> shine in caching scenarios, where low-latency access is key.</p>
</li>
<li><p><strong>TiDB SaaS</strong> bridges SQL capabilities with distributed architecture, ideal for scenarios demanding a balance between transactional guarantees and scalability.</p>
</li>
</ul>
</li>
</ol>
<p>By comparing these systems, we’ll uncover insights into how different architectures tackle the shared challenge of atomicity, equipping you to make informed choices in your projects.</p>
<h2 id="heading-why-a-pattern-matters"><strong>Why a Pattern Matters</strong></h2>
<p>The atomic counter pattern provides structured solutions to navigate these complexities, leveraging the unique strengths of various databases and caching systems. By using native features such as conditional writes, Lua scripts, or distributed transactions, developers can:</p>
<ul>
<li><p>Ensure correctness under concurrent updates.</p>
</li>
<li><p>Balance consistency, availability, and scalability based on system needs.</p>
</li>
<li><p>Simplify implementation by relying on proven database capabilities.</p>
</li>
</ul>
<p>In this series, we’ll explore how to:</p>
<ol>
<li><p>Understand the trade-offs of implementing atomic counters in distributed environments.</p>
</li>
<li><p>Build practical solutions using <strong>Node.js</strong> and <strong>AWS CDK</strong>, supported by real-world examples.</p>
</li>
<li><p>Apply atomic counter patterns across databases like <strong>DynamoDB, Redis, TiDB</strong>, and SaaS services like <strong>Momento</strong>.</p>
</li>
</ol>
<p>Let’s set the stage for building reliable atomic counters with a strong foundation in distributed systems theory and practical implementations.</p>
<p><a target="_blank" href="https://github.com/ncremaschini/atomic-counter">Here's the github repository with deployable stack to explore the different implementations</a></p>
]]></content:encoded></item><item><title><![CDATA[Evaluating Performance: A Benchmark Study of Serverless Solutions for Message Delivery to Containers on AWS Cloud - Episode 2]]></title><description><![CDATA[This post follows my previous post on this topic, and it measures the performance of another solution for the same problem, how to forward events to private containers using serverless services and fan-out patterns.
Context
Suppose you have a cluster...]]></description><link>https://haveyoutriedrestarting.com/evaluating-performance-of-serverless-solutions-for-message-delivery-on-aws-ep-2</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/evaluating-performance-of-serverless-solutions-for-message-delivery-on-aws-ep-2</guid><category><![CDATA[fan out]]></category><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[ECS]]></category><category><![CDATA[aws-fargate]]></category><category><![CDATA[AWS EventBridge]]></category><category><![CDATA[DynamoDB]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Fri, 10 May 2024 13:25:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ObweQkF5w30/upload/93d10777cdfea4f77c17ec6c38c9b33b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post follows <a target="_blank" href="https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud">my previous post on this topic</a>, and it measures the performance of another solution for the same problem, <strong>how to forward events to private containers using serverless services and fan-out patterns.</strong></p>
<h2 id="heading-context">Context</h2>
<p>Suppose you have a cluster of containers and you need to notify them when a database record is inserted or changed, and these changes apply to the internal state of the application. A fairly common use case.</p>
<p>Let's say you have the following requirements:</p>
<ul>
<li><p>The tasks are in an autoscaling group, so their number may change over time.</p>
</li>
<li><p>A task is only healthy if it can be updated when the status changes. In other words, all tasks must have the same status. Containers that do not change their status must be marked as unhealthy and replaced.</p>
</li>
<li><p>When a new task is started, it must be in the last known status.</p>
</li>
<li><p>Status changes must be near real-time: status changes in the database must be propagated to the containers in less than 2 seconds.</p>
</li>
</ul>
<h2 id="heading-solutions">Solutions</h2>
<p>In the <a target="_blank" href="https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud">first post about this</a> I explored two options and measured the performance of this one:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715345902732/dc354ddc-2aba-48a5-9f0c-ec197aad66bf.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p>The AppSync API receives mutations and stores derived data in the DynamoDB table</p>
</li>
<li><p>DynamoDB streams the events</p>
</li>
<li><p>The Lambda function is triggered by the DynamoDB stream</p>
</li>
<li><p>The Lambda function sends the events to the SNS topic</p>
</li>
<li><p>The SNS topic sends the events to the SQS queues</p>
</li>
<li><p>The Fargate service reads the events from the SQS queues</p>
</li>
<li><p>If events are not processed within a timeout, they are moved to the DLQ</p>
</li>
<li><p>A Cloudwatch alarm is triggered if the DLQ is not empty</p>
</li>
</ol>
<h3 id="heading-the-even-more-serverless-version">The even more serverless version</h3>
<p>An even more serverless version of the above solution replaces Lambda and SNS with EventBridge:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715345130561/1850fc36-77b4-4725-8829-af76b7fff366.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p>The AppSync API receives mutations and stores derived data in the DynamoDB table</p>
</li>
<li><p>DynamoDB streams the events</p>
</li>
<li><p>EventBridge is used to filter, transform and...</p>
</li>
<li><p>...fan-outs events to SQS queues</p>
</li>
<li><p>The Fargate service reads the events from the SQS queues</p>
</li>
<li><p>If events are not processed within a timeout, they are moved to the DLQ</p>
</li>
<li><p>A Cloudwatch alarm is triggered if the DLQ is not empty</p>
</li>
</ol>
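<p>For reference, steps 2-4 can be wired with a few CDK constructs. This is a hedged sketch under my own illustrative naming, connecting the stream straight to a single queue with an EventBridge Pipe; the repo (which also measures pipe and rule latency) may wire things differently, e.g. routing through a bus with one rule per queue:</p>

```typescript
import { Stack, StackProps } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as iam from "aws-cdk-lib/aws-iam";
import * as pipes from "aws-cdk-lib/aws-pipes";
import * as sqs from "aws-cdk-lib/aws-sqs";
import { Construct } from "constructs";

// Hedged sketch: wiring a DynamoDB stream to one SQS queue with an
// EventBridge Pipe. All construct names are my own illustrative choices.
export class FanOutPipeStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const table = new dynamodb.Table(this, "EventsTable", {
      partitionKey: { name: "pk", type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      stream: dynamodb.StreamViewType.NEW_IMAGE, // step 2: stream enabled
    });

    const queue = new sqs.Queue(this, "TaskQueue"); // one queue per task

    // The pipe needs a role that can read the stream and send to the queue.
    const pipeRole = new iam.Role(this, "PipeRole", {
      assumedBy: new iam.ServicePrincipal("pipes.amazonaws.com"),
    });
    table.grantStreamRead(pipeRole);
    queue.grantSendMessages(pipeRole);

    // Steps 3-4: the pipe filters/transforms and fans out to SQS.
    new pipes.CfnPipe(this, "StreamToQueuePipe", {
      roleArn: pipeRole.roleArn,
      source: table.tableStreamArn!,
      sourceParameters: {
        dynamoDbStreamParameters: { startingPosition: "LATEST", batchSize: 1 },
      },
      target: queue.queueArn,
    });
  }
}
```

<p>One pipe (or rule) per task queue reproduces the fan-out; filtering and transformation would go into the pipe's optional filter criteria and target input transformer.</p>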
<p><em>The only code I wrote here is the SQS consumer in my application; no glue code is required.</em></p>
<h2 id="heading-trust-but-verify">Trust, but verify</h2>
<p>I've conducted a benchmark to verify the performance of this configuration, in terms of latency from the mutation being posted to Appsync to the message received by the client polling SQS.</p>
<h3 id="heading-key-system-parameters">Key system parameters</h3>
<ul>
<li><p>Region: eu-south-1</p>
</li>
<li><p>Number of tasks: 20</p>
</li>
<li><p>Event bus: 1 SQS per task, 1 DLQ per SQS, all SQS subscribed to one SNS</p>
</li>
<li><p>SQS Consumer: provided by AWS SDK, configured for long polling (20s)</p>
</li>
<li><p>Task configuration: 256 CPU, 512 Memory, Docker image based on <a target="_blank" href="https://hub.docker.com/layers/library/node/20-slim/images/sha256-80c3e9753fed11eee3021b96497ba95fe15e5a1dfc16aaf5bc66025f369e00dd?context=explore"><strong>Official Node Image 20-slim</strong></a></p>
</li>
<li><p>DynamoDB Configured in PayPerUseMode, stream enabled</p>
</li>
<li><p>EventBridge configured to intercept and forward all events from the DynamoDB stream to the SQS queues</p>
</li>
</ul>
<h3 id="heading-benchmark-parameters">Benchmark parameters</h3>
<p>I used a basic postman collection runner to perform a mutation to Appsync every 5 seconds, for 720 iterations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715346159540/388312f1-2fff-4da0-a809-b1ec05e2d26b.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-goal">Goal</h3>
<p>The goal was to verify if containers would be updated within 2 seconds, and to verify performance against <a target="_blank" href="https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud">the first version</a>.</p>
<h3 id="heading-measurements">Measurements</h3>
<p>I used the following CloudWatch-provided metrics:</p>
<ul>
<li><p>Appsync latency</p>
</li>
<li><p>Dynamo stream latency</p>
</li>
<li><p>EventBridge Pipe duration</p>
</li>
<li><p>EventBridge Rules latency</p>
</li>
</ul>
<p>The SQS time taken custom metric is calculated from SQS provided attributes.</p>
<h3 id="heading-results">Results</h3>
<p><em>Disclaimer: some latency measurements are calculated on consumers' side, and we all know that synchronizing clocks in a distributed system is a hard problem.</em></p>
<p><em>Still, measurements are performed by the same computing nodes.</em></p>
<p><em>Please consider following latencies not as precise measurements but as coarse indicators.</em></p>
<p>Here are screenshots from my CloudWatch dashboard:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715346550027/8998f0ec-c31a-494e-8e13-3ef113042c08.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715346567264/4ffedbdc-6908-4ffe-9838-4739465ab5cf.png" alt class="image--center mx-auto" /></p>
<p>A few key data points, from the averages:</p>
<ul>
<li><p>Most of the time is taken by the EventBridge rule; I couldn't do anything to lower this latency. The rule is as simple as possible and it is integrated natively by AWS.</p>
</li>
<li><p>The average total time taken is <strong>210.74 ms</strong>, versus <strong>108.39 ms</strong> taken by the first version with Lambda and SNS.</p>
</li>
<li><p>The average response time measured by my client, which includes my client's network latency, is 175 ms. Given AppSync's average latency of 62.7 ms, my average network latency is 175 - 62.7 = 112.3 ms. This means that from my client sending the mutation to the consumers receiving the message there are 210.74 + 112.3 = <strong>323.04 ms</strong></p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>This solution has proven to be fast and reliable and requires little configuration to set up and no glue-code to write.</p>
<p>Since everything is managed, there is no room for tuning and improvements.</p>
<p>The latency of this solution is about 94% higher than the first version's (210.74 ms vs 108.39 ms, roughly 1.94x).</p>
<p>However, EventBridge offers many more capabilities than SNS.</p>
<h2 id="heading-wrap-up">Wrap up</h2>
<p>In this article, I have presented a solution I had to design as part of my work, along with my approach to solution development: clarifying the scope and context, evaluating different options, knowing the parts involved and the performance and quality attributes of the overall system, and writing code and benchmarking where necessary, always with the clear awareness that there are no perfect solutions.</p>
<p>I hope it was helpful to you, and <a target="_blank" href="https://github.com/ncremaschini/fargate-notifications">here is the GitHub repo to deploy both versions of the solution</a>.</p>
<p>Bye 👋!</p>
]]></content:encoded></item><item><title><![CDATA[Evaluating Performance: A Benchmark Study of Serverless Solutions for Message Delivery to Containers on AWS Cloud]]></title><description><![CDATA[In this article i'll show you how to forward events to private containers using serverless services and fan-out pattern.
I'll explore possible solutions within AWS ecosystem, but all are applicable regardless the actual service / implementation.
Cont...]]></description><link>https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud</guid><category><![CDATA[AWS]]></category><category><![CDATA[sns]]></category><category><![CDATA[SQS]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[performance]]></category><category><![CDATA[Benchmark]]></category><category><![CDATA[aws-fargate]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[serverless]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Sun, 03 Mar 2024 21:48:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/DX9X0g0Cg88/upload/26f142070bd8ddbbc75f69d594f011a4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article I'll show you how to forward events to private containers using serverless services and the fan-out pattern.</p>
<p>I'll explore possible solutions within the AWS ecosystem, but all are applicable regardless of the actual service / implementation.</p>
<h2 id="heading-context">Context</h2>
<p>Suppose you have a cluster of containers and you need to notify them when a database record is inserted or changed, and these changes apply to the internal state of the application. A fairly common use case.</p>
<p>Let's say you have the following requirements:</p>
<ul>
<li><p>The tasks are in an autoscaling group, so their number may change over time.</p>
</li>
<li><p>A task is only healthy if it can be updated when the status changes. In other words, all tasks must have the same status. Containers that do not change their status must be marked as unhealthy and replaced.</p>
</li>
<li><p>When a new task is started, it must be in the last known status.</p>
</li>
<li><p>Status changes must be near real-time: status changes in the database must be propagated to the containers in less than 2 seconds.</p>
</li>
</ul>
<p>Given these requirements, let's explore a few options.</p>
<h2 id="heading-option-1-tasks-directly-querying-the-database">Option 1: tasks directly querying the database</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708895045088/f09bbb59-736b-4b19-af9d-ea7c163d40b4.jpeg" alt="Task querying directly the database" class="image--center mx-auto" /></p>
<h3 id="heading-pros">Pros:</h3>
<ul>
<li><p>easy to implement: the task just has to perform a simple query to get the current status, assuming the database can be queried.</p>
</li>
<li><p>fast: it really depends on the DB resources and the complexity of the query, but there are not many hops and it can be configured to be fast. You can set the polling interval to meet our 2-second requirement, e.g. every 1 second.</p>
</li>
<li><p>easy to mark tasks that fail to perform queries as unhealthy: the application could catch query errors and mark itself as unhealthy if it has enough resources; otherwise, the load balancer's health check would fail.</p>
</li>
</ul>
<h3 id="heading-cons">Cons:</h3>
<ul>
<li><p>waste of resources: Your application queries the database even if no changes have been made. If your database does not change more frequently than the polling rate, most queries are useless.</p>
</li>
<li><p>your database is a single point of failure: If the database cannot serve queries, tasks cannot be notified.</p>
</li>
<li><p>it does not scale well: As the number of tasks grows, the number of queries grows and you may need to scale the database as well, or you may need a very large cluster running all the time to accommodate any scaling, wasting resources.</p>
</li>
<li><p>difficult to monitor: How can you check if an individual task is in the right state?</p>
</li>
</ul>
<p>In such a scenario, I definitely don't like polling.</p>
<p>Let's try a different and opposite approach.</p>
<h2 id="heading-option-2-db-streams-changes-to-containers">Option 2: Db streams changes to containers</h2>
<p>Instead of having tasks query the database, let's have the database notify them of changes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708896310371/615cc978-49a9-4efb-9124-0c30d467a5f4.jpeg" alt="db pushes events to tasks" class="image--center mx-auto" /></p>
<p>Before going into the pros and cons, I must say that it would be very hard, if not impossible, to implement this solution exactly as I drew it. Instead, we can use a very popular pattern called <em>fan-out</em>.</p>
<p>This is the <a target="_blank" href="https://en.wikipedia.org/wiki/Fan-out_(software)">Wikipedia</a> definition:</p>
<blockquote>
<p>In message-oriented middleware solutions, fan-out is a messaging pattern used to model an information exchange that implies the delivery (or spreading) of a message to one or multiple destinations possibly in parallel, and not halting the process that executes the messaging to wait for any response to that message</p>
</blockquote>
<p>To make things a little more concrete, let's use some popular AWS services that are commonly used to implement this pattern:</p>
<ul>
<li><p>DynamoDB: NoSql database with native event streaming</p>
</li>
<li><p>SNS: pub/sub event bus</p>
</li>
<li><p>SQS: queue service</p>
</li>
</ul>
<p>The solution looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708897206829/cacbad49-053a-40d4-8210-531f54622538.jpeg" alt="event streaming and fan-out in action" class="image--center mx-auto" /></p>
<p>Now let's explore pros and cons:</p>
<h3 id="heading-pros-1">Pros:</h3>
<ul>
<li><p>first of all, you can see that the arrows have turned into dotted lines: this architecture is completely asynchronous.</p>
</li>
<li><p>easy to implement: all the integrations you need are native. You just need to configure the serverless services and implement an SQS consumer in your application.</p>
</li>
<li><p>very scalable: you can add as many tasks as you want without affecting the database; your limit here is SNS, but it is very high. As stated in the <a target="_blank" href="https://docs.aws.amazon.com/general/latest/gr/sns.html">official docs</a>, a single topic supports up to 12,500,000 subscriptions.</p>
</li>
<li><p>no waste of resources, a.k.a. really cost-effective: this solution leverages pay-per-use services, and they are used only when actual changes occur on the db.</p>
</li>
<li><p>very easy to monitor: both SNS and SQS support dead-letter topics / queues: if a message isn't consumed within the timeout, it can be moved into a DLQ. You can set up an alarm that fires when a DLQ is not empty, and kill the associated task.</p>
</li>
<li><p>easy to recover: If a container cannot consume a message, it can try again. In other words, it does not have to be online and ready to receive the message at the moment it is delivered, as the queues are persistent.</p>
</li>
<li><p>very fast: I did a benchmark on this solution; <a target="_blank" href="https://github.com/ncremaschini/fargate-notifications">here's the GitHub repo with the actual code</a>. Later in this article we'll see the results.</p>
</li>
</ul>
<h3 id="heading-cons-1">Cons</h3>
<ul>
<li><p>more moving parts: even if the integration code is not required, since it's provided by AWS, connecting things and tuning connections is not as straightforward as performing a query.</p>
</li>
<li><p>not so easy to troubleshoot, as with every distributed system, I would say.</p>
</li>
<li><p>it strongly depends on serverless services: if one link in the chain slows down or is not available, your containers can't be notified. We have to say that all involved services have a very good SLA: <a target="_blank" href="https://aws.amazon.com/it/messaging/sla/">3 nines for SQS and SNS</a> and <a target="_blank" href="https://aws.amazon.com/it/dynamodb/sla/">4 nines for DynamoDB</a>. I'm not sure about DynamoDB streams, since they appear not to be included in the DynamoDB SLA. I suppose DynamoDB streams are backed by Kinesis Streams, <a target="_blank" href="https://aws.amazon.com/it/kinesis/sla/">which also has 3 nines of availability</a>.</p>
</li>
</ul>
<h3 id="heading-open-points">Open points:</h3>
<p>The main open point here, to me, was: is this fast enough? Let's verify it.</p>
<h2 id="heading-trust-but-verify">Trust, but verify</h2>
<p>I couldn't find any official SLA about latency for involved services nor any AWS official benchmark.</p>
<p>So I decided to perform one myself, and I scripted a basic application using TypeScript and the CDK / SDK.</p>
<p><a target="_blank" href="https://github.com/ncremaschini/fargate-notifications">Here's the GitHub repo with the actual code</a> and details on how the system is implemented.</p>
<p>Before going ahead, bear in mind that I performed this benchmark to understand whether this combination of services / configuration could fit my specific context / use case. Your context may be different, and this configuration may not fit it.</p>
<h3 id="heading-system-design-and-data-flow">System design and data flow</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1709500848988/283edabf-0a85-43b3-bff1-22a8488a3aee.jpeg" alt class="image--center mx-auto" /></p>
<ol>
<li><p>The AppSync API receives mutations and stores derived data in the DynamoDB table</p>
</li>
<li><p>DynamoDB streams the events</p>
</li>
<li><p>The Lambda function is triggered by the DynamoDB stream</p>
</li>
<li><p>The Lambda function sends the events to the SNS topic</p>
</li>
<li><p>The SNS topic sends the events to the SQS queues</p>
</li>
<li><p>The Fargate service reads the events from the SQS queues</p>
</li>
<li><p>If events are not processed within a timeout, they are moved to the DLQ</p>
</li>
<li><p>A Cloudwatch alarm is triggered if the DLQ is not empty</p>
</li>
</ol>
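<p>Steps 3 and 4 are the only glue code in this design. The record-to-message mapping can be sketched as a small pure function (my own illustrative shape, not the repo's actual handler), with the SDK call left as a comment so the sketch stays self-contained:</p>

```typescript
// Hedged sketch of the Lambda glue code (steps 3-4): map each DynamoDB
// stream record to an SNS message body. The record shape is simplified.
type StreamRecord = { eventName?: string; dynamodb?: { NewImage?: unknown } };

function toSnsMessages(records: StreamRecord[]): string[] {
  return records
    // Forward only inserts and updates; deletions are ignored in this sketch.
    .filter((r) => r.eventName === "INSERT" || r.eventName === "MODIFY")
    .map((r) => JSON.stringify(r.dynamodb?.NewImage ?? {}));
}

// Inside the handler, each message would then be published, e.g.:
// await sns.send(new PublishCommand({ TopicArn: process.env.TOPIC_ARN, Message: msg }));
```

<p>Keeping the mapping pure like this also makes the handler trivial to unit test.</p>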
<h3 id="heading-key-system-parameters">Key system parameters:</h3>
<ul>
<li><p>Region: eu-south-1</p>
</li>
<li><p>Number of tasks: 20</p>
</li>
<li><p>Event bus: 1 SQS per task, 1 DLQ per SQS, all SQS subscribed to one SNS</p>
</li>
<li><p>SQS Consumer: provided by AWS SDK, configured for long polling (20s)</p>
</li>
<li><p>Task configuration: 256 CPU, 512 Memory, Docker image based on <a target="_blank" href="https://hub.docker.com/layers/library/node/20-slim/images/sha256-80c3e9753fed11eee3021b96497ba95fe15e5a1dfc16aaf5bc66025f369e00dd?context=explore">Official Node Image 20-slim</a></p>
</li>
<li><p>DynamoDB Configured in PayPerUseMode, stream enabled to trigger Lambda</p>
</li>
<li><p>Lambda stream handler written in node20 bundled with <a target="_blank" href="https://esbuild.github.io/">ESBuild</a>, configured with 128MB</p>
</li>
</ul>
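<p>The long-polling consumer configuration mentioned above boils down to a receive call with <code>WaitTimeSeconds</code> set to 20. Here is a hedged sketch of the parameters (AWS SDK v3 shape; the queue URL is a placeholder), including the attributes used later for the custom latency metrics:</p>

```typescript
// Hedged sketch of the long-polling receive parameters (AWS SDK v3 shape).
// The queue URL is a placeholder, not the benchmark's actual queue.
const receiveParams = {
  QueueUrl: "https://sqs.eu-south-1.amazonaws.com/123456789012/task-queue",
  MaxNumberOfMessages: 10,
  WaitTimeSeconds: 20, // long polling: hold the request open for up to 20s
  // Attributes needed to compute the SNS / SQS time-taken metrics.
  AttributeNames: ["SentTimestamp", "ApproximateFirstReceiveTimestamp"],
  MessageAttributeNames: ["All"],
};

// Would be used in a consume loop, e.g.:
// const { Messages } = await sqs.send(new ReceiveMessageCommand(receiveParams));
```

<p>Requesting <code>SentTimestamp</code> and <code>ApproximateFirstReceiveTimestamp</code> here is what makes the custom time-taken metrics described later possible.</p>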
<h3 id="heading-benchmark-parameters">Benchmark parameters</h3>
<p>I used a basic postman collection runner to perform a mutation to Appsync every 5 seconds, for 720 iterations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708963551408/55810eb0-a306-4dec-9b96-2d2667fd8e19.png" alt="postman runner execution recap" class="image--center mx-auto" /></p>
<h3 id="heading-goal">Goal</h3>
<p>The goal was to verify if containers would be updated within 2 seconds.</p>
<h3 id="heading-measurements">Measurements</h3>
<p>I used the following CloudWatch-provided metrics:</p>
<ul>
<li><p>Appsync latency</p>
</li>
<li><p>Lambda latency</p>
</li>
<li><p>Dynamo stream latency</p>
</li>
</ul>
<p>and I created two custom metrics for measuring SQS and SNS time taken.</p>
<p>Time-taken custom metrics are calculated from the SNS and SQS-provided attributes:</p>
<ul>
<li>SNS Timestamp: <a target="_blank" href="https://docs.aws.amazon.com/sns/latest/dg/sns-message-and-json-formats.html">from AWS doc</a></li>
</ul>
<blockquote>
<p>The time (GMT) when the notification was published.</p>
</blockquote>
<ul>
<li>ApproximateFirstReceiveTimestamp: <a target="_blank" href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html">from AWS doc</a></li>
</ul>
<blockquote>
<p>returns the time the message was first received from the queue (epoch time in milliseconds).</p>
</blockquote>
<ul>
<li>SentTimestamp: <a target="_blank" href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html">from AWS doc</a></li>
</ul>
<blockquote>
<p>Returns the time the message was sent to the queue (epoch time in milliseconds).</p>
</blockquote>
<p>The following code snippet shows you how attributes are used to calculate <em>sns time taken in millis</em> and <em>sqs time taken in millis</em></p>
<pre><code class="lang-typescript">
<span class="hljs-comment">//despite the name, this is the ISO Date the message was sent to the SNS topic</span>
<span class="hljs-keyword">let</span> snsReceivedISODate = messageBody.Timestamp;
<span class="hljs-keyword">if</span> (snsReceivedISODate &amp;&amp; message.Attributes) {   
   clientReceivedTimestamp = +message.Attributes.ApproximateFirstReceiveTimestamp!;
   sqsReceivedTimestamp = +message.Attributes.SentTimestamp!;

   <span class="hljs-keyword">let</span> snsReceivedDate = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(snsReceivedISODate);
   snsReceivedTimestamp = snsReceivedDate.getTime();
   clientReceivedDate = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(clientReceivedTimestamp!);
   sqsReceivedDate = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(sqsReceivedTimestamp!);

   snsTimeTakenInMillis = sqsReceivedTimestamp - snsReceivedTimestamp;
   sqsTimeTakenInMillis = clientReceivedTimestamp - sqsReceivedTimestamp;
}
</code></pre>
<p>I didn't calculate the time taken by the client to parse the message, because it really depends on the parsing logic the client applies.</p>
<h3 id="heading-results">Results</h3>
<p><em>Disclaimer: some latency measurements are calculated on consumers' side, and we all know that synchronizing clocks in a distributed system is a hard problem.</em></p>
<p><em>Still, measurements are performed by the same computing nodes.</em></p>
<p><em>Please consider following latencies not as precise measurements but as coarse indicators.</em></p>
<p>Here are screenshots from my CloudWatch dashboard:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1709306099546/085f9eb4-f679-4165-9464-280cd4038f95.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708964741699/03618119-0de1-4d02-be44-fbebecb758f3.png" alt class="image--center mx-auto" /></p>
<p>A few key data points, from the averages:</p>
<ul>
<li><p>Most of the time is taken by AppSync; I couldn't do anything to lower this latency since I used AppSync's native integration with DynamoDB.</p>
</li>
<li><p>The only custom code is the Lambda stream processor, and Lambda duration is the second slowest component here. As you can see in the graph, the Lambda cold start is the killer, but even accounting for it we observe a very good average latency (38 ms).</p>
</li>
<li><p>The average total time taken is <strong>108.39 ms</strong></p>
</li>
<li><p>The average response time measured by my client, which includes my client's network latency, is 92 ms. Given that the AppSync average latency is 60.5 ms, my average network latency is 31.5 ms. This means that from my client sending the mutation to consumers receiving the message there are 108.39 + 31.5 = <strong>139.89 ms</strong></p>
</li>
</ul>
<h3 id="heading-conclusion">Conclusion</h3>
<p>This solution has proven to be fast and reliable and requires little configuration to set up.</p>
<p>Since almost everything is managed, there is little room for tuning and improvement. In this particular configuration, I could simply give the stream processor Lambda more memory, but latency does not decrease linearly as memory increases.</p>
<p><s>I could remove Lambda and replace it with Event Bridge Pipe. I haven't tried it yet, but i'm going to use the exact same benchmark and compare the results.</s></p>
<p><strong>UPDATE:</strong> <a target="_blank" href="https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud-episode-2">here the benchmark of the aforementioned solution with EventBridge</a></p>
<p>Last but not least, keep in mind that AWS does not always include latency in the service SLA. I've run this benchmark a few times with comparable results, but I can't be sure that I will always get the same results over time. If your system requires stable and predictable performance over time, you can't go with services that don't include performance metrics in their SLA. You're better off taking control of the layers below, which means <a target="_blank" href="https://engineering.dunelm.com/pizza-as-a-service-2-0-5085cd4c365e">you should consider going to a restaurant or even making your own pizza at home.</a></p>
<h2 id="heading-wrap-up">Wrap up</h2>
<p>In this article, I have presented you with a solution that I had to design as part of my work and my approach to solution development: this includes clarifying the scope and context, evaluating different options and having a good knowledge of the parts involved and the performance and quality attributes of the overall system, writing code and benchmarking where necessary, but always with the clear awareness that there are no perfect solutions.</p>
<p>I hope it was helpful to you, and <a target="_blank" href="https://github.com/ncremaschini/fargate-notifications">here is the GitHub repo to deploy both versions of the solution</a>.</p>
<p>Bye 👋!</p>
]]></content:encoded></item><item><title><![CDATA[Serverless social login with AWS Cognito]]></title><description><![CDATA[Disclaimer: This is not a step-by-step guide, just my trade-off analysis on using Amazon Cognito to provide social login for your app and some pitfalls I found in my experience.
In this article, I'll show you my serverless solution to add social iden...]]></description><link>https://haveyoutriedrestarting.com/serverless-social-login-with-aws-cognito</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/serverless-social-login-with-aws-cognito</guid><category><![CDATA[serverless]]></category><category><![CDATA[Cognito]]></category><category><![CDATA[social login]]></category><category><![CDATA[oauth]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Sun, 24 Dec 2023 16:30:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ZYLmudR28SA/upload/b8e67f93d0f65e547d047b89d9e46698.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: This is not a step-by-step guide, just my trade-off analysis on using Amazon Cognito to provide social login for your app and some pitfalls I found in my experience.</em></p>
<p>In this article, I'll show you my serverless solution to add social identity providers as a login option for web and mobile applications, based on managed services and native integrations, and how I mitigated some issues I encountered.</p>
<h2 id="heading-context">Context</h2>
<p>Let's assume you have an application for which your users do not have to register, but can log in with their social identity.</p>
<p>If you're wondering why, consider the following:</p>
<ul>
<li><p>registration could be a barrier to entry for users as it requires more steps and sharing of data</p>
</li>
<li><p>most internet users have at least one social identity. All mobile users have at least one (Google identity for Android users, Apple identity for Apple users)</p>
</li>
<li><p>it is very easy for users to access your app if most of the login is done without a password</p>
</li>
<li><p>you can receive user data from social providers, if users allow the provider to share their data with your app.</p>
</li>
</ul>
<p>The most popular social IdPs are Facebook, Google, Apple, Amazon, LinkedIn, Github and many others.</p>
<p>Considering that every IdP should implement the OpenID Connect standard (we'll come back to this later...), which is a layer above the OAuth2 standard, and that every IdP requires some configuration, let's explore some options.</p>
<h2 id="heading-option-1-native-integration">Option 1: Native integration</h2>
<p>Every IdP has its own SDK and APIs for native integration, so you can code the integration for each IdP you want to use directly in your app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1702570245136/d462c300-30e2-4a59-bcc1-8773f7710c1e.png" alt="direct integration with IdP's SDK" class="image--center mx-auto" /></p>
<h3 id="heading-pros">Pros</h3>
<ul>
<li><p>fine-grained control over each individual IdP integration. Since each IdP is natively integrated, you can customise the specific UX via configuration and handle IdP requests that are not included in the OAuth standard (we'll get to that later...)</p>
</li>
<li><p>direct integration, no intermediary, straightforward architecture. You can rely on robust implementations (Google, Facebook and Amazon provide solid SDKs) and on the IdPs' resilience and high availability.</p>
</li>
<li><p>Cost-effective: IdPs usually provide a free tier for their APIs, so there are no costs on that side.</p>
</li>
</ul>
<h3 id="heading-cons">Cons</h3>
<ul>
<li><p>Difficult to scale: each IdP has its own SDK and its own quirks (someone said "standard"?), and a lot of code is required to handle them. Even if you put your authentication logic into a library, you have to distribute it to all clients to ship any change.</p>
</li>
<li><p>Hard to test / troubleshoot: more code, more tests. Moreover, different integrations require you to know each IdP's quirks.</p>
</li>
</ul>
<h2 id="heading-option-2-use-an-oauth-provider">Option 2: use an OAuth Provider</h2>
<p>Since social IdPs adhere to a standard, it's easy to abstract away the specific implementations (SDKs) and work with interfaces by integrating with an OAuth2 service provider.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1702912629290/ecb258f3-eb97-4010-94c3-4b4ba91d3363.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-pros-1">Pros</h3>
<ul>
<li><p>Just one integration, between your client and the OAuth identity platform. Less code, fewer tests, fewer releases, more speed.</p>
</li>
<li><p>Easy to scale: you can add/remove IdP without impacting clients (see previous bullet)</p>
</li>
<li><p>Authentication flow configuration and governance are now centralised. You can create a consistent auth flow regardless of the specific IdP you support, and you can monitor it and gather metrics and statistics in one place.</p>
</li>
<li><p>You build your auth flow on standards.</p>
</li>
<li><p>There are identity-platform-as-a-service offerings out there (AWS Cognito, Auth0, Google Firebase and many others)</p>
</li>
</ul>
<h3 id="heading-cons-1">Cons</h3>
<ul>
<li><p>Your integration choices are limited to IdPs supported by your OAuth provider.</p>
</li>
<li><p>Your system complexity is higher, since you add components to it.</p>
</li>
<li><p>The OAuth provider could be a single point of failure. If it is not available, you cannot offer authentication to your customers. Therefore, you need to think carefully about the reliability and scaling of your OAuth provider.</p>
</li>
</ul>
<h2 id="heading-my-choice-option-2-with-aws-cognito">My choice: Option 2 with AWS Cognito</h2>
<p>I'm aware that you may have many constraints, and for brevity I cannot list them all: given my context, I went with option 2 and used AWS Cognito as the OAuth provider, after a spike on Auth0 and a few other services.</p>
<blockquote>
<p>I decided to accept the constraints and costs of Cognito in exchange for a low-code implementation and easy setup, in other words for faster delivery, because I wasn't sure if it would be worth it.</p>
</blockquote>
<p>Here is my actual implementation:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1702913119260/edd0ce2e-c73c-40fa-ba08-bc40003cb45d.png" alt class="image--center mx-auto" /></p>
<p>All you need is to:</p>
<ul>
<li><p>configure your integration on the social provider's side. Here is a reference for each provider I integrated with:</p>
<ul>
<li><p><a target="_blank" href="https://ryandam9.medium.com/using-google-as-an-identity-provider-in-aws-cognito-acddfb58fad">Google</a></p>
</li>
<li><p><a target="_blank" href="https://victorhzhao.medium.com/add-social-login-to-aws-cognito-user-pool-facebook-94a2cee5136e">Facebook</a></p>
</li>
<li><p><a target="_blank" href="https://jainsameer.medium.com/react-native-social-sign-in-with-apple-and-amplify-6c803b2971d6">Apple</a></p>
</li>
</ul>
</li>
<li><p>configure the Cognito integration. <a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/external-identity-providers.html">Here is the AWS documentation for each supported provider</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/cognito/dev-resources/?nc1=h_ls">Integrate your application with Amazon Cognito</a>. Cognito provides a <a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pools-app-integration.html">hosted UI</a> for the login page, but you can create your own.</p>
</li>
</ul>
<h2 id="heading-pitfalls-things-to-be-careful-about">Pitfalls: things to be careful about</h2>
<p>Here I list some of the pitfalls I encountered in this integration. This is not an exhaustive list of everything that can go wrong with Amazon Cognito and the social login flow, but, again, my personal experience; in other words, things I ran into during my working days.</p>
<h3 id="heading-watch-out-for-cognito-limits">Watch out for Cognito limits</h3>
<p>Serverless does not mean infinite, and Cognito is one of the services that best demonstrates this.</p>
<p>In one sentence: Cognito's scaling policy is not designed for spiky patterns.</p>
<p>The scaling pattern is (reasonably) tied to the size of your user pool: the more users, the more TPS provided.</p>
<p>But, and here comes the first pitfall, the first threshold is up to 1 million users. From 1 to 999999 users, you have the same TPS.</p>
<p>This means that if your login pattern is fairly consistent, you probably won't have any problems. However, if your login pattern is spiky, perhaps because your app is tied to certain time periods in some way, your app will struggle with a lot of throttling errors from Cognito.</p>
<p>These diagrams show successful federated logins and throttling errors:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703411639152/b6027e66-8b18-402b-981c-ecce4697d4de.png" alt="Cognito success login" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703411372971/361a812c-244f-4402-b2bb-be9822d2c3d2.png" alt="Cognito Throttling errors" class="image--center mx-auto" /></p>
<p>I split them into two distinct diagrams for better visualisation, but I want to point out that:</p>
<ul>
<li><p>around 20:50 I had ~7K throttling errors and ~1.5K successes (total requests: ~8.5K)</p>
</li>
<li><p>around 21:20 I had ~6K throttling errors and ~1.4K successes (total requests: ~7.5K)</p>
</li>
<li><p>around 22:30 I had ~1.3K successes with ZERO throttling errors</p>
</li>
</ul>
<p>Cognito TPS calculation rules can be found <a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html">at this specific section of Cognito docs</a>, and you have to carefully consider them.</p>
<p>As you can see from the successful-logins metric diagram, handling the throttling exception in your app can mitigate the user impact: users are still able to log in successfully, just after waiting a little longer.</p>
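<p>As a sketch of what "waiting a little bit more" can look like on the client side, here is a minimal retry wrapper with exponential backoff and jitter. This is an illustration, not our production code: <code>TooManyRequestsException</code> is the error name Cognito uses for throttling, while the attempt count and delays below are arbitrary assumptions.</p>

```typescript
// Exponential backoff with "equal jitter": ~200ms, ~400ms, ~800ms..., capped at 5s.
export function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(exp / 2 + Math.random() * (exp / 2));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` only when Cognito reports throttling; rethrow everything else.
export async function withThrottleRetry<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const throttled = err?.name === "TooManyRequestsException";
      if (!throttled || attempt >= maxAttempts - 1) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

<p>Usage would look like <code>withThrottleRetry(() =&gt; signIn(credentials))</code>, where <code>signIn</code> is whatever hypothetical function performs the Cognito call in your app.</p>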
<blockquote>
<p><strong><em>I decided that it could be acceptable, and I traded it for the easy setup and integration with social providers.</em></strong></p>
</blockquote>
<p>Since this decision would impact our customer experience, I tried to mitigate it as much as possible, for instance by sending push notifications before traffic spikes to encourage users to log in early and spread the login requests over time.</p>
<h3 id="heading-standards-are-not-prescriptive">Standards are not prescriptive</h3>
<p>I love standards, everybody should love them in engineering.</p>
<p>Unfortunately, sometimes for good reasons and sometimes not, the giants tend to bend standards a little.</p>
<p>Apple, I'm pointing my finger at you!</p>
<p>First, Apple's guidelines require you to offer Sign in with Apple if you want to distribute your app in the App Store and your app has a social login feature. That may be a bit rude, but it's fair.</p>
<p>Apple also prescribes that the "user cancellation" function must be accessible and clear. That is fair too.</p>
<p>And here is where Apple does not adhere to the OAuth standard: if an Apple user allows Apple to share their data with your app, an association between your app and the user is also created in Apple's systems, and if a user wants to delete their account from your app (that is, from your user pool), this association must be removed as well.</p>
<p>To do that, you have to invoke Apple APIs to:</p>
<ul>
<li><p>generate a valid access or refresh token.</p>
</li>
<li><p>invalidate the freshly generated token.</p>
</li>
</ul>
<p><a target="_blank" href="https://developer.apple.com/documentation/sign_in_with_apple/revoke_tokens">Sounds weird, but this is exactly what this doc page prescribes.</a></p>
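<p>For reference, the two calls can be sketched like this. The endpoints are Apple's documented Sign in with Apple REST API; everything else is an assumption for illustration — in particular <code>clientSecret</code>, which must be an ES256 JWT signed with the same Apple private key you gave Cognito, and whose generation is not shown here.</p>

```typescript
// Sketch of the two-step Apple revocation flow: first exchange the refresh
// token for a fresh access token, then revoke it. Bodies are sent as
// application/x-www-form-urlencoded, per Apple's REST API.
const APPLE_TOKEN_URL = "https://appleid.apple.com/auth/token";
const APPLE_REVOKE_URL = "https://appleid.apple.com/auth/revoke";

export function tokenRequestBody(clientId: string, clientSecret: string, refreshToken: string): string {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: "refresh_token",
    refresh_token: refreshToken,
  }).toString();
}

export function revokeRequestBody(clientId: string, clientSecret: string, token: string): string {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    token,
    token_type_hint: "access_token",
  }).toString();
}

// Hypothetical wiring (uses the global fetch of Node 18+).
export async function revokeAppleUser(clientId: string, clientSecret: string, refreshToken: string): Promise<void> {
  const tokenRes = await fetch(APPLE_TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: tokenRequestBody(clientId, clientSecret, refreshToken),
  });
  const { access_token } = (await tokenRes.json()) as { access_token: string };
  await fetch(APPLE_REVOKE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: revokeRequestBody(clientId, clientSecret, access_token),
  });
}
```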
<p>And, guess what? Cognito doesn't handle it.</p>
<p>Even though Cognito could handle it, since it has all the information it needs (notably the private key you created on the Apple side and provided to Cognito to request tokens), the gap is understandable from a product perspective: Cognito adheres to standards and can't chase every vendor-specific deviation.</p>
<p>But it does mean that Apple won't include your app in the store if you don't take care of it.</p>
<p>So let's take a look at how to implement it.</p>
<p>You can't implement it in the app: I used Cognito to decouple the app from auth providers, and I don't want to violate that requirement. Besides, you don't want to store your private key on the device, do you?</p>
<p>So you need to implement it on the backend side. My first idea was to react to Cognito's user-deletion event and trigger a Lambda that calls the Apple API to delete the user on Apple's side.</p>
<p>As far as I know, Cognito today has</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-identity-pools-working-with-aws-lambda-triggers.html#cognito-user-pools-lambda-trigger-event-parameter-shared">Lambda triggers</a>: user deletion not supported</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/amazon-cognito-info-in-cloudtrail.html">CloudTrail tracks all management API calls</a>, and user cancellation is a management API. But the CloudTrail event doesn't carry any reference to the actual user (CloudTrail saved my day in an audit session once, but that's another story)</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-events.html">Cognito Sync</a>: it seems to handle user deletion. Quoting:</p>
<blockquote>
<p>To remove a record, either set the <code>op</code> to <code>remove</code>, or set the value to null.</p>
</blockquote>
</li>
</ul>
<p>This is how it looks:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703423682433/7a7f0717-55d9-43b9-966a-0c60e6d9c39c.png" alt="Apple user cancellation w/ Cognito Sync" class="image--center mx-auto" /></p>
<p>I see two problems here:</p>
<ul>
<li><p>first, you have to put your Apple private key both in Cognito and in Secrets Manager, since Cognito can't retrieve it from Secrets Manager. I raised this issue with the Cognito team and will keep you posted.</p>
</li>
<li><p>second, Cognito user cancellation and Apple user cancellation are asynchronous: what if it succeeds on the Cognito side and then fails on the Apple side? The user won't be in our Cognito user pool anymore, so we can't roll back the operation. You need to handle failures, and to handle them you need to store them. Let's add a DLQ for our deletion Lambda</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703424275924/79d20cd1-a18d-4ff3-829a-16905768712f.png" alt class="image--center mx-auto" /></p>
<p>After saving, you must analyse why the deletion failed and try again. How long can this take? It depends on the cause and your process, but until you've done that, users will still see their Apple ID associated with your app, and I'm not sure Apple would approve your app submission in the meantime.</p>
<p>You need to reverse the order of deletion: first on the Apple side, then on the Cognito side. If the Apple deletion fails, you can send an error message to the user and inform them that the deletion cannot be performed and that they should try again later.</p>
<p>In the case of a Cognito error, you will have to retry the Cognito deletion later, but at least the user will no longer see their Apple ID linked to your app, and Apple should be satisfied and approve your submission.</p>
<p>Let's see how it looks:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703425150169/dad05420-73e0-42b8-9875-9356799194bc.png" alt="User deletion with custom api" class="image--center mx-auto" /></p>
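<p>The reversed flow boils down to a few lines of orchestration. This is a sketch with injected functions: <code>revokeOnApple</code>, <code>deleteFromCognito</code> and <code>enqueueForRetry</code> are hypothetical stand-ins for the Apple API call, the Cognito deletion and the DLQ write.</p>

```typescript
type AsyncOp = () => Promise<void>;

// Apple first: if revocation fails, the error propagates and the caller
// tells the user to try again later. Cognito second: if it fails, the user
// is already gone on Apple's side, so we queue the Cognito deletion for a
// later retry instead of blocking the user.
export async function deleteUser(
  revokeOnApple: AsyncOp,
  deleteFromCognito: AsyncOp,
  enqueueForRetry: AsyncOp
): Promise<"done" | "retry-scheduled"> {
  await revokeOnApple();
  try {
    await deleteFromCognito();
    return "done";
  } catch {
    await enqueueForRetry();
    return "retry-scheduled";
  }
}
```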
<p>I still see two problems here:</p>
<ul>
<li><p>Again, you have to put your Apple private key both in Cognito and in Secrets Manager.</p>
</li>
<li><p>Your app is now integrated with two systems: Cognito for the sign-in operation and your custom API for user deletion</p>
</li>
</ul>
<p>Both solutions somehow solve the problem and both raise new concerns, so I had to opt for the least bad one.</p>
<blockquote>
<p>I decided to implement a custom API for Apple user deletion because it only needs to be implemented in half of our code base (it's not needed for the Android version of the app), the integration is quite simple, and Apple would be happy with this solution but probably not with the alternative. An error-handling mechanism still needs to be implemented to catch Cognito deletion errors and recover from them.</p>
</blockquote>
<h2 id="heading-wrap-up">Wrap up</h2>
<p>I have shown you my solution to real-world problems and how you can make informed decisions by carefully weighing trade-offs between different solutions that best fit your context and constraints.</p>
<p>In other words, the daily work of an architect, simplified.</p>
<p>Architectures need to evolve as the context and constraints change over time. So always design your solutions so that they can easily evolve with them.</p>
<p>I hope it was useful for you!</p>
<p>Bye 👋!</p>
]]></content:encoded></item><item><title><![CDATA[How to handle multiple git based systems on the same Mac(hine)]]></title><description><![CDATA[Hello everyone 👋 ! This is my first article, and my first tech blog actually.
In this article, I'll show you how I configured my Mac to work on repos hosted on my personal Github, on my company's Github, on Gitlab, and on AWS CodeCommit with AWS SSO...]]></description><link>https://haveyoutriedrestarting.com/how-to-handle-multiple-git-based-systems-on-the-same-machine</link><guid isPermaLink="true">https://haveyoutriedrestarting.com/how-to-handle-multiple-git-based-systems-on-the-same-machine</guid><category><![CDATA[GitHub]]></category><category><![CDATA[CodeCommit]]></category><category><![CDATA[GitLab]]></category><category><![CDATA[AWS SSO]]></category><category><![CDATA[mac]]></category><category><![CDATA[version control]]></category><dc:creator><![CDATA[Nicola Cremaschini]]></dc:creator><pubDate>Fri, 08 Dec 2023 12:45:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/2JIvboGLeho/upload/ea7f98a49781c1d89bc0797ac3d49c69.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone 👋 ! This is my first article, and my first tech blog actually.</p>
<p>In this article, I'll show you how I configured my Mac to work on repos hosted on my personal Github, on my company's Github, on Gitlab, and on AWS CodeCommit with AWS SSO integration.</p>
<p>Not rocket science 🚀, but something I've struggled with a bit and can be achieved in a number of ways.</p>
<p>Let's see how I did it.</p>
<h2 id="heading-why-do-i-need-many-version-control-systems">Why do I need many version control systems?</h2>
<p>Here's my context: I work for a company as a cloud architect and need to access my company's repositories hosted by Github Enterprise with a corporate user.</p>
<p>My company also has an AWS organisation, and we use AWS SSO federated with Corporate ADFS to access the organisation's accounts, and we have some repositories hosted on CodeCommit.</p>
<p>We also have some repositories on an old Gitlab installation that is somewhere in the basement of our office.</p>
<p>And finally, I have my personal repositories hosted on Github under my good old username.</p>
<p>I assume I'm not alone in the world with this:</p>
<ul>
<li><p>Org's Enterprise Github, accessed with XYZ user</p>
</li>
<li><p>Org's Gitlab, accessed with TYU user</p>
</li>
<li><p>Org's AWS CodeCommit, accessed with QWE (federated) user</p>
</li>
<li><p>Your personal Github, accessed with ZXC user</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701966013128/1ed1a988-093d-4a65-b673-1b1af4913864.png" alt class="image--center mx-auto" /></p>
<p>I'm used to working with only one Mac, both for professional and personal projects. Therefore I need to pull and push code from/to different version control systems, in my case all Git-based, with different users and of course without being asked for credentials with every command.</p>
<p>I already had a lot of my Company's repositories downloaded to my Mac when I added all the other repositories, and I didn't want to reconfigure all my local Git repositories.</p>
<p>Does this sound familiar? If so, go ahead...</p>
<h2 id="heading-osx-key-chain-dear-friend">OSX Key Chain, dear friend...</h2>
<p>Okay, now what?</p>
<p>So the OSX keychain can store your credentials and you can configure your Git client to retrieve them, right? Wrong!</p>
<p>Of course you can, but the OSX keychain stores credentials by hostname, which means it can store your Github company credentials OR your personal credentials, because for both the host is <a target="_blank" href="http://github.com">github.com</a>.</p>
<p>With Key Chain you can store ONE credential per host, or at least I haven't found a way to store more than one.</p>
<p>So if you configure Git to use OSX Key Chain as a credential helper and you store the credentials of your personal Github user, everything will work fine when interacting with your personal repos.</p>
<p>But if you try to interact with your organisation's repositories, you'll get a 403.</p>
<h2 id="heading-https-vs-ssh-vs-grc">HTTPS vs SSH vs GRC</h2>
<p>The three version control systems support different protocols:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>VCS</td><td>HTTPS</td><td>SSH</td><td>GRC</td></tr>
</thead>
<tbody>
<tr>
<td>Github</td><td>✅</td><td>✅</td><td>❌</td></tr>
<tr>
<td>AWS CodeCommit</td><td>✅</td><td>✅</td><td>✅</td></tr>
<tr>
<td>GitLab</td><td>✅</td><td>✅</td><td>❌</td></tr>
</tbody>
</table>
</div><p>Since I had already configured many Github Enterprise repositories to use HTTPS, I considered my Enterprise Github to be the default.</p>
<h3 id="heading-enteprise-github-via-https">Enterprise Github via HTTPS</h3>
<p>I set up git to use OSX Key Chain as the credential helper in my git <strong><em>global</em></strong> config, as follows:</p>
<pre><code class="lang-bash">[credential <span class="hljs-string">"https://github.com"</span>]
    helper = osxkeychain
</code></pre>
<p>and I use an HTTPS connection when I clone repositories from there.</p>
<p>To edit your git global configuration, use the following command:</p>
<pre><code class="lang-bash">git config --global --edit
</code></pre>
<p>This snippet shows local git configuration for HTTPS connection:</p>
<pre><code class="lang-bash">[remote <span class="hljs-string">"origin"</span>]
        url = https://github.com/your-org/your-repo.git
        fetch = +refs/heads/*:refs/remotes/origin/*
</code></pre>
<p>For my enterprise Github, that's enough: Git uses the credentials stored in my credential helper.</p>
<p>This way, every time I clone a repo over HTTPS and don't specify a local Git configuration, the global configuration and the OSX keychain are used for the credentials.</p>
<h3 id="heading-personal-github-via-ssh">Personal Github via SSH</h3>
<p>Then, for my personal Github, I set up an SSH connection.</p>
<p>You can do the same following <a target="_blank" href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account">this guide</a>.</p>
<p>You need to tell git to use that key, and here is my <strong><em>local</em></strong> git config:</p>
<pre><code class="lang-bash">[remote <span class="hljs-string">"origin"</span>]
        url = git@your-ssh-key-alias:your-user/your-repo.git
        fetch = +refs/heads/*:refs/remotes/origin/*
</code></pre>
<p>To edit your local git config, run the following command inside your repository's root folder:</p>
<pre><code class="lang-bash">git config --edit
</code></pre>
<p>Look at the url part of this configuration:</p>
<blockquote>
<p>url = git@<em>your-ssh-key-alias</em>:<em>your-user</em>/<em>your-repo</em>.git</p>
</blockquote>
<p>You can see that I used an alias for my ssh key.</p>
<p>To do that, you need to edit your <em>.ssh/config</em> file.</p>
<p>Here is my <em>.ssh/config</em></p>
<pre><code class="lang-bash">Host github.com-personal
   HostName github.com
   User git
   IdentityFile ~/.ssh/github/id_rsa_personal
   TCPKeepAlive yes
   IdentitiesOnly yes
</code></pre>
<p>The <em>Host</em> parameter is your ssh key alias, so my url looks like</p>
<blockquote>
<p>url = git@<strong>github.com-personal</strong>:<em>your-user</em>/<em>your-repo</em>.git</p>
</blockquote>
<p>The <em>IdentityFile</em> is the path to your key file on disk, so it depends on where you have saved it.</p>
<p>Pay attention to the <em>User</em> parameter: this is not your Git user, but the user for the SSH connection.</p>
<p>This way, every time I clone one of my personal repositories, I have to edit my local Git configuration.</p>
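<p>A small shortcut, assuming the <em>github.com-personal</em> alias above: if you put the alias host directly in the clone URL, the remote is stored with the alias and no later edit of the local Git configuration is needed.</p>

```shell
# Clone through the ssh alias: the remote url keeps the alias,
# so subsequent pull/push automatically picks the right key.
git clone git@github.com-personal:your-user/your-repo.git

# Double-check which url (and therefore which key alias) a repo uses:
git config --get remote.origin.url
```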
<p>In my case, I have a lot more new repositories from my company than personal repositories and therefore I decided to keep HTTPS as default for my company repositories: It requires less configuration.</p>
<p>You know, programmers are lazy 🦥...</p>
<h2 id="heading-gitlab-via-ssh">Gitlab via SSH</h2>
<p>I used SSH to connect to Gitlab too; <a target="_blank" href="https://docs.gitlab.com/ee/user/ssh.html">here is the Gitlab documentation on SSH connections</a>.</p>
<p>The local configuration is exactly the same as for Github. So you need another alias for your ssh key and have to set up your local Git repository to use your alias.</p>
<h2 id="heading-codecommit-via-grc">CodeCommit via GRC</h2>
<p>As I said, our AWS accounts are part of our AWS Organisation and we access them with AWS SSO federated with our ADFS.</p>
<p>Googling around, I found <a target="_blank" href="https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-git-remote-codecommit.html">this documentation page from AWS</a> that starts with:</p>
<blockquote>
<p>If you want to connect to CodeCommit using a root account, federated access, or temporary credentials, you should set up access using <strong>git-remote-codecommit</strong>.</p>
</blockquote>
<p>Bingo! 🎯</p>
<p>Here is how it works:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1702034337823/e4e95b87-8956-4146-a197-33116ff5a7c4.png" alt class="image--center mx-auto" /></p>
<p>First you have to install <strong><em>git-remote-codecommit</em> python package.</strong></p>
<p><a target="_blank" href="https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-git-remote-codecommit.html">This documentation page tells you how</a>.</p>
<p>Then you need to create a local AWS profile tied to the account/region where your repositories are stored and configure AWS SSO.</p>
<p>Just follow <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html">this guide from AWS</a> to do so.</p>
<p>Finally, you have to clone your repository using that package.</p>
<p>Here's my local git repository configuration:</p>
<pre><code class="lang-bash">[remote <span class="hljs-string">"origin"</span>]
        url = codecommit://your-aws-profile@your-repo
        fetch = +refs/heads/*:refs/remotes/origin/*
</code></pre>
<p>Let's take a look at the <em>url</em> parameter:</p>
<blockquote>
<p>url = codecommit://your-aws-profile@your-repo</p>
</blockquote>
<p><em>codecommit</em> is the protocol; it tells git to use our Python package.</p>
<p><em>your-aws-profile</em> refers to your AWS Profile name.</p>
<p>This way, Git commands are executed on this repository with the Python package and with my AWS profile.</p>
<p>If you have repositories in different accounts, you need to set up different profiles and use them accordingly.</p>
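<p>For example, with two hypothetical profiles the account selection is just part of the clone URL; git-remote-codecommit also accepts an explicit region with the <code>codecommit::region://</code> form.</p>

```shell
# Two repositories in two different AWS accounts, one profile each:
git clone codecommit://profile-account-a@repo-one
git clone codecommit://profile-account-b@repo-two

# Pin the region explicitly if it differs from the profile's default:
git clone codecommit::eu-west-1://profile-account-a@repo-one
```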
<h2 id="heading-wrap-up">Wrap up</h2>
<p>Here is how it looks at the end of the story:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1702037941106/7ddad4a6-a484-49ec-b910-34d396326741.png" alt class="image--center mx-auto" /></p>
<p>With the above configuration, I can easily switch between local folders and use Git without being asked for any credentials and without 403. 🎉</p>
<p>I tried to keep the configuration overhead low for the most commonly used version control system (Enterprise Github) and I used GRC for CodeCommit because I don't want to specify a profile when I run my Git command.</p>
<p>I have aliases and want to use them the same way regardless of the account, and this configuration hides the profile specification from the commands.</p>
<p>But if I didn't already have many cloned repositories with HTTPS, I would use SSH for enterprise Github as well and remove the OSX key chain from my configuration.</p>
<p>I hope this was helpful, thanks for reading!</p>
<p>👋👋</p>
]]></content:encoded></item></channel></rss>