Verify existence of arbitrary email addresses from the command line

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'resolv'
require 'net/smtp'

def mx_records(domain)
  Resolv::DNS.open do |dns|
    dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
  end
end

def mailbox_exist?(email)
  domain = email.split('@').last
  mx = mx_records(domain).first
  return false unless mx

  Net::SMTP.start(mx.exchange.to_s, 25) do |smtp|
    smtp.mailfrom 'info@example.com' # replace with your email address or something more realistic
    smtp.rcptto email
  end
  true
rescue Net::SMTPFatalError, Net::SMTPSyntaxError
  false
end

if ARGV.length != 1
  puts "Usage: ruby #{__FILE__} <email_address>"
  exit 1
end

email = ARGV[0]
if mailbox_exist?(email)
  puts "Mailbox exists."
else
  puts "Mailbox doesn't exist or couldn't be verified."
end

Find the files with the largest number of lines in your project

Ever needed to find the file with the largest number of lines in your project? Use the snippet below to list all files and their line counts, neatly sorted from smallest to largest.

find . -type f -print0 | xargs -0 wc -l | sort -n

This translates to the following:

find . -type f -print0 # Find all regular files under the current directory, separating names with \0.
xargs -0 wc -l # Run wc -l on each file to count its lines and...
sort -n # Sort the results numerically, smallest first.

Ruby partition on arrays

In Ruby, partition is a very useful method for splitting a collection into the items that satisfy a condition and the items that don't. It takes a block and returns two arrays: the first contains the elements for which the block returns true, and the second contains the elements for which it returns false.
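
For a quick feel for how it behaves, here it is on a plain array of numbers:

[1, 2, 3, 4, 5].partition(&:even?)
# => [[2, 4], [1, 3, 5]]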

Let's see an actual example:

def create_fake_emails_array
  emails = []
  10.times do |i|
    emails << { email: "user#{i + 1}@mailinator.com" }
  end

  10.times do |i|
    emails << { email: "user#{i + 1}@something.com" }
  end
  emails
end

my_emails = create_fake_emails_array

class EmailContactsWhitelistCleaner
  attr_reader :email_recipients

  def initialize(email_recipients)
    @email_recipients = email_recipients
  end

  def get_white_list_collection
    valid_recipients, invalid_recipients = partition_emails

    log_black_list_email_recipients(invalid_recipients)
    valid_recipients
  end

  private

  def partition_emails
    email_recipients.partition { |recipient| valid_recipient?(recipient[:email]) }
  end

  def valid_recipient?(email)
    !email.match?('mailinator') || mailinator_white_list.include?(email)
  end

  def log_black_list_email_recipients(invalid_recipients)
    return if invalid_recipients.empty?

    email_list = invalid_recipients.map { |recipient| recipient[:email] }.join(',')
    puts "The following emails are not in the whitelist: #{email_list}"
  end

  def mailinator_white_list
    # ENV.fetch('MAILINATOR_WHITE_LIST', '').split(',')
    'user1@mailinator.com,user2@mailinator.com,user3@mailinator.com'
  end
end

service = EmailContactsWhitelistCleaner.new(my_emails)
puts service.get_white_list_collection
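
Running the script logs the mailinator addresses that fall outside the whitelist and then prints the surviving recipients. The output looks roughly like this (truncated, and the exact hash formatting depends on your Ruby version):

The following emails are not in the whitelist: user4@mailinator.com,user5@mailinator.com,user6@mailinator.com,user7@mailinator.com,user8@mailinator.com,user9@mailinator.com,user10@mailinator.com
{:email=>"user1@mailinator.com"}
{:email=>"user2@mailinator.com"}
{:email=>"user3@mailinator.com"}
{:email=>"user1@something.com"}
{:email=>"user2@something.com"}
...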

Use OpenTelemetry gems to track your app's performance

Instead of going with expensive services like New Relic or Datadog, trace your Rails app's performance using the OpenTelemetry gems.

First, add the gems to your Gemfile:

gem 'opentelemetry-sdk'
gem 'opentelemetry-exporter-otlp'
gem 'opentelemetry-instrumentation-all'

Then, add this inside config/initializers/opentelemetry.rb

require 'opentelemetry/sdk'
require 'opentelemetry/exporter/otlp'
require 'opentelemetry/instrumentation/all'

OpenTelemetry::SDK.configure do |c|
  c.service_name = '<YOUR_SERVICE_NAME>'
  c.use_all() # enables all instrumentation!
end
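
By the way, if use_all pulls in more instrumentation than you want, the configurator also lets you opt in selectively by name. The instrumentation names below are my assumption based on the individual instrumentation gems; double-check the ones you actually have installed:

OpenTelemetry::SDK.configure do |c|
  c.service_name = '<YOUR_SERVICE_NAME>'
  c.use 'OpenTelemetry::Instrumentation::Rack'
  c.use 'OpenTelemetry::Instrumentation::ActiveRecord'
end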

Finally, launch your application and point it to a collector, like the OpenTelemetry Collector, Grafana Agent or SigNoz. Most of them have cloud or self-hosted versions.

Enjoy your observability!

Make Rails properly decode hashes and arrays in JSONB fields the way god intended

In what I'm going to call the greatest piece of pedantic fuckery of all time in Rails history, sgrif made JSON fields take primitives (including strings!!!!), instead of properly converting strings into Arrays and Hashes the way that God intended. In the years since, this one single peabrained decision, inexplicably rubber-stamped by the rest of Rails core, has surely cost millennia's worth of head-scratching, uncontrollable sobbing, teeth gnashing, and rending of garments amongst poor Rails engineers like myself, who wonder why, on an utterly non-deterministic basis, their hashes turn into strings when going through the Postgres washing machine.

Unsure if you're having this problem yourself? Are you getting random no implicit conversion of Symbol into Integer (TypeError) errors in your code? That's what I'm talking about.

To fix this abomination and cast out the sgrif demon forever (or at least until they refactor ActiveRecord::Type modules again), simply toss the following file into your initializers and breathe easier.

# config/initializers/fix_active_record_jsonb.rb

ActiveRecord::Type::Json.class_eval do
  # this is a json field, thus always decode it
  def deserialize(value)
    ActiveSupport::JSON.decode(value) rescue nil
  end

  def serialize(value)
    if value.is_a?(::Array) || value.is_a?(::Hash)
      ::ActiveSupport::JSON.encode(value)
    elsif value.is_a?(::String) && value.start_with?("{", "[") && value.end_with?("}", "]")
      value
    elsif value.respond_to?(:to_json)
      value.to_json
    else
      value
    end
  end
end
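
To sanity-check the patch, here's roughly what a round trip looks like in the console. Widget and its settings jsonb column are made-up names; substitute your own model:

widget = Widget.create!(settings: '{"theme":"dark"}')
widget.reload.settings
# => {"theme"=>"dark"}  (a Hash, the way God intended, not the String you handed it)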

Footnote: Apparently, I need to waste precious time of my life revisiting this topic every 5 years or so.

How to make Wisper properly load all subscribers in the app/subscribers directory structure

Rails.application.reloader.to_prepare do
  # Dev env will re-install subscribers on app reload
  Wisper.clear if Rails.env.development?

  Dir[Rails.root.join("app", "subscribers", "**", "*.rb")].each do |listener|
    relative_path = listener.gsub("#{Rails.root}/app/subscribers/", "")
    klass = relative_path.chomp(".rb").camelize.safe_constantize
    Wisper.subscribe(klass.new, async: !Rails.env.test?)
  end
end

Sharing since the sample provided by the project itself won't work with namespaced subscribers.
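
For instance, a namespaced listener like this one (names invented for illustration) resolves to Billing::InvoiceSubscriber via the camelize call above and gets subscribed automatically:

# app/subscribers/billing/invoice_subscriber.rb
module Billing
  class InvoiceSubscriber
    def invoice_paid(invoice_id)
      # react to the broadcast event here
    end
  end
end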

Fix Psych loading errors on wisper-sidekiq

One of my favorite rubygems is Wisper, a simple library that lets you add pub/sub style broadcasting and listeners to your app. (Been a fan since it came out in 2014, almost ten years ago!)

I tried to use wisper again recently, specifically with the wisper-sidekiq companion gem, which allows your subscribers to execute asynchronously as Sidekiq jobs.

Unfortunately, I immediately ran into an issue with Psych complaining about my parameters (ActiveRecord model instances) not being allowed by the YAML loading subsystem. If you're running into this same kind of issue, you'll know because you get exceptions that look like Tried to load unspecified class: Account (Psych::DisallowedClass)

Sidestepping the question of whether you should stick to only sending primitive object parameters (strings, integers, etc) as arguments to Sidekiq jobs, here is the monkeypatch that solves the problem, tested with wisper-sidekiq version 1.3.0.

Chuck this override into an initializer file.

module Wisper
  class SidekiqBroadcaster
    class Worker
      include ::Sidekiq::Worker

      def perform(yml)
        (subscriber, event, args) = ::YAML.unsafe_load(yml)
        subscriber.public_send(event, *args)
      end
    end
  end
end

The fix is the replacement of YAML.load (what the gem uses) with YAML.unsafe_load in the Worker#perform method.

But Obie, isn't this dangerous? No. Objects passed in broadcast events are almost certainly not coming from the outside world in any way, shape, or form, so the reasons that you would typically be interested in blocking YAML loading from processing arbitrary objects do not apply.

But Obie, if your queues back up, won't you have stale data sitting in Redis as parameters to your jobs? For my particular use case this is not a concern, but for yours it might be. Either way it's not the job of the wisper-sidekiq library to enforce this constraint... and indeed, since the library predates the safety additions to YAML.load I'm not even sure that the author intended for the constraint to exist.

Now what would really be cool, and I wish I had time to implement this myself, is if wisper-sidekiq would automatically turn ActiveRecord params into GlobalID identifiers and query them for you on the consumer side, the way that ActiveJob does. Somebody should definitely implement something like that!
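
If anyone does take a crack at it, the rough shape of the idea (purely an illustrative sketch, not anything wisper-sidekiq actually does today) would be something like this:

# Illustrative only: swap ActiveRecord objects for GlobalID strings before enqueueing...
args = args.map { |arg| arg.respond_to?(:to_global_id) ? arg.to_global_id.to_s : arg }

# ...then resolve them back into records on the consumer side.
args = args.map do |arg|
  next arg unless arg.is_a?(String) && arg.start_with?("gid://")
  GlobalID::Locator.locate(arg) || arg
end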

Fix inspect on Devise models

Have you wondered why User and other Devise models don't print properly in your console? Instead of nice pretty printed output, even if you're using a pretty printer, you still get a long, ugly, unreadable string.

Today I finally got fed up enough to do something about it, and here is the solution:

Chuck this into the bottom of your config/initializers/devise.rb file and you're good to go. It removes the overriding of the inspect method that is the culprit.

Devise::Models::Authenticatable.remove_method(:inspect)

But Obie, what about Chesterton's Fence!?!?!

My answer is that if you're paranoid about the possibility of inspect being called by a logger while a plain-text password happens to be in scope, then by all means override the method instead of just removing it, but doing so is left as an exercise to the reader. (Hint: start overriding it and Github CoPilot will do the rest.)
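
For the record, a minimal sketch of such an override might look like this (the attribute names being filtered are just my guess at what you'd want hidden; adjust to your schema):

Devise::Models::Authenticatable.module_eval do
  def inspect
    # Filter out the sensitive columns before building the usual inspect-style output.
    filtered = serializable_hash(except: %i[encrypted_password reset_password_token])
    attrs = filtered.map { |key, value| "#{key}: #{value.inspect}" }.join(', ')
    "#<#{self.class} #{attrs}>"
  end
end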

Extracting JSON Code with Nested Curly Braces in Ruby (the long painful way around, with help from GPT4)

Given a text string that contains JSON code with possible nested curly braces, I needed to extract the outermost JSON code, including the curly braces. Here's an example of such text, which you may recognize as the output of an LLM (specifically GPT in this case):

Here's the JSON you requested:

{
 "title": "Brainstorming ideas",
 "summary": "The user discussed exporting basic profile bots",
 "sentiment": "positive",
 "language": "English",
 "additional_information": {
    "tags": ["brainstorming", "bots", "automation"]
  }
 }

An initial crack at extracting just the JSON with a regex might look like this, but the leading .* is too greedy, so the capture group only grabs the innermost pair of braces.

> text.match /.*(\{.*\})/m; $1
=> "{\n    \"tags\": [\"brainstorming\", \"bots\", \"automation\"]\n  }\n }"

I took a hard look at that regex and figured that I just needed to make the first .* less greedy for it to work fine, like this:

/.*?(\{.*\})/m

And here is where the story gets more interesting (or stupid, depending on your perspective). Because I tried it and unfortunately it DID NOT work. And since it was late at night and I was tired, instead of trying to figure out why, I got frustrated and asked GPT4 to help me out.

GPT offered me this regex instead: /(?<=\{)(?s:.*)(?=\})/m

I've occasionally had to use Perl over the years so I was at least passingly familiar with the syntax here. It uses a positive lookbehind (?<=\{) to assert that the match starts after an opening curly brace, and a positive lookahead (?=\}) to assert that the match ends before a closing curly brace. The (?s:.*) part matches any characters, including newlines, between the curly braces, and finally, the /m flag is used for multi-line matching. Which should have worked fine, except that I'm coding in Ruby, so it generated this exception: undefined group option

It turns out that Ruby's regex engine does not support the (?s:...) group option (in Ruby, m is the modifier that makes . match newlines), which is exactly what triggers the undefined group option error.

When I pointed this out to GPT4 it proposed a more straightforward and brute-force approach by writing a Ruby method that loops through the text and counts the opening and closing curly braces. This method can efficiently find the starting and ending indices of the outermost JSON object.

Here's the Ruby code to extract the JSON code from the text:

def extract_json(text)
  start_index = text.index('{')
  return nil if start_index.nil?

  end_index = start_index
  brace_count = 1

  text[start_index + 1..-1].each_char do |char|
    brace_count += 1 if char == '{'
    brace_count -= 1 if char == '}'
    end_index += 1

    break if brace_count.zero?
  end

  text[start_index..end_index]
end

This code defines a method called extract_json that takes the input text as an argument. It then loops through the text to find the starting and ending indices of the outermost JSON object by counting the opening and closing curly braces. Once the outermost JSON object is found, it is returned as a string.
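
From there it's a one-liner to turn the extracted string into actual Ruby data, using the sample text above:

require 'json'

JSON.parse(extract_json(text))["title"]
# => "Brainstorming ideas"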

It passed my tests so I committed the code and called it a night. But that first failed regex solution still bothered me this morning, and eventually I opened a terminal and started playing around with it again. Making that first .* operator less greedy should work.

And guess what? It does work.

> text
=> "Here is your JSON:\n\n```\n{\n \"title\": \"Brainstorming ideas\",\n \"summary\": \"The user discussed exporting basic profile bots\",\n \"sentiment\": \"positive\",\n \"language\": \"English\",\n \"additional_information\": {\n    \"tags\": [\"brainstorming\", \"bots\", \"automation\"]\n  }\n }\n```\n"

> text.match /.*?(\{.*\})/m; $1
=> "{\n \"title\": \"Brainstorming ideas\",\n \"summary\": \"The user discussed exporting basic profile bots\",\n \"sentiment\": \"positive\",\n \"language\": \"English\",\n \"additional_information\": {\n    \"tags\": [\"brainstorming\", \"bots\", \"automation\"]\n  }\n }"

Why couldn't I get it to work yesterday? No idea. But I did learn a lesson about trusting GPT4 over my own instinct. I'm going to call it a double-edged sword. You've got to be careful you don't cut yourself. This particular solution wasn't that big a deal, but I can imagine situations where it would be.

Interestingly enough, I pointed out the failure to GPT4 and it admitted that it made a mistake.

You are correct, and I apologize for my earlier explanation. The regex /.*?(\{.*\})/m indeed works for extracting the outermost JSON code, including when there are nested curly braces, as demonstrated in your example. The non-greedy .*? before the capturing group ensures that the regex starts matching at the first opening curly brace. The greedy .* inside the capturing group then matches all characters, including nested curly braces, until the last closing curly brace is found.

Thank you for pointing this out, and I apologize for the confusion in my earlier responses.

Smoothly Expanding Search Box with CSS Transitions and Tailwind CSS

Are you looking to create a search box that expands smoothly when it becomes active? You can achieve this effect using CSS transitions and Tailwind CSS.

To get started, create an HTML input element with the desired styling using Tailwind CSS classes. Then, add the transition-all and duration-500 classes to the input element to specify that all CSS properties should have a smooth transition effect over a duration of 500 milliseconds.

Finally, use a focus: variant to change the width of the input element when it becomes active. For example, the focus:w-64 class sets the width to 16rem (Tailwind's w-64 value) while the input is focused, and the transition animates the change.

Here's the code for the HTML input element with the necessary Tailwind CSS classes:

<input type="text" placeholder="Search..." autocomplete="off"
       class="rounded-md bg-gray-400 border-gray-200
       border-2 text-gray-800 p-1 w-28 focus:bg-white
       focus:flex-1 focus:pr-12
       transition-all duration-500 transform focus:w-64">

By adding these classes, you should see a smooth transition effect as the search box expands when it becomes active. You can adjust the duration and width values to customize the effect to your liking.

Quick Ruby client for Marqo (VectorDB)

Marqo is a cool vector DB that lets you store facts and then query them later with natural language. You can install Marqo using Docker, which I managed to do today on my M1 Mac. Just make sure you stop whatever ElasticSearch or OpenSearch instances you may already have running on your machine first, since Marqo wants to use its own.

Once you have it running, you can use the following Ruby class to access it in your Rails project.

require 'httparty'

class Marqo
  include HTTParty

  base_uri 'http://localhost:8882'

  def initialize(auth = { username: 'admin', password: 'admin' })
    @auth = auth
  end

  def store(index_name, name, detail)
    options = {
      headers: { 'Content-Type' => 'application/json' },
      body: [{name: name, detail: detail}].to_json
    }
    self.class.post("/indexes/#{index_name}/documents", options)
  end

  def search(index_name, query)
    options = {
      basic_auth: @auth,
      headers: { 'Content-Type' => 'application/json' },
      body: { q: query }.to_json
    }
    self.class.post("/indexes/#{index_name}/search", options)
  end

  def self.client
    @client ||= new
  end
end
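
Usage looks like this (the index name and content are just placeholders):

Marqo.client.store('facts', 'ruby-tip', 'partition splits an array into matches and non-matches')
Marqo.client.search('facts', 'how do I split an array by a condition?')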

The data from my personal TIL site is now migrated. Here's how I handled URL redirects in Cloudflare

To set up a subdomain redirect with Cloudflare while preserving the request path and parameters, you can use Page Rules.

In the Cloudflare dashboard for your domain, click on the Page Rules link in the Rules section of the main navigation menu on the left of the page.

Click on the Create Page Rule button to start configuring the redirect.

In the If the URL matches field, enter the pattern for the URLs you want to redirect, using a wildcard (*) to capture the path and query string. In this case, the pattern is:

til.obiefernandez.com/*

Now click on Add a Setting and select Forwarding URL from the dropdown menu.

Select 301 - Permanent Redirect from the Status Code dropdown.

In the Enter destination URL field, enter the URL where you want to redirect your requests to, using the $1 placeholder to carry over the captured path and query string.

For example: https://til.magmalabs.io/$1

Click on Save and Deploy to create the Page Rule and enable the redirect.

Using OpenAI's TikToken in Ruby

The encoding used in the tiktoken library (and the Ruby binding discussed in this post) is a specific way of converting text into a sequence of tokens, which are then represented by their unique IDs. The encoding scheme is designed to work with OpenAI models like gpt-3.5-turbo and is based on the model's vocabulary and tokenizer.

There's a simple Ruby binding for TikToken made by Iapark that compiles the underlying Rust library. https://rubygems.org/gems/tiktoken_ruby

First add it to your Gemfile

gem "tiktoken_ruby"

Then use it in your code. The service module I wrote today to use it in my Rails app looks like this:

require 'tiktoken_ruby'

module TikToken
  extend self

  DEFAULT_MODEL = "gpt-3.5-turbo"

  def count(string, model: DEFAULT_MODEL)
    get_tokens(string, model: model).length
  end

  def get_tokens(string, model: DEFAULT_MODEL)
    encoding = Tiktoken.encoding_for_model(model)
    tokens = encoding.encode(string)
    tokens.map do |token|
      [token, encoding.decode([token])]
    end.to_h
  end
end

Here's what it looks like in practice.

irb> TikToken.count("Absence is to love what wind is to fire; it extinguishes the small, it inflames the great.")
=> 19

irb> TikToken.get_tokens("Absence is to love what wind is to fire; it extinguishes the small, it inflames the great.")
=>
{28878=>"Abs",
 768=>"ence",
 374=>" is",
 311=>" to",
 3021=>" love",
 1148=>" what",
 10160=>" wind",
 4027=>" fire",
 26=>";",
 433=>" it",
 56807=>" extingu",
 21168=>"ishes",
 279=>" the",
 2678=>" small",
 11=>",",
 4704=>" infl",
 986=>"ames",
 2294=>" great",
 13=>"."}

The encoding is essential for processing text with the OpenAI models, as it allows them to understand and generate text in a format that is compatible with their internal representations. In the context of the tiktoken library, the encoding is particularly helpful for estimating token counts in a text string without making an API call to OpenAI services.

Use isSameOrAfter to set conditions that select specific periods

momentjs is a JavaScript library that makes it easy to parse, validate, manipulate, and display dates and times in JavaScript.

momentjs provides comparison methods you can use to build date conditions. To select dates that are the same as or after a given date (the equivalent of a >= check), use isSameOrAfter:

moment('2010-10-20').isSameOrAfter('2010-10-19'); // true
moment('2010-10-20').isSameOrAfter('2010-10-20'); // true
moment('2010-10-20').isSameOrAfter('2010-10-21'); // false