Not done with Go just yet

Just because you can do something is enough reason to do it

Posted by Art Mills

April 19, 2024

I'm just looking at you, Python.

I find myself still weirdly drawn to Go. Something about it is different enough and interesting enough to make me want to solution with it. Python is great. Django is a miracle. I will, no question about it, use Django and Python to spike out web-based solutions for fun and for profit (but yes mostly for fun).

But Go still whispers to me. Or maybe that's the bourbon. I'll assume it's Go.

I'm evaluating very large storage accounts at work. Accounts with over a million containers. Accounts with, all told, over a billion blobs. We're just evaluating things to see what's where and what should be there.

Running a Python script to jump out to Azure and just get a count of 1.1 million containers in one storage account takes roughly 5.34.

Running a Go script to do the same takes roughly 4.20. That's without looking at the underlying blobs. So I'm going to use Go for blobs too because, hey, faster.

So, I'm trying to do this and failing miserably.

I'm over at the azblob package docs for Go and am clearly following the obvious directions:

Example (Client_NewListBlobsPager)

Simple enough, but, nope. Go won't compile. There is no method called NewListBlobsPager in the azblob package. I am READING THE MANUAL. Still, I have to trust Go on this. And, whoops:

pager := client.NewListBlobsFlatPager("testcontainer", &azblob.ListBlobsFlatOptions{
    Include: container.ListBlobsInclude{Deleted: true, Versions: true},
})

for pager.More() {
    resp, err := pager.NextPage(ctx)
    handleError(err) // if err is not nil, break the loop.
    for _, _blob := range resp.Segment.BlobItems {
        fmt.Printf("%v", _blob.Name)
    }
}

It's really NewListBlobsFlatPager, in spite of the top-level documentation saying otherwise. I hate you.

Anyway, if you haven't already, and want to, here's how you can get a count that works :)

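// NOTE: the excerpt starts mid-function; this signature is reconstructed from
// how ctx, client, and url are used below, and the function name is made up.
func countContainers(ctx context.Context, client *azblob.Client, url string) (int, int) {
    // Initialize the pager for listing containers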
    pager := client.NewListContainersPager(&azblob.ListContainersOptions{
        Include: azblob.ListContainersInclude{Metadata: true, Deleted: false},
    })

    // Count the containers and empty containers
    containerCount := 0
    emptyContainerCount := 0
    for pager.More() {
        resp, err := pager.NextPage(ctx)
        if err != nil {
            fmt.Printf("Error getting next page for URL %s: %v\n", url, err)
            break // Exit the loop if there's an error getting the next page
        }
        containerCount += len(resp.ContainerItems)

        // Check each container for blobs
        for _, containerItem := range resp.ContainerItems {
            maxResults := int32(1) // Request only one blob to minimize data retrieval
            blobPager := client.NewListBlobsFlatPager(*containerItem.Name, &azblob.ListBlobsFlatOptions{
                Include:    azblob.ListBlobsInclude{Deleted: false},
                MaxResults: &maxResults,
            })

            // Fetch the first page of the blob listing
            blobResp, err := blobPager.NextPage(ctx)
            if err != nil {
                fmt.Printf("Error checking blobs in container %s: %v\n", *containerItem.Name, err)
                continue // Skip to next container on error
            }

            // Check if the page has any blobs
            if len(blobResp.Segment.BlobItems) == 0 {
                emptyContainerCount++
            }
        }
    }

    return containerCount, emptyContainerCount
}

This script just gets a total count of containers and uses goroutines to run in parallel. Then, for each container, it fetches one (one ping only) blob, and if none comes back, it adds that container to the count of containers with no blobs. I couldn't find a faster way to spot a 0-count container: taking len of a full listing would take forever on every container with lots of blobs, and !blobPager.More() simply didn't work, presumably because More() reports true until the first page has actually been fetched.

But this does.
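The goroutine part isn't in the snippet above, so here is a minimal sketch of one way it could look, assuming one goroutine per storage account URL. Everything named here (accountCounts, countFunc, countAll, the example.invalid URLs) is hypothetical and stands in for the real Azure-backed counting function; the fake data just lets it run anywhere.

```go
package main

import (
	"fmt"
	"sync"
)

// accountCounts holds the two totals returned per storage account.
// Hypothetical type, not part of azblob.
type accountCounts struct {
	Containers int
	Empty      int
}

// countFunc is a hypothetical stand-in for the per-account counting
// function; the real one would also need a context and an azblob client.
type countFunc func(url string) accountCounts

// countAll runs count once per URL, each call in its own goroutine,
// and collects the results keyed by URL.
func countAll(urls []string, count countFunc) map[string]accountCounts {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]accountCounts, len(urls))
	)
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			c := count(u) // the slow, network-bound part runs concurrently
			mu.Lock()     // map writes must be serialized
			results[u] = c
			mu.Unlock()
		}(u)
	}
	wg.Wait() // block until every account has been counted
	return results
}

func main() {
	// Fake data instead of real Azure calls.
	fake := map[string]accountCounts{
		"https://account-a.example.invalid": {Containers: 3, Empty: 1},
		"https://account-b.example.invalid": {Containers: 5, Empty: 0},
	}
	urls := []string{
		"https://account-a.example.invalid",
		"https://account-b.example.invalid",
	}
	results := countAll(urls, func(u string) accountCounts { return fake[u] })
	for _, u := range urls {
		fmt.Printf("%s: %d containers, %d empty\n", u, results[u].Containers, results[u].Empty)
	}
}
```

The mutex is only there because plain Go maps aren't safe for concurrent writes; sending results over a channel would work just as well.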

I'll write it in Python too because it's just the right thing to do.