Prevent Outages and Debug Faster with Robust Logging
Logging is an essential yet often underestimated practice in early-stage app development, particularly among startups and smaller teams. In the rush to ship features and "move fast and break things", many teams run into gnarly complexity when things eventually do break, leading to costly debugging sessions, unexplained errors, and compliance headaches down the road. As one of the pillars of observability, logs provide essential insight into an application's behavior; they help recreate the scenarios that led to errors and intercept problems before they become full-blown outages. Even a simple (yet clean) logging implementation goes a long way towards preventive maintenance.
This blog discusses an effective example of implementing logging on a Nuxt 3 and MongoDB stack, where Nuxt 3 uses Nitro under the hood for its server. Using plugins, middleware, and mongoose, this stack allows for an automated, easy-to-integrate logging system, packaged as a Nuxt module. The flowchart below gives a general overview of how the logging and auditing system works.
Whenever an authorized user makes a request to the server, it triggers a chain of events across multiple services. For instance, updating a user profile might touch the authentication service, the database, and the notification system. It's important to log all the changes and events along the request's journey, and, if an error occurs at any step or performance needs to be monitored, to correlate all of those logs with the request that produced them, painting a picture of the request's complete lifecycle. This can be achieved simply by issuing a unique identifier, a request_id, to each request and including it in every log, along with additional information like session_id and request_path.
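As a minimal sketch of how this might look in a Nuxt 3 / Nitro app, the server middleware below attaches a request ID to the event context so every handler and log statement downstream can reference it. The file path, the `session_id` cookie name, and the `x-request-id` response header are assumptions for illustration, not part of the module itself:

```ts
// server/middleware/request-context.ts
// Nitro server middleware: runs for every incoming request before route handlers.
import { randomUUID } from 'node:crypto'
import { defineEventHandler, getCookie, setResponseHeader } from 'h3'

export default defineEventHandler((event) => {
  // Issue a unique identifier for this request's lifecycle.
  const requestId = randomUUID()

  // The session cookie name is an assumption for illustration.
  const sessionId = getCookie(event, 'session_id') ?? 'anonymous'

  // Make the correlation data available to all downstream handlers and logs.
  event.context.requestId = requestId
  event.context.sessionId = sessionId
  event.context.requestPath = event.path

  // Echo the request ID back so the client can report it alongside errors.
  setResponseHeader(event, 'x-request-id', requestId)
})
```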
This simplifies things considerably; when a user reports an error, instead of sifting through a mountain of logs, it's simply a matter of using the request ID to pull all the relevant logs and zoom in on the exact journey of that request. Request IDs enable more than that: associating a group of logs with the request that produced them yields insights into how users interact with the app, and allows logs for a specific request type to be aggregated so its performance, error rate, and so on can be monitored.
Manually capturing and storing this metadata for every possible API would be cumbersome. In this implementation, it is automated using a mongoose plugin that sets and updates metadata fields on all models. The request_id and session_id associated with each change are recorded in fields like _last_request and _last_session, stored on the changed documents themselves. Alongside these, fields such as _created_at, _created_by, _updated_at, and _updated_by exist on every mongoose document. These promote consistency and enable granular audit trails.
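A minimal sketch of such a plugin is shown below. The field names match those above, while the file path and the `$locals.ctx` convention for passing the current request context into the document are assumptions for illustration:

```ts
// server/utils/audit-plugin.ts
// A mongoose plugin that stamps audit metadata on every document.
import type { Schema } from 'mongoose'

export function auditPlugin(schema: Schema) {
  // Declare the metadata fields on every model that uses the plugin.
  schema.add({
    _created_at: Date,
    _created_by: String,
    _updated_at: Date,
    _updated_by: String,
    _deleted_at: Date,
    _deleted_by: String,
    _last_request: String,
    _last_session: String,
  })

  schema.pre('save', function (next) {
    // `$locals.ctx` is an assumed convention: the API handler places the
    // request context (request_id, session_id, user id) here before saving.
    const ctx = (this.$locals as any)?.ctx ?? {}
    const now = new Date()

    if (this.isNew) {
      this.set({ _created_at: now, _created_by: ctx.userId })
    }
    this.set({
      _updated_at: now,
      _updated_by: ctx.userId,
      _last_request: ctx.requestId,
      _last_session: ctx.sessionId,
    })
    next()
  })
}
```

The plugin could then be registered globally with `mongoose.plugin(auditPlugin)` before any models are compiled, so every schema picks up the same audit fields without per-model boilerplate.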
Two further fields, _deleted_at and _deleted_by, enable soft deletion: simply put, whenever a document needs to be "deleted", these fields are filled in instead of the document being removed. This preserves historical data that may be useful in the future. Consequently, whenever documents are fetched, queries also check that _deleted_at is unset to confirm the data is still valid.
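A rough illustration of that convention, written as two hypothetical helper functions and assuming the audit fields above:

```ts
import type { Model } from 'mongoose'

// Mirrors the request context attached by the middleware sketch above.
interface RequestCtx { userId?: string }

// Soft delete: fill in the deletion metadata instead of removing the document.
export async function softDelete(model: Model<any>, id: string, ctx: RequestCtx) {
  await model.updateOne(
    { _id: id },
    { _deleted_at: new Date(), _deleted_by: ctx.userId },
  )
}

// Fetch only documents that have not been soft deleted.
export function findActive(model: Model<any>, filter: Record<string, unknown> = {}) {
  return model.find({ ...filter, _deleted_at: { $exists: false } })
}
```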
Now that most of the required metadata is recorded on the documents, the documents themselves are stored in the logs whenever a change occurs or a new document is created. This effectively records a document's history, allowing the entire timeline of changes it has undergone since creation to be reviewed, including which requests (and thus, which users) performed the actions that produced those changes. With that, changes to the data in the database are recorded. This, however, comprises only part of what the logger will store.
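One way to capture these snapshots, sketched below under the assumption of a dedicated ChangeLog collection (a hypothetical model name), is to watch the MongoDB change stream and persist the full document on every change:

```ts
// server/plugins/change-logger.ts
// Watches the MongoDB change stream and stores document snapshots as logs.
// Note: change streams require MongoDB to run as a replica set.
import mongoose from 'mongoose'

// Hypothetical model holding one entry per document change.
const ChangeLog = mongoose.model(
  'ChangeLog',
  new mongoose.Schema({}, { strict: false }),
)

export function startChangeLogger() {
  const db = mongoose.connection.db
  if (!db) throw new Error('Database connection is not ready')

  // `fullDocument: 'updateLookup'` asks MongoDB to include the post-change
  // document, so each entry carries the audit fields stamped by the plugin.
  const stream = db.watch([], { fullDocument: 'updateLookup' })

  stream.on('change', async (change) => {
    const c = change as any
    await ChangeLog.create({
      type: 'change',
      operation: c.operationType, // insert | update | delete | ...
      collection: c.ns?.coll,     // which collection was touched
      document: c.fullDocument,   // snapshot including _last_request etc.
      timestamp: new Date(),
    })
  })

  return stream
}
```

Because each snapshot already carries _last_request and _last_session, these change logs can be joined back to request-level logs by request ID.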
Another type of data the logger stores is event logs, which record state changes of the MongoDB change stream itself. It's important to monitor the health and functioning of the database being used, and event logs offer crucial insights for debugging potential stream interruptions and tracking operational events.
These logs are typically brief, mostly capturing the stream event and a timestamp; they look something like this:
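An illustrative entry (the exact fields depend on how the logger serializes events) might be:

```json
{
  "type": "event",
  "event": "resumeTokenChanged",
  "timestamp": "2024-03-18T09:42:17.513Z"
}
```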
The primary events are:
Stream Lifecycle Events: These are events that mark significant stages in the change stream's existence. For example:
'init': Indicates the initial opening of the change stream
'close', 'end': Signal the closing of the stream, with 'end' being a normal shutdown
'error': Points to potential issues in the stream's operation
Operational Events: These are events that reflect the ongoing activity and interactions of a running stream. For example:
'more': Indicates that the change stream has a batch of document changes ready for your application to retrieve and process
'response': Represents the server's response to various requests your application makes to manage the change stream
'resumeTokenChanged': Fired when the stream's resume token is updated; tracking it is what allows the change stream to be resumed after an interruption without missing any data changes
This helps accomplish several things: tracking stream state transitions, identifying potential bottlenecks, and gaining insight into overall database health. For instance, if the application stops receiving updates, event logs can reveal whether the stream closed unexpectedly or encountered errors, with 'close', 'end', or 'error' events providing immediate clues to investigate.
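Hooking into these events could look roughly like the sketch below, reusing the stream returned by the change logger above and another hypothetical EventLog model:

```ts
import mongoose from 'mongoose'
import type { ChangeStream } from 'mongodb' // driver type re-used by mongoose

// Hypothetical model holding one entry per stream event.
const EventLog = mongoose.model(
  'EventLog',
  new mongoose.Schema({}, { strict: false }),
)

export function logStreamEvents(stream: ChangeStream) {
  // Persist a brief record of each lifecycle/operational event.
  const record = (event: string, details?: unknown) =>
    void EventLog.create({ type: 'event', event, timestamp: new Date(), details })

  stream.on('init', () => record('init'))
  stream.on('close', () => record('close'))
  stream.on('end', () => record('end'))
  stream.on('error', (err) => record('error', { message: err.message }))
  stream.on('resumeTokenChanged', (token) => record('resumeTokenChanged', { token }))
}

// Usage: logStreamEvents(startChangeLogger())
```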
With that, an automated and effective logger is in place, ideal for the small to medium-sized applications that startups and small teams build. This blog provides a solid foundation; more advanced logging concepts, e.g. log aggregation, centralized log analysis tools, or alerting on critical events, will be needed once the application scales. In the meantime, recording change history and key events goes a long way towards proactive maintenance. This level of insight is the difference between hours of frustrated debugging and swift problem resolution, and it all starts with robust logging!