monitoring

How to catch errors in production: setting up monitoring in 15 minutes

2025-11-20

#monitoring #sentry #errors #observability #javascript #python #devops

You deployed a new feature. Everything works perfectly on your local machine, and you’re happy with the result.
Then a message appears: “Nothing works for me.” You open the server logs — they’re empty. It turns out the error happened on the client side, from a user with an old browser version or unusual settings. And you might never have known about it.

This happens to almost everyone who deploys projects to production. It happened to me too, until I set up a tool that lets me see errors almost instantly — even if it’s the middle of the night and the problem occurred for a single user on the other side of the world.

Cheap and dirty: how to collect logs from remote clients over HTTP

2025-11-12

#logging #http #loki #vector #serverless #devops #monitoring #case-study

Do you have an application spread across hundreds of client devices? Or a fleet of IoT sensors sending telemetry? Sooner or later the question arises: “What’s actually happening over there?” And right after it — “How do I collect logs without bankrupting myself on Splunk or Datadog?”

If your clients can send HTTP requests, you already have ninety percent of the solution. HTTP(S) is a universal and firewall-friendly protocol. All we need is a listener (endpoint) that will accept these logs.
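As a rough illustration of such a listener, here is a minimal sketch using only the Python standard library. It assumes clients POST newline-delimited JSON to a `/logs` path; the path, the port, the size cap, and the `parse_log_batch` helper are all hypothetical choices made for this example, not something the original setup prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_BODY_BYTES = 64 * 1024  # assumed cap so one client cannot flood the listener


def parse_log_batch(body: bytes) -> list:
    """Parse newline-delimited JSON log records, skipping blank lines."""
    records = []
    for line in body.decode("utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("each log line must be a JSON object")
        records.append(record)
    return records


class LogHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/logs":  # hypothetical endpoint path
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            if length > MAX_BODY_BYTES:
                self.send_error(413)
                return
            records = parse_log_batch(self.rfile.read(length))
        except ValueError:  # also covers JSON and UTF-8 decoding errors
            self.send_error(400)
            return
        # Write one JSON object per line to stdout; a log shipper can pick it up.
        for record in records:
            print(json.dumps(record), flush=True)
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), LogHandler).serve_forever()
```

A client could then send a batch with something like `curl -X POST --data-binary @batch.ndjson http://localhost:8080/logs`. A real deployment would sit behind HTTPS and would need some form of client authentication, which this sketch leaves out.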

054 | VictoriaMetrics + Grafana: Efficient Time-Series Storage for Scalable Monitoring

2025-07-16

#VictoriaMetrics #Grafana #Prometheus #monitoring #time series #scalability #long-term storage #TSDB #open-source #DevOps

In our series on monitoring systems, we’ve reviewed Munin, Prometheus with Grafana, and Zabbix. Now it’s time to talk about a solution that addresses one of the main pain points of Prometheus users — long-term, scalable, and efficient time-series storage. Meet VictoriaMetrics, a high-performance and cost-effective TSDB (time-series database) that perfectly complements the Prometheus ecosystem when paired with Grafana for visualization.

What Is VictoriaMetrics and Why Do You Need It?

Prometheus handles real-time monitoring and short-term storage well, but its built-in TSDB is a single-node local store that isn’t designed for long-term retention or for scaling to terabytes or petabytes of data. That’s where VictoriaMetrics comes in.

053 | Zabbix Agent + Zabbix Server: All-in-One Monitoring Solution for Scalable Infrastructures

2025-07-15

#Zabbix #monitoring #all-in-one solution #alerts #templates #network discovery #scalability #open-source #DevOps

We’ve already looked at Munin for basic insights and Prometheus + Grafana for cloud environments. Now let’s turn to Zabbix — a powerful, versatile, and scalable monitoring system that offers a comprehensive out-of-the-box solution for medium and large infrastructures. Zabbix is often chosen by organizations needing centralized monitoring, flexible alerting, and a wide range of data collection methods.

What Is Zabbix and How Does It Work?

Zabbix is a mature open-source monitoring system designed to track the state and performance of various IT components: servers, virtual machines, network devices, databases, web services, and applications.

052 | Prometheus + Node Exporter + Grafana: The De Facto Standard for Cloud Environments

2025-07-14

#Prometheus #Grafana #Node Exporter #monitoring #cloud technologies #Kubernetes #DevOps #metrics #PromQL #alerts

We’ve reviewed Munin as a simple solution for basic monitoring. Now let’s move on to a stack that has become an indispensable tool in the world of modern cloud infrastructure, microservices, and containers: Prometheus, Node Exporter, and Grafana. This trio provides a powerful, flexible, and scalable approach to collecting, storing, analyzing, and visualizing metrics.

What Are Prometheus and Its Ecosystem?

Prometheus is an open-source monitoring system originally developed at SoundCloud and later donated to the Cloud Native Computing Foundation (CNCF). Its key feature is the “pull” model: Prometheus periodically scrapes metrics from targets via HTTP endpoints.
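To make the pull model concrete, here is a minimal sketch of a scrape target written with only the Python standard library. It exposes a `/metrics` endpoint in the Prometheus text exposition format; the metric name `app_requests_total`, the port, and the `render_metrics` helper are assumptions made for this example. A real service would more likely use an official client library than format the output by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter, kept in a plain dict for illustration.
COUNTERS = {"app_requests_total": 0}


def render_metrics(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # Prometheus calls this endpoint on its own schedule (the "pull").
            body = render_metrics(COUNTERS).encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Any other path counts as application traffic in this sketch.
            COUNTERS["app_requests_total"] += 1
            body = b"ok\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

The application never pushes anything: it only keeps its counters up to date, and the Prometheus server decides when to fetch them by requesting `/metrics` on each configured target.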

051 | Munin: Simplicity and Clarity for Basic Monitoring

2025-07-13

#Munin #monitoring #graphs #RRDtool #system monitoring #simple solutions #open-source

After our introductory journey into the world of monitoring, it’s time to explore specific tools. Let’s start with one of the oldest yet still relevant solutions for those who value simplicity and clarity — Munin.

Munin is a lightweight and intuitive monitoring system specializing in collecting and graphically presenting system data. If you need a quick way to get a general view of your servers’ health without diving deep into complex configurations, Munin might be a great place to start.

050 | Why Do We Need Monitoring? Guarding the Stability of Your IT

2025-07-12

#monitoring #IT infrastructure #stability #performance #metrics #alerts #system monitoring #network monitoring #APM #Prometheus #Zabbix #Munin #VictoriaMetrics

In today’s world, where digital technologies penetrate every sphere of life, the stable operation of IT infrastructure is not just a desirable condition — it is a critical necessity. Whether it’s a small website, a large online store, a mobile application, or an internal corporate system — any failure can lead to serious losses, reputational damage, and user dissatisfaction. This is where monitoring steps in.

What Is Monitoring and Why Is It Important?

Monitoring in IT is the continuous collection, analysis, and visualization of data about the state and performance of infrastructure, applications, and services. Imagine you have a complex machine, like a car. To keep it running smoothly, you regularly check the fuel level, the oil, and the tire pressure. Monitoring serves the same purpose for servers, databases, networks, and applications.

Monitoring and Observability

2025-01-20

#monitoring #observability #prometheus #grafana #elk #devops #infrastructure #landing

Professional Monitoring and Observability Setup

I configure comprehensive monitoring and observability systems for full visibility of your infrastructure. With 20+ years of DevOps experience, I help startups gain full control over system status.

What You’ll Get

Metrics and monitoring:

Metrics collection from applications and infrastructure
Visualization in Grafana dashboards
Alert configuration for critical events

Logging:

Centralized log collection
Search and analysis in ELK Stack or Loki
Log storage and rotation

Tracing:

Incident Management and Response

2025-01-20

#incident-response #devops #monitoring #security #reliability #landing

Professional Incident Management Setup

I set up comprehensive incident response processes for fast recovery and minimal impact. With 20+ years of DevOps experience, I help startups handle incidents effectively.

What You’ll Get

Response processes:

Clear detection and escalation procedures
Team roles and responsibilities
Runbooks for typical incidents

Automation:

Automatic alerts and notifications
Automatic recovery where possible
Integration with monitoring systems

Training and improvement:

Team training on procedures
Post-mortem analysis
Continuous process improvement

Why Choose Me

20+ years of experience in incident handling
Broad expertise in process building
Focus on startups: I understand their constraints and needs
Fast results: working processes within a week

Cloud Migration

2025-01-20

#cloud-migration #aws #azure #gcp #yandex-cloud #alibaba-cloud #devops #infrastructure #migration #security #monitoring #landing

Professional Cloud Migration for Your Startup

I handle the complete cycle of migrating infrastructure to the cloud (AWS, Azure, GCP, Yandex Cloud, Alibaba Cloud) with a focus on security, reliability, and cost optimization. With 20+ years of DevOps experience, I help startups successfully transition to the cloud.

What You’ll Get

Planning and analysis:

Detailed analysis of current infrastructure
Optimal cloud provider selection
Migration plan with risk minimization

Migration:

Safe migration of applications and data
Minimal or zero downtime
Security and monitoring setup
Testing and validation

Optimization: