Full-Time
Posted on 8/12/2025
Open-source dashboards and cloud observability platform
CA$164.9k - CA$197.4k/yr
Remote in Canada
Remote
Grafana Labs builds observability and monitoring tools for cloud infrastructure and applications. Its flagship Grafana dashboard lets users visualize data from many sources in real time and set up alerts, with additional options like Grafana Enterprise for large deployments and Grafana Cloud as a managed service. The core open-source platform is complemented by commercial features and services that provide security, scalability, and dedicated support, appealing to both individual developers and large organizations. The goal is to help businesses keep digital services reliable and efficient by delivering scalable, real-time visibility into software and infrastructure.
Company Size
1,001-5,000
Company Stage
Series D
Total Funding
$805.2M
Headquarters
New York City, New York
Founded
2014
30 days of paid vacation each year on top of national holidays, parental leave, & sick leave
Health coverage
4% contribution match on our 401(k)
$1,500 learning and development stipend
Udemy subscription
Complimentary subscription to Headspace
Discounts on a wide variety of services, including entertainment, food, and fitness.
Remote Work Option
Global Employee Assistance Program
Grafana Labs is hosting an Observability Sessions event in Santiago on 15 April, bringing AI-powered observability solutions to Latin America. The company has grown its local team by 30% over two years to meet rising demand from regional organisations including LATAM Airlines, Casas Bahia and Hona. The event will feature technical sessions and demonstrations of Grafana Assistant, now generally available in Grafana Cloud, which allows users to query observability data in plain language. Assistant Investigations, currently in public preview, acts as an autonomous agent coordinating across metrics, logs, traces and profiles to identify root causes during incidents. According to Grafana Labs' 2026 Observability Survey, over 61% of South American respondents identified root cause analysis as AI's highest potential value area in observability.
GrafanaGhost: attackers can abuse Grafana to leak Enterprise data.
2026-04-07 16:04
By targeting Grafana's AI components, attackers can point to external resources and inject indirect prompts to bypass safeguards.
Grafana Labs has disclosed a critical security vulnerability affecting Grafana Enterprise that could allow attackers to escalate privileges and impersonate users. The flaw, tracked as CVE-2025-41115, has received the maximum CVSS score of 10.0, making it one of the most severe vulnerabilities discovered in recent times. The vulnerability exists in Grafana's... (November 21, 2025)
Grafana Labs has released critical security patches addressing a severe vulnerability in its SCIM provisioning feature that could allow attackers to escalate privileges or impersonate users. The flaw, tracked as CVE-2025-41115 with a CVSS score of 10.0 (Critical), affects Grafana Enterprise versions 12.0.0 through 12.2.1 under specific configurations. Organizations using... (November 21, 2025)
A high-severity cross-site scripting (XSS) vulnerability in Grafana could allow attackers to redirect users to malicious websites. The vulnerability, tracked as CVE-2025-4123, received a CVSS score of 7.6 (HIGH). It allows attackers to exploit client path traversal and open redirect to execute arbitrary JavaScript code through custom frontend plugins. The vulnerability... (May 22, 2025)
New Relic vs Grafana: which monitoring stack in 2026?
An honest comparison of New Relic (managed SaaS, per-user pricing) and Grafana (open-source, self-hosted or cloud). Pricing, features, learning curve, and when neither fits your needs.
New Relic and Grafana represent two fundamentally different approaches to monitoring. New Relic says: "Here is a complete platform. Send us your data and we handle everything." Grafana says: "Here are the building blocks. Assemble the stack that fits your needs." Both approaches work. The question is which one fits your team size, budget, technical capacity, and tolerance for operational overhead.
New Relic: the Managed SaaS platform.
New Relic is a fully managed observability platform. You install their agent, it collects metrics, traces, logs, and errors, and everything appears in a single web interface. No infrastructure to manage. No databases to run. No configuration files to maintain.
What you get.
* APM (Application Performance Monitoring): Auto-instrumentation for most languages and frameworks. Transaction traces, slow query analysis, error analytics.
* Infrastructure monitoring: Host metrics, container metrics, Kubernetes monitoring. Integrations with 750+ technologies.
* Log management: Ingest, search, and analyze logs. Correlate logs with traces and errors.
* Distributed tracing: End-to-end request tracing across services.
* Synthetics: Uptime monitoring with scripted browser checks.
* Alerts: Threshold-based and anomaly detection alerting with PagerDuty, Slack, email integrations.
* NRQL (New Relic Query Language): SQL-like language for querying all your telemetry data. Powerful but proprietary.
Pricing (2026).
New Relic changed to a user-based pricing model:
* Free tier: 1 full-platform user, 100GB/month data ingest, forever free.
* Standard: $49/user/month, 100GB free then $0.35/GB.
* Pro: $349/user/month, advanced features.
* Enterprise: Custom pricing, HIPAA compliance, SSO.
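As a rough illustration of how the Standard tier above turns into a monthly bill, the sketch below multiplies seats by the per-user price and adds data overage beyond the free allowance. It assumes the 100GB allowance applies once per account and that overage is billed at the listed $0.35/GB; the function name and parameters are illustrative and not part of any New Relic tooling.

```python
def estimate_standard_monthly_cost(
    users: int,
    ingest_gb: float,
    price_per_user: float = 49.0,   # Standard tier price from the list above
    free_gb: float = 100.0,         # assumed to be a single per-account allowance
    price_per_gb: float = 0.35,     # overage rate from the list above
) -> float:
    """Rough Standard-tier estimate: seat cost plus data beyond the free allowance."""
    if users < 0 or ingest_gb < 0:
        raise ValueError("users and ingest_gb must be non-negative")
    seat_cost = users * price_per_user
    overage_gb = max(0.0, ingest_gb - free_gb)
    return round(seat_cost + overage_gb * price_per_gb, 2)


if __name__ == "__main__":
    # 15 seats with ingest inside the free allowance: seat cost only.
    print(estimate_standard_monthly_cost(users=15, ingest_gb=100))
    # Same team with 300GB of ingest: seat cost plus 200GB of overage.
    print(estimate_standard_monthly_cost(users=15, ingest_gb=300))
```

Keeping seats and data as separate inputs makes it easy to see which of the two drives the bill as a team or its telemetry volume grows.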
The per-user model is a double-edged sword. For a solo developer or small team of 2-3, the free tier is genuinely generous - 100GB is enough for most applications. For a team of 15 engineers, the cost is $735/month on Standard before data charges. That adds up fast.
Strengths.
* Zero infrastructure to manage. Install agent, see data.
* One platform for everything. No tool integration headaches.
* NRQL is genuinely powerful for ad-hoc queries.
* Free tier is production-ready, not a trial.
Weaknesses.
* Per-user pricing gets expensive for growing teams.
* Vendor lock-in. NRQL, custom instrumentation, dashboards - all proprietary.
* Data ingest costs are unpredictable. A noisy microservice can blow your budget.
* UI can be overwhelming. There are so many features that finding what you need takes time.
Grafana: the open-source stack.
Grafana is not a single product - it is an ecosystem. At its core, Grafana is a visualization and dashboarding tool. But a complete monitoring stack typically includes:
* Grafana: Dashboards, alerting, and visualization.
* Prometheus: Metrics collection and storage (time-series database).
* Loki: Log aggregation (like a lightweight ELK).
* Tempo: Distributed tracing (stores traces in object storage).
* Mimir: Long-term metrics storage (Prometheus-compatible).
* Alloy (formerly Grafana Agent): Telemetry collector that ships data to all of the above.
Self-hosted vs Grafana Cloud.
Self-hosted: All components are open-source. You run them on your own infrastructure. Free, but you are responsible for uptime, scaling, backups, and upgrades.
Grafana Cloud: Managed version of the entire stack. Free tier includes:
* 10,000 active metrics series
* 50GB logs/month
* 50GB traces/month
* 500 VUH (virtual user hours) for k6 load testing
* 50GB profiles/month
Paid plans start at $29/month for Grafana Cloud Pro, scaling based on usage.
Strengths.
* No vendor lock-in. Prometheus, OpenTelemetry, and PromQL are industry standards.
* Extremely flexible. You can build exactly the stack you need.
* Beautiful dashboards. Grafana's visualization is best-in-class.
* Massive community. Thousands of pre-built dashboards, exporters, and integrations.
* Cost-efficient at scale. Open-source components mean you pay for infrastructure, not licenses.
Weaknesses.
* Operational overhead. Running Prometheus + Loki + Tempo + Grafana is a lot of infrastructure.
* Steeper learning curve. PromQL, LogQL, and TraceQL are three different query languages.
* Assembly required. New Relic works out of the box. Grafana requires configuration, integration, and ongoing maintenance.
* Alerting is functional but not as sophisticated as dedicated tools like PagerDuty or OpsGenie.
Head-to-Head comparison.
| Dimension | New Relic | Grafana (Stack) |
| Type | Managed SaaS | Open-source / Managed Cloud |
| Setup time | Minutes (install agent) | Hours to days (self-hosted) / Minutes (Cloud) |
| Infrastructure management | None | Significant (self-hosted) / None (Cloud) |
| APM | Built-in, auto-instrumented | Via Tempo + OpenTelemetry |
| Logs | Built-in | Loki |
| Dashboards | Good | Best-in-class |
| Query language | NRQL (proprietary) | PromQL, LogQL, TraceQL (open standards) |
| Vendor lock-in | High | Low (open standards) |
| Free tier | 1 user, 100GB/month | 10K metrics series, 50GB logs/month |
| Paid pricing | Per user ($49+/user/month) | Per usage (metrics, logs, traces) |
| Best for | Teams wanting zero ops | Teams wanting flexibility and control |
When to choose New Relic.
* Your team has 1-3 engineers and you do not want to manage monitoring infrastructure.
* You need APM, logs, and infrastructure in one place with zero setup.
* You value simplicity over flexibility.
* Your data volume is under 100GB/month (free tier is genuinely useful).
When to choose Grafana.
* You already run Kubernetes and your team is comfortable with Prometheus.
* You want to avoid vendor lock-in and use open standards.
* You need highly customized dashboards and visualizations.
* You have the DevOps capacity to manage the stack (or use Grafana Cloud).
* You are cost-sensitive at scale - open-source scales cheaper than per-user pricing.
When neither fits.
Both New Relic and Grafana are designed for teams operating infrastructure. They assume you have servers, containers, or at least a multi-service architecture worth monitoring. But many Next.js applications are deployed on Vercel or similar platforms where you do not manage infrastructure. You do not have hosts to monitor. You do not have Prometheus endpoints to scrape. What you have is API routes that need to be fast, reliable, and monitored. For this scenario:
* New Relic's agent-based approach does not work on serverless without significant configuration.
* Grafana's Prometheus-based stack has nothing to scrape in a serverless environment.
Nurbak Watch is built for this gap. It runs inside your Next.js server via instrumentation.ts - five lines of code - and monitors every API route from the inside. No agents, no exporters, no infrastructure. Alerts via Slack, email, or WhatsApp in under 10 seconds. $29/month flat, free during beta. If you grow into managing your own infrastructure, New Relic or Grafana will be there. Start with what your architecture actually needs.
The Nurbak Team builds developer-first API monitoring tools. Nurbak shares insights on uptime, performance, alerting, and best practices for keeping APIs healthy in production.
Ready to try it? Nurbak Watch is free during beta. 5 lines of code. First alert in under 5 minutes.
Eliminating static waste: automating capacity management with Dynamic Buffers.
March 26, 2026, Samuel Good
As your multiplayer game scales, relying solely on fixed buffer sizes to manage unpredictable player spikes can turn off-peak hours into a costly operational blind spot. Reserving permanent spare cloud capacity for these sudden surges means paying for idle compute when player traffic drops. This provisioning model inflates your infrastructure budget, effectively acting as a tax for phantom workloads that sit empty waiting for traffic. GameFabric eliminates this idle-capacity tax through intelligent, automated orchestration.
GameFabric operates on a hybrid model: your predictable player concurrency runs on highly performant, cost-efficient bare metal, while unpredictable player surges burst automatically into the cloud. Dynamic Buffers enforce this philosophy by intelligently scaling your elastic cloud servers while keeping your core bare metal servers ready. By concentrating scaling actions strictly on the cloud tier, GameFabric preserves the dedicated nature of your bare metal foundation.
To achieve true cost optimization without compromising the player experience, infrastructure requires intelligent, demand-based scaling. Dynamic Buffers replace static waste with automated capacity management, dynamically growing and shrinking your buffer sizes in direct correlation with player demand. Dynamic Buffering is not a blunt, centralized trigger; it's a sophisticated approach designed specifically for autonomous container orchestration within your Armada and ArmadaSet deployments.
* Cluster-Level Autonomy: Unlike autoscalers that rely on a central API, GameFabric's dynamic scaling logic operates directly inside your game cluster. This localized architecture removes single points of failure. Even in the event of an API disruption, your game servers continue scaling to meet real-time player demand.
* The Fallback Safety Net: Dynamic Buffers are underpinned by a secure 'floor.' Whether you define a static manual buffer or allow the system to calculate a safe baseline, this safety net ensures your game remains highly available even if the dynamic system is overridden.
While the system automates capacity management, your engineering team retains absolute control over scaling behavior. Through the GameFabric UI, LiveOps teams can use a simple slider to dictate the system's priorities:
* Cost Efficient: Configures a leaner infrastructure to maximize savings during stable periods.
* Availability: Configures the orchestrator to scale up fast and scale down slow, maintaining a robust buffer to handle massive player influxes safely and ensuring players aren't left waiting for new servers to spin up.
Effective fleet management demands clear visibility into system behavior. Because GameFabric integrates natively with Prometheus and Grafana, your LiveOps team can use customized dashboards to visualize buffer adjustments over time, correlating scaling events directly with Concurrent User (CCU) demand.
Should you need to lock your infrastructure for a specific live event, the system features a manual override. Disabling Dynamic Buffers instantly halts automated tuning, safely reverting the fleet to your statically defined fallback buffer without interrupting active game sessions.
Reclaim your cloud budget and offload the operational overhead of manual fleet management to an orchestrator built specifically for the realities of live-service gaming. Reach out today for your personalized demo to see automated capacity management in action.
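To make the ideas of a fallback floor, "scale up fast and scale down slow", and the manual override more concrete, here is a minimal sketch of one way such a buffer calculation could work. It is not GameFabric's implementation: the `BufferPolicy` fields, the percentage-based target, the per-tick step limits, and both presets are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferPolicy:
    """Hypothetical tuning knobs; none of these names come from GameFabric."""
    floor: int            # fallback buffer: never keep fewer spare servers than this
    target_percent: int   # desired spare servers as a percentage of allocated servers
    max_step_up: int      # most servers the buffer may grow by in one tick
    max_step_down: int    # most servers the buffer may shrink by in one tick


# Illustrative presets for the two ends of the slider described above.
COST_EFFICIENT = BufferPolicy(floor=5, target_percent=10, max_step_up=10, max_step_down=10)
AVAILABILITY = BufferPolicy(floor=5, target_percent=20, max_step_up=50, max_step_down=2)


def next_buffer(current: int, allocated: int, policy: BufferPolicy,
                dynamic_enabled: bool = True) -> int:
    """Return the buffer size for the next tick."""
    if not dynamic_enabled:
        # Manual override: revert to the statically defined fallback buffer.
        return policy.floor
    # Integer ceiling of allocated * target_percent / 100, clamped to the floor.
    desired = max(policy.floor, -(-allocated * policy.target_percent // 100))
    if desired > current:
        return min(desired, current + policy.max_step_up)
    return max(desired, current - policy.max_step_down)


if __name__ == "__main__":
    print(next_buffer(current=10, allocated=200, policy=AVAILABILITY))  # surge: grows quickly
    print(next_buffer(current=40, allocated=50, policy=AVAILABILITY))   # lull: shrinks slowly
```

The asymmetry between `max_step_up` and `max_step_down` is what gives the availability-leaning preset its behaviour: a surge is absorbed in one step, while a lull releases capacity only a little at a time.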
Grafana security release: critical and high severity security fixes for CVE-2026-27876 and CVE-2026-27880.
2026-03-25
Today Grafana Labs is releasing Grafana 12.4.2 along with patches for Grafana 12.3, 12.2, 12.1, and 11.6, which include critical and high severity security fixes. Grafana Labs recommends that you install the newly released versions as soon as possible.
Releases with security fixes:
* Grafana 12.4.2
* Grafana 12.3.6
* Grafana 12.2.8
* Grafana 12.1.10
* Grafana 11.6.14
As per its security policy, Grafana Labs customers have received security patched versions two weeks in advance under embargo, and Grafana Cloud has been patched. Grafana Labs has also coordinated closely with all cloud providers licensed to offer Grafana Cloud. They received early notification under embargo and confirmed that their offerings are secure at the time of this announcement. This is applicable to Amazon Managed Grafana and Azure Managed Grafana.
CVE-2026-27876: SQL expressions arbitrary file write enabling remote code execution.
Grafana's SQL expressions feature enables transforming query data with familiar SQL syntax. This syntax, however, also permitted writing arbitrary files to the file system in such a way that one could chain several attack vectors to achieve remote code execution. The CVSS score for this vulnerability is 9.1 CRITICAL.
The following prerequisites are required for this vulnerability:
* Access to execute data source queries (Viewer permissions or higher)
* The sqlExpressions feature toggle must be enabled on the Grafana instance.
Impact.
An attacker with access to execute data source queries could overwrite a Sqlyze driver or write an AWS data source configuration file in order to achieve full remote code execution. Grafana Labs has confirmed this vulnerability could be exploited to acquire an SSH connection to the Grafana host.
Impacted versions.
Grafana versions v11.6.0 and later are impacted by this vulnerability.
Solutions and mitigations.
If an upgrade is not immediately possible, the following workarounds reduce risk. Note: these may cause disruption to Grafana users and do not fully remediate the vulnerability.
Option 1: Disable the sqlExpressions feature toggle.
Option 2: Perform ALL of the following:
* If you have Sqlyze installed: update to at least v1.5.0 or disable it.
* Disable all AWS data sources you have installed.
CVE-2026-27880: unauthenticated denial-of-service via OpenFeature endpoint.
Grafana's OpenFeature feature flag validation endpoints do not require authentication and accept unbounded user input. This input is read into memory. The CVSS score for this vulnerability is 7.5 HIGH.
Impact.
An attacker could crash the Grafana server by sending requests that exhaust available memory.
Impacted versions.
Grafana versions v12.1.0 and later are impacted by this vulnerability.
Solutions and mitigations.
If an upgrade is not immediately possible, any of the following workarounds reduces risk:
* Deploy Grafana in a highly available environment with automatic restarts.
* Implement a reverse proxy in front of Grafana that limits input payload size. Cloudflare does this by default. Nginx supports this via explicit configuration.
Timeline and post-incident review.
Here is a detailed incident timeline. All times are in UTC.
CVE-2026-27876
| Date/Time (UTC) | Event |
| 2025-02-06 | sqlExpressions feature reimplemented with MySQL syntax and released in v11.6.0 |
| 2026-02-23 13:33 | Internal incident declared |
| 2026-02-23 15:08 | Grafana Cloud patched |
| 2026-03-09 | Private release issued to customers under embargo |
| 2026-03-25 | Public release |
| 2026-03-26 04:00 | Blog published |
CVE-2026-27880
| Date/Time (UTC) | Event |
| 2025-06-27 | New OpenFeature evaluation endpoint introduced and released in v12.1.0 |
| 2026-02-24 13:12 | Internal incident declared |
| 2026-02-24 17:49 | Grafana Cloud stacks not behind Cloudflare were patched; Cloudflare-backed stacks were not affected |
| 2026-03-09 | Private release issued to customers under embargo |
| 2026-03-25 | Public release |
| 2026-03-26 04:00 | Blog published |
Acknowledgements.
Grafana Labs would like to thank Liad Eliyahu, Head of Research at Miggo Security, for responsibly disclosing CVE-2026-27876 through its bug bounty program. CVE-2026-27880 was discovered internally by the Grafana Labs security team.
Reporting security issues.
If you think you have found a security vulnerability, please go to the Grafana Labs Report a security issue page to learn how to send a security report. Grafana Labs will send you a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
Important: Grafana Labs asks you not to disclose the vulnerability before it has been fixed and announced, unless you have received a response from the Grafana Labs security team saying that you can do so.
Security announcements.
Grafana Labs maintains a security advisories page, where it always posts a summary, remediation, and mitigation details for any patch containing security fixes. You can also subscribe to its RSS feed.
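For operators triaging a fleet, the impacted ranges and patched releases stated in the advisory above can be encoded as a small version check. This is a sketch under stated assumptions rather than an official tool: it only handles plain `major.minor.patch` strings, it treats minor lines newer than 12.4 as already fixed (an assumption, not something the advisory states), and it ignores the other prerequisites such as the sqlExpressions feature toggle. The function names are illustrative.

```python
# Patched release per minor line, as listed in the advisory.
PATCHED = {(11, 6): 14, (12, 1): 10, (12, 2): 8, (12, 3): 6, (12, 4): 2}

# First impacted version per CVE, as listed in the advisory.
FIRST_IMPACTED = {
    "CVE-2026-27876": (11, 6, 0),
    "CVE-2026-27880": (12, 1, 0),
}


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'v12.4.1' or '12.4.1'; raises ValueError on anything else."""
    major, minor, patch = (int(part) for part in text.strip().lstrip("v").split("."))
    return major, minor, patch


def unpatched_cves(version_text: str) -> list[str]:
    """Return the CVEs whose impacted range includes this version."""
    version = parse_version(version_text)
    major, minor, patch = version
    fixed_patch = PATCHED.get((major, minor))
    if fixed_patch is not None and patch >= fixed_patch:
        return []
    if (major, minor) > max(PATCHED):
        # Assumption: minor lines released after 12.4 already include the fixes.
        return []
    return sorted(cve for cve, first in FIRST_IMPACTED.items() if version >= first)


if __name__ == "__main__":
    for candidate in ("11.5.0", "11.6.3", "12.0.5", "12.4.1", "v12.4.2"):
        print(candidate, unpatched_cves(candidate))
```

A version on a minor line with no listed patch, such as 12.0.x, is reported as impacted because the advisory lists no fixed release for that line; upgrading to a patched line is the only option the table encodes.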