Site Reliability Engineer
TAZE Technologies
Dates: 11 Jul - 09 Aug
Location: Tbilisi
Employment type: Full-time
Business travel: No
Languages: English
Information technology: Information technology (general)
Email for responses: recruitment@tazetec.com

Are you looking for a space for innovation, diversity, and self-development? Then Tazetec has the right challenge for you!

Tazetec is looking for a Site Reliability Engineer (SRE)


Your profile:

  • Experience with AWS or hybrid data center setups;
  • Ability to read logs and stack traces to determine the root cause of an incident;
  • Infrastructure as Code: Terraform, Helm, Ansible and (optionally) Werf;
  • Linux administration and container orchestration (K8s) skills;
  • Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.;
  • Strong understanding of TCP/IP, DNS and load balancers;
  • Familiarity with incident response, postmortems, and blameless culture;
  • Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.


Bonus points for:

  • Background in high-throughput environments (e.g., financial, trading);
  • Experience with CDNs and real-time log aggregation;
  • Proficiency in one or more scripting languages (Python, Bash, Go);
  • Knowledge of Java, PHP and their respective web-development frameworks;
  • Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.;
  • Exposure to Kafka, Redis or other event-driven systems.


Your day-to-day contribution:

  • Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., KYC, payments);
  • Manage and support highly available, scalable infrastructure (K8s, cloud and bare metal);
  • Implement and manage monitoring, logging, and alerting (e.g., Prometheus, Grafana, Loki, ELK);
  • Automate deployments and operations using CI/CD pipelines (Jenkins, ArgoCD, Helm, etc.);
  • Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR);
  • Participate in on-call rotation to ensure 24/7 system reliability;
  • Secure infrastructure in line with regulations (e.g., data integrity, jurisdictional compliance);
  • Collaborate with Dev, QA, DevOps and Ops to improve service stability and uptime.


Success Metrics:

  • Less than 1% downtime for any user- or partner-facing service;
  • SLO 99.95%;
  • 95% of infrastructure managed via code and automation;
  • Documented runbooks and alert playbooks per service group.
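
For a sense of scale, an availability target can be converted into a downtime allowance. The sketch below is illustrative only: the 30-day measurement window and the `downtime_budget` helper are assumptions made for this example, not part of Tazetec's stated methodology.

```python
from datetime import timedelta


def downtime_budget(slo_percent: float, window: timedelta) -> timedelta:
    """Return the maximum downtime a window may contain while still meeting the SLO."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The allowed downtime is the fraction of the window not covered by the SLO.
    return window * (1 - slo_percent / 100)


if __name__ == "__main__":
    window = timedelta(days=30)  # assumed measurement window, not from the posting
    for target in (99.0, 99.95):
        budget = downtime_budget(target, window)
        print(f"{target}% over {window.days} days allows {budget} of downtime")
```

Under these assumptions, a 99.95% target leaves roughly 21.6 minutes of downtime per 30-day window, while a 1% downtime ceiling would allow about 7.2 hours.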


What's next:

  • Step I: HR interview (30 minutes);
  • Step II: technical interview (1 hour);
  • Step III: introduction to the team (1 hour);
  • Step IV: final decision.


Our benefits package:

  • Inspiring and diverse culture surrounded by experienced and enthusiastic colleagues;
  • Flexible working hours to ensure your work-life balance;
  • Option to work remotely or at the office;
  • Star players/Top Salary policy;
  • Paid vacation days;
  • An attractive medical insurance package;
  • Opportunities for personal and professional development: monthly compensation for your leisure activities, fully paid courses and workshops, etc.;
  • An active corporate life: team-building events, sports activities, corporate parties, etc.;
  • A collaborative and welcoming environment for your initiatives.


Are you ready to take on this challenge? Send your CV in English to recruitment@tazetec.com