System Reliability Engineer

Date: Aug 3, 2022

Location: Montreal, Quebec, CA

Company: WorkJam, Inc.

WorkJam’s mission? To provide the best Digital Workplace for frontline and hourly workers. Through our industry-leading Digital Frontline Workplace platform, we are positively impacting the lives of millions of frontline employees worldwide, enabling them to achieve breakthrough productivity levels at companies of all sizes. We’re proud of our dedicated teams who are driven to make a difference in the world. Join our team today and bring your innovative ideas, passion, and commitment to excellence to make an impact on our products and the new markets we create!

 

WorkJam is a high-growth global organisation with operations in North America, Europe, and Australia, and our head office in Montreal. Learn more about WorkJam at WorkJam.com!

Summary

Your role as a System Reliability Engineer/Specialist (SRE) 
The SRE collaborates closely with the development and DevOps teams, with a focus on reliability, scalability, resilience, security, and performance. The SRE is also responsible for centralizing monitoring activities and proactively proposing solutions that improve the overall application. Sharing the DevOps objectives, the SRE works within the Dev organization.

 

What you will be doing:

  • Collaborate with Dev, DevOps, Release and QA teams to ensure proactive detection of unwanted behaviors in the application 
  • Serve as a primary point of contact responsible for the overall health, performance, and capacity of our platform.
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding 
  • Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and growth. 
  • Work closely with development teams to ensure the platform is designed with operability in mind. 
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve 
  • Participate in an on-call rotation. 
  • Perform root cause analysis and document results in the form of post-mortems. 
  • Identify and lead efforts to improve automation. 

 

What we're looking for:

  • Bachelor’s Degree in a technical field (Software Engineering or related fields).  
  • Good knowledge of visualization and monitoring tools like Prometheus, New Relic, Firebase, Grafana 
  • 3+ years of hands-on experience operating Kubernetes clusters in a production environment
  • Understanding of the Linux Operating System, standard networking protocols, and components. 
  • Experience in managing and scaling distributed systems on one of the major cloud providers (such as AWS or GCP).
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks. 
  • Automation/Scripting experience with Shell, Python or something similar. 
  • Familiarity with Infrastructure as Code (IaC) tools (Kubernetes Helm Charts, Terraform, etc.). 
  • Strong Java programming experience.

What we offer:

  • Competitive salary and benefits package
  • 4 weeks’ vacation
  • Contribution to your retirement/pension plan
  • A flexible and remote/hybrid work environment
  • Work with the latest technology
  • A dynamic and inclusive culture
  • A supportive team that will encourage your professional growth and development
