Security Operations in an AWS Environment

Grammarly’s AI-powered writing assistant helps 30 million people write more clearly and effectively every day, and the security of our users’ data and the reliability of our services are our highest priorities.

We use Amazon Web Services (AWS), the industry-leading secure cloud platform, to host our server-side infrastructure. Security is a shared responsibility: AWS is responsible for the security of the cloud, while Grammarly is responsible for security in the cloud—in other words, the secure configuration and management of our services, applications, and data that live in AWS.

Security operations, or SecOps, covers the following areas:

1. Identification: Understanding our infrastructure, tooling, and risk vectors; for example, to protect user data, we first need to identify how the data is transferred, stored, and processed.

2. Protection: Applying the right controls and configurations to shield Grammarly’s infrastructure from attacks before they happen.

3. Detection: Establishing continuous monitoring to surface any security incidents, and examining incidents to determine the root cause and impact.

4. Response: Taking immediate action to stop any attack, such as blocking IP addresses or resetting compromised passwords.

5. Restore: Returning the system to a secure state and recreating any lost data following an incident.

In this article, we’ll take a look at how Grammarly uses AWS tools in our security operations. While we also rely on many other third-party tools and evaluations in our processes, we hope that this focus will be useful to other teams making decisions about how to manage their security operations with AWS. The vast number of tools can be a bit overwhelming!

Identification

Grammarly has a range of client applications: browser extensions, the Grammarly Editor for all major browsers, native desktop apps, Grammarly for Microsoft Office, the Grammarly Keyboard for iOS and Android, and Grammarly for iPad. These clients access our server-side infrastructure on AWS through only a small number of public servers and ports, protected by a load-balanced web application firewall. All components that process user data operate inside our private network. In all, we have over 50 AWS accounts that are used to host more than 5,000 servers. Our AWS locations are the US East and US West regions.

We use several internal and external processes to evaluate risks on a regular basis, and the SecOps team is committed to a high response efficiency for triaging and resolving any vulnerabilities we discover. While a complete discussion of our risk evaluation is out of scope for this article, it’s worth mentioning that we run a public bug bounty program with HackerOne—you can view more information here if interested.

Protection

Data protection

It’s vital for Grammarly to protect all text that users write while using our product and store in their Grammarly Editor. We conduct regular inventories of our system architecture and data flow to define which databases, S3 buckets, and EBS volumes contain sensitive data, and we verify that encryption is applied by design from the first time a user interacts with Grammarly. The AWS Key Management Service (KMS) and AWS Certificate Manager (ACM) make it possible to enforce strong encryption of data at rest and in transit.
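As a minimal sketch of how encryption can be enforced at the storage layer, the Python below builds an S3 bucket policy that rejects requests made without TLS and uploads that do not request KMS encryption. The function name and the bucket name are hypothetical, and this shows a common AWS pattern rather than the policy Grammarly actually uses.

```python
import json


def build_encryption_bucket_policy(bucket_name: str) -> dict:
    """Return a bucket policy that denies plaintext transport and non-KMS uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Encryption in transit: deny any request not sent over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Encryption at rest: deny uploads that do not ask for SSE-KMS.
                "Sid": "DenyNonKmsUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
        ],
    }


if __name__ == "__main__":
    # "example-sensitive-bucket" is a placeholder name for illustration only.
    print(json.dumps(build_encryption_bucket_policy("example-sensitive-bucket"), indent=2))
```

The resulting JSON could be attached to a bucket through the console, an infrastructure-as-code tool, or the S3 API. Note that the second statement also rejects uploads that rely on a bucket default instead of sending the encryption header, so it may be stricter than some workloads need.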

Furthermore, access to this data must be controlled with high granularity and restricted to only the engineers who need it. AWS Identity and Access Management (IAM) is a mechanism for creating and managing users, groups, and permissions. It enables us to mandate strong passwords and multi-factor authentication for Grammarly team members, and to control access to user data following the principle of least privilege. AWS Secrets Manager helps manage, retrieve, and audit access to encryption keys, database credentials, API keys, and other secrets used in our infrastructure. Following best practices, it automatically rotates these secrets according to a defined policy.
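One way to mandate multi-factor authentication through IAM is a deny statement keyed on the `aws:MultiFactorAuthPresent` condition. The sketch below builds such a policy document in Python. The function name and the default list of exempt actions are assumptions chosen for illustration, not Grammarly’s actual configuration.

```python
import json

# Assumed default: actions a user still needs in order to enroll an MFA device.
MFA_SETUP_ACTIONS = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:ListMFADevices",
    "iam:ResyncMFADevice",
    "sts:GetSessionToken",
]


def build_require_mfa_policy(allowed_without_mfa=None) -> dict:
    """Deny every action except the listed ones when the caller has no MFA."""
    actions = sorted(allowed_without_mfa or MFA_SETUP_ACTIONS)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptListedIfNoMFA",
                "Effect": "Deny",
                "NotAction": actions,
                "Resource": "*",
                # BoolIfExists also matches requests where the key is absent,
                # such as calls made with long-term access keys.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_require_mfa_policy(), indent=2))
```

Because the statement is an explicit deny, it overrides any allow granted elsewhere, which fits the least-privilege approach described above: other policies grant narrowly scoped permissions, and this one withholds them until MFA is present.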

Application protection

We also need to protect our infrastructure from denial of service (DoS) attacks so that customers can always depend on Grammarly to be available. We use Amazon’s Application Load Balancers (ALBs) and CloudFront to surface our applications securely to the public internet, distribute the load across our services, and deliver our content with minimal delay.

AWS Shield and Web Application Firewall (WAF) provide protection against DDoS attacks and other kinds of exploits. AWS Shield detects and automatically blocks attempted malicious connections at the network and transport layers, while WAF helps us create custom rules (based on the OWASP Top 10 security risks) to protect against attacks on the application layer.
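To give a sense of what a custom WAF rule looks like, the sketch below builds a rate-based rule in the shape the AWS WAFv2 API expects, blocking any single IP address that exceeds a request limit within WAF’s evaluation window. The function name, rule name, and limit are hypothetical values for illustration.

```python
def build_rate_limit_rule(name: str, limit: int, priority: int = 0) -> dict:
    """Return a WAFv2 rule that blocks IPs exceeding `limit` requests per window."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                # Count requests per source IP address.
                "AggregateKeyType": "IP",
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


if __name__ == "__main__":
    # Placeholder name and threshold; real values depend on traffic patterns.
    print(build_rate_limit_rule("example-rate-limit", 2000))
```

A dictionary like this would be one entry in the `Rules` list of a web ACL. Choosing the limit is the hard part in practice, since a value that is too low blocks legitimate users behind shared IP addresses.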

Detection

If something unexpected happens in our infrastructure, SecOps is responsible for knowing what happened, when it occurred, where the data or services were affected, and who carried out the attack. To achieve this, we use various continuous monitoring tools and undertake in-depth incident investigations when needed.

To fit these tools together, we’ve worked with additional vendors to build a centralized security information and event management (SIEM) system that handles alerts and visualizations. If you want to stay within the AWS ecosystem, Security Hub can also merge the reports from different tools (including some third parties).

Continuous monitoring

CloudTrail records the API calls made to an AWS account. We forward the logs to our SIEM. The most popular approach is to store these records in S3 buckets, but be aware that this storage must itself meet security requirements such as data encryption and access restriction. As a best practice, we recommend configuring a dedicated AWS account for storing any logs used in a continuous monitoring process, and making that account accessible to the SecOps team only.
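As an illustration of the kind of check a SIEM can run over CloudTrail data, the sketch below scans the text of one CloudTrail log file for successful console sign-ins that did not use MFA. The function name is hypothetical. The field names follow the CloudTrail console sign-in event format, but federated sign-ins may report `MFAUsed` as `No` even when the identity provider enforced MFA, so treat matches as leads rather than verdicts.

```python
import json


def console_logins_without_mfa(log_file_text: str) -> list:
    """Return summaries of successful ConsoleLogin events that did not use MFA."""
    records = json.loads(log_file_text).get("Records", [])
    flagged = []
    for record in records:
        if record.get("eventName") != "ConsoleLogin":
            continue
        # Either field can be null in real records, hence the `or {}`.
        response = record.get("responseElements") or {}
        extra = record.get("additionalEventData") or {}
        succeeded = response.get("ConsoleLogin") == "Success"
        mfa_used = extra.get("MFAUsed") == "Yes"
        if succeeded and not mfa_used:
            flagged.append(
                {
                    "time": record.get("eventTime"),
                    "user": (record.get("userIdentity") or {}).get("arn"),
                    "source_ip": record.get("sourceIPAddress"),
                }
            )
    return flagged
```

The input is the decompressed JSON text of a single log file as delivered to S3. A production pipeline would stream many files and forward matches to the alerting system instead of returning a list.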

GuardDuty analyzes CloudTrail logs, network activity, and S3 access logs to identify anomalies and suspicious actions. Another continuous monitoring tool we use is AWS Config, which helps detect non-compliant configurations in close to real time. Finally, Amazon Inspector helps conduct regular vulnerability assessments of our EC2 instances and audit our host configurations based on industry standards.
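Findings from a tool like GuardDuty usually need triage before they reach a human. The sketch below filters a list of finding dictionaries by their numeric `Severity` field and orders the most severe first. The function name and the default threshold of 7.0 are assumptions for illustration; each team will pick its own cutoff.

```python
def high_severity_findings(findings: list, min_severity: float = 7.0) -> list:
    """Return findings at or above `min_severity`, most severe first."""
    selected = [
        finding
        for finding in findings
        if float(finding.get("Severity", 0)) >= min_severity
    ]
    return sorted(selected, key=lambda finding: float(finding["Severity"]), reverse=True)
```

The input is assumed to be the list of finding objects returned by the GuardDuty API or exported to a SIEM, each carrying at least `Severity` and `Type` keys.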

Incident investigation

When we need to retrieve more in-depth information about a security incident, VPC flow logs are a major source of help. They record the history of network connections established in the virtual private cloud (VPC) used by an AWS account. VPC flow logs are very similar to NetFlow logs and generate gigabytes of data, so any SecOps team working with these logs should analyze their infrastructure and risk vectors during the Identification stage to define the critical subnetworks where flow logs should be enabled.
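To show what working with these records involves, here is a minimal sketch that parses flow log lines in the default (version 2) format and counts rejected connections per source address. The type and function names are hypothetical, and custom log formats with a different field order would need a different parser.

```python
from collections import Counter
from typing import NamedTuple, Optional

# Default format: version account-id interface-id srcaddr dstaddr srcport
# dstport protocol packets bytes start end action log-status
DEFAULT_FIELD_COUNT = 14


class FlowRecord(NamedTuple):
    srcaddr: str
    dstaddr: str
    dstport: Optional[int]
    protocol: Optional[int]
    action: str


def _to_int(value: str) -> Optional[int]:
    # Records with no traffic use "-" as a placeholder for missing fields.
    return None if value == "-" else int(value)


def parse_flow_log_line(line: str) -> FlowRecord:
    parts = line.split()
    if len(parts) != DEFAULT_FIELD_COUNT:
        raise ValueError(f"expected {DEFAULT_FIELD_COUNT} fields, got {len(parts)}")
    return FlowRecord(
        srcaddr=parts[3],
        dstaddr=parts[4],
        dstport=_to_int(parts[6]),
        protocol=_to_int(parts[7]),
        action=parts[12],
    )


def rejected_sources(lines) -> Counter:
    """Count REJECT records per source address."""
    counts = Counter()
    for line in lines:
        record = parse_flow_log_line(line)
        if record.action == "REJECT":
            counts[record.srcaddr] += 1
    return counts
```

During an investigation, a count like this can point to hosts that are probing closed ports. At gigabytes of data per day, the same logic would more likely run as a query in a log analytics service than as a local script.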

Response

When security is threatened, SecOps immediately responds by blocking suspicious users or bad actors. For example, we might block an IP address that sent too many requests during a denial of service attack. If we saw that an account was potentially compromised, we would take further action by enabling mandatory two-factor authentication and sending a reset password link to the user.

Since the response is so application-specific, there aren’t as many AWS tools here; instead, AWS provides a rich API that we use to write scripts that automate our responses, like changing access permissions for users or groups, closing established sessions to instances or databases, and rotating access keys and credentials. When the API isn’t sufficient, we use AWS Lambda functions to perform predefined steps for incident mitigation.
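A response script of the kind described above might look like the following sketch, which deactivates every active access key for a potentially compromised IAM user. The function name is hypothetical. The client is passed in as a parameter and is assumed to be a boto3 IAM client, whose `list_access_keys` and `update_access_key` calls are used here.

```python
def deactivate_access_keys(iam_client, user_name: str) -> list:
    """Mark all of a user's active access keys as Inactive and return their IDs."""
    response = iam_client.list_access_keys(UserName=user_name)
    deactivated = []
    for key in response["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            iam_client.update_access_key(
                UserName=user_name,
                AccessKeyId=key["AccessKeyId"],
                Status="Inactive",
            )
            deactivated.append(key["AccessKeyId"])
    return deactivated


# Example usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   deactivate_access_keys(boto3.client("iam"), "some-user-name")
```

Deactivating rather than deleting keeps the key metadata available for the investigation and lets the key be re-enabled if the alert turns out to be a false positive. The same function body could run inside a Lambda handler as one of the predefined mitigation steps.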

Restore

If we discover that a service has been misconfigured or that data was corrupted, it’s important to be able to restore the affected infrastructure to a previous version. Data is always backed up and duplicated across our data centers, and we use the Infrastructure as Code approach to ensure that we can restore our cloud infrastructure easily. All our resources and dependencies are described in a file, and if something happens—like a downed service or deleted database—we can restore our infrastructure by plugging this file into the cloud using Terraform (one can also use AWS CloudFormation to achieve a similar result). AWS Systems Manager also helps us manage our resources at scale and apply config changes in bulk.

Staying vigilant

When using a cloud service provider like AWS, it’s important for SecOps to stay up-to-date on the latest vulnerabilities that the provider has identified. We use the AWS Security Bulletins to get overall alerts, and the Amazon Linux Security Center to find out about vulnerabilities specific to the Amazon Linux AMI (on which 90% of our services run). The AWS Security Blog is also a good resource for knowledge sharing and updates about new security features and tools for AWS customers.

To provide a comprehensive security solution, we utilize a variety of third-party tools and assessments and also run an in-house Security Champions program to keep security top of mind across Grammarly teams. But depending on your size, what AWS provides might be enough to run a successful SecOps process. We hope that this article has been a useful overview of these tools and how they help us make Grammarly’s product a secure and trustworthy writing assistant for our millions of daily active users. If you are passionate about security, we have a great team looking to fill some exciting roles. Please don’t hesitate to get in touch!

You can also learn more about Grammarly security operations, policies, practices, and attestations here.