American Express (AMEX) is one of the largest global payment networks, handling more than a trillion dollars in transactions annually. This volume translates to millions of daily transactions, each requiring near-instantaneous processing to meet customer expectations for speed and reliability.
The Need for a Modern Payment Infrastructure
AMEX’s legacy payment system, built on traditional on-premise infrastructure, struggled to keep pace with the evolving demands of digital commerce. The system’s limitations included:
- Inflexibility in scaling to meet surges in transaction volume
- Difficulty integrating new payment technologies and regulatory requirements
- Challenges in maintaining low-latency responses essential for seamless customer experiences
Recognizing these constraints, AMEX undertook a comprehensive overhaul of its payment network in 2018, aiming to deliver a platform that was cloud-ready, adaptable, secure, and capable of processing payments in milliseconds.
Key Drivers for System Transformation
- Cloud Scalability: The new architecture was designed to leverage cloud computing, enabling rapid scaling and improved resilience.
- Agility: The system supports faster integration of new technologies and regulatory changes, keeping pace with the dynamic financial landscape.
- Security and Reliability: Enhanced mechanisms ensure secure, uninterrupted transaction processing.
- Ultra-Low Latency: The platform is engineered to approve or decline transactions within milliseconds, minimizing customer wait times.
- Capacity for Growth: The infrastructure can handle increasing transaction volumes without performance degradation.
The Global Transaction Router: Core of the New System
At the heart of AMEX’s modern payment network is the Global Transaction Router (GTR). This component orchestrates the flow of payment requests among key entities:
- Acquirers: Merchant banks that initiate payment requests on behalf of merchants.
- Processors: Service providers that manage the technical exchange of payment data.
- Issuers: Banks that issue AMEX cards and authorize transactions.
The GTR acts as the initial point of contact, efficiently routing each transaction through the necessary verification and approval steps before final settlement.
Unique Engineering Challenges
Building the GTR required overcoming several technical hurdles:
- Persistent TCP Sessions: Unlike typical web APIs built on short-lived HTTP requests, payment systems commonly exchange ISO 8583 messages over long-lived TCP connections.
- Legacy Protocols: ISO 8583, while widely adopted, presents challenges due to its age and complexity.
- Traffic Volatility: The system must absorb sudden spikes in transaction volume, such as during major shopping events.
- Stringent Latency Requirements: Even minor delays can result in failed transactions or poor user experiences.
Strategic Technical Decisions
Go (Golang) for Concurrency
AMEX engineers selected Go as the primary programming language for the GTR. Go's lightweight concurrency model, built on goroutines, allows the system to manage thousands of simultaneous connections efficiently. Its ahead-of-time compilation and low-pause concurrent garbage collector help keep latency low and predictable during transaction processing.
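As a rough illustration of the goroutine-per-connection pattern this describes, the sketch below starts one handler goroutine for each of 1,000 in-memory connections. It is not AMEX's code: handleConn and roundTrip are hypothetical names, net.Pipe stands in for real TCP sockets, and the newline-delimited "APPROVED" reply is a made-up placeholder protocol.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"sync"
)

// handleConn serves one long-lived connection: it reads newline-delimited
// requests and answers each one until the peer closes the connection.
func handleConn(conn net.Conn) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		// Placeholder reply; a real router would forward the request.
		fmt.Fprintf(conn, "APPROVED %s\n", strings.ToUpper(scanner.Text()))
	}
}

// roundTrip opens an in-memory connection, starts a dedicated handler
// goroutine for it, sends one request, and returns the reply.
func roundTrip(req string) (string, error) {
	client, server := net.Pipe()
	go handleConn(server) // one goroutine per connection
	defer client.Close()

	if _, err := fmt.Fprintln(client, req); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(client).ReadString('\n')
	return strings.TrimSpace(reply), err
}

func main() {
	const clients = 1000
	replies := make([]string, clients)

	var wg sync.WaitGroup
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r, err := roundTrip(fmt.Sprintf("txn-%d", i))
			if err != nil {
				r = "error: " + err.Error()
			}
			replies[i] = r // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	fmt.Println(len(replies), "replies, first:", replies[0])
}
```

Each goroutine starts with a small stack that grows on demand, which is why dedicating one to every open connection stays cheap at this scale.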
gRPC over HTTP/2 for Internal Communication
To accelerate internal data exchange, the team implemented gRPC over HTTP/2. This approach uses Protocol Buffers for compact, fast message serialization, and supports multiplexing—enabling multiple requests to be processed concurrently over a single connection.
Asynchronous Logging
Traditional synchronous logging can bottleneck high-speed systems. AMEX adopted asynchronous logging, buffering log entries in memory and writing them in batches. This minimizes performance impact and ensures transaction processing remains uninterrupted, even under heavy load.
Optimization Strategies for Peak Performance
Profiling and Benchmarking
Continuous profiling with Go’s pprof tool helps identify and resolve performance bottlenecks. Benchmarking under simulated high-traffic conditions ensures the system maintains low latency and high throughput, even during peak periods.
Reader-Writer Mutexes
To manage concurrent access to shared resources, the team implemented reader-writer mutexes. This allows multiple read operations to occur simultaneously, only restricting access during write operations, thus reducing unnecessary delays.
Direct Socket Communication
Initially, Go channels were used to pass transactions between goroutines inside the service, but the extra hand-offs introduced latency. By eliminating unnecessary channel usage and handing each transaction directly from the TCP read path to the gRPC call, the team streamlined data flow and reduced overhead.
Operational Best Practices
Continuous Performance Testing
Every code change, regardless of size, undergoes rigorous performance testing. This proactive approach ensures that updates do not inadvertently introduce latency or scalability issues.
Chaos Testing
To verify resilience, AMEX regularly conducts chaos testing: deliberately introducing failures to observe how the system recovers and to confirm that service continues uninterrupted.
Iterative Development
Rather than deploying large, infrequent updates, the engineering team adopts an incremental approach. Frequent, small enhancements allow for continuous improvement in performance, security, and scalability.
Conclusion
The transformation of American Express’s payment infrastructure exemplifies how thoughtful engineering and modern technology can deliver a payment system that is both highly scalable and ultra-reliable. By leveraging Go for concurrency, gRPC for efficient communication, and a suite of optimization strategies, AMEX ensures that millions of transactions are processed every day with millisecond latency—meeting the demands of today’s digital economy and setting a benchmark for the industry.