Platform Engineering: Complete Guide to Internal Developer Platforms
Platform Engineering is the discipline of designing and building Internal Developer Platforms (IDPs) that enable application developers to self-serve infrastructure and services. By providing golden paths, automation, and standardized workflows, platform engineering improves developer productivity, reduces cognitive load, and accelerates software delivery while maintaining security and operational excellence.
What is Platform Engineering?
Platform Engineering creates self-service capabilities with automated infrastructure operations, enabling development teams to manage their application lifecycle without requiring deep infrastructure knowledge.
Key Concepts
- Internal Developer Platform (IDP): Self-service layer between developers and infrastructure
- Golden Paths: Opinionated, supported ways to accomplish tasks
- Platform as a Product: Treating the platform as a product with internal customers
- Developer Experience (DevEx): Focus on removing friction from development workflows
- Self-Service: Developers provision resources without tickets
- Cognitive Load Reduction: Hiding complexity while maintaining control
Platform Engineering vs DevOps vs SRE
| Aspect | Platform Engineering | DevOps | SRE |
|---|---|---|---|
| Focus | Developer productivity | Culture & automation | Reliability & operations |
| Primary Goal | Self-service platform | Break down silos | Service reliability |
| Customers | Application developers | Entire organization | End users |
| Artifacts | IDP, golden paths | CI/CD pipelines | SLOs, error budgets |
| Ownership | Platform team | Shared | SRE team |
| Abstraction | High (hide complexity) | Medium | Low (close to infra) |
Platform Engineering Maturity Model
Level 1: Manual Operations
- Infrastructure as Code
- Basic automation
- Documentation-driven

Level 2: Self-Service Basics
- Portal/catalog
- Template-based provisioning
- Basic golden paths

Level 3: Platform as Product
- Developer portal
- Multiple golden paths
- Feedback loops
- Metrics tracking

Level 4: Advanced Platform
- AI-assisted workflows
- Policy as code
- Advanced observability
- Cost optimization

Level 5: Autonomous Platform
- Self-healing
- Predictive scaling
- Autonomous compliance
- Intelligent routing

IDP Architecture
Reference Architecture
    ┌─────────────────────────────────────────────────────────────┐
    │                  Developer Interface Layer                  │
    │                                                             │
    │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
    │  │    Portal    │   │     CLI      │   │     IDE      │     │
    │  │ (Backstage)  │   │  (Platform)  │   │   Plugins    │     │
    │  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘     │
    └─────────┼──────────────────┼──────────────────┼─────────────┘
              │                  │                  │
    ┌─────────┴──────────────────┴──────────────────┴─────────────┐
    │               Service Catalog & Golden Paths                │
    │                                                             │
    │   ┌─────────────────────────────────────────────────────┐   │
    │   │  Software Templates (Scaffolding)                   │   │
    │   │  • Microservice starter                             │   │
    │   │  • Database provisioning                            │   │
    │   │  • CI/CD pipeline                                   │   │
    │   └─────────────────────────────────────────────────────┘   │
    └───────────────────────────────┬─────────────────────────────┘
                                    │
    ┌───────────────────────────────┴─────────────────────────────┐
    │                Platform Orchestration Layer                 │
    │                                                             │
    │  ┌────────────┐   ┌────────────┐   ┌────────────────────┐   │
    │  │ Crossplane │   │ Terraform  │   │    ArgoCD/Flux     │   │
    │  │ (Control   │   │   (IaC)    │   │      (GitOps)      │   │
    │  │  Plane)    │   │            │   │                    │   │
    │  └────────────┘   └────────────┘   └────────────────────┘   │
    │                                                             │
    │   ┌─────────────────────────────────────────────────────┐   │
    │   │  Policy & Security Layer                            │   │
    │   │  • OPA (Policy as Code)                             │   │
    │   │  • Vault (Secrets Management)                       │   │
    │   │  • Kyverno (K8s Policy)                             │   │
    │   └─────────────────────────────────────────────────────┘   │
    └───────────────────────────────┬─────────────────────────────┘
                                    │
    ┌───────────────────────────────┴─────────────────────────────┐
    │                    Infrastructure Layer                     │
    │                                                             │
    │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
    │  │  Kubernetes  │   │    Cloud     │   │  Databases   │     │
    │  │   Clusters   │   │  (AWS/GCP)   │   │    (RDS)     │     │
    │  └──────────────┘   └──────────────┘   └──────────────┘     │
    │                                                             │
    │   ┌─────────────────────────────────────────────────────┐   │
    │   │  Observability Platform                             │   │
    │   │  • Metrics (Prometheus/Datadog)                     │   │
    │   │  • Logs (Loki/Elasticsearch)                        │   │
    │   │  • Traces (Tempo/Jaeger)                            │   │
    │   │  • Cost (Kubecost/CloudHealth)                      │   │
    │   └─────────────────────────────────────────────────────┘   │
    └─────────────────────────────────────────────────────────────┘

Building an IDP
1. Developer Portal (Backstage)
    # Install Backstage
    npx @backstage/create-app@latest
    cd my-backstage-app

    # Install plugins
    yarn add --cwd packages/app @backstage/plugin-kubernetes
    yarn add --cwd packages/app @backstage/plugin-tech-radar
    yarn add --cwd packages/app @roadiehq/backstage-plugin-argo-cd

    # Configure app-config.yaml
    cat > app-config.yaml << 'EOF'
    app:
      title: Platform Portal
      baseUrl: https://platform.example.com

    organization:
      name: My Company

    backend:
      baseUrl: https://platform.example.com
      listen:
        port: 7007
      database:
        client: pg
        connection:
          host: postgres
          port: 5432
          user: backstage
          password: ${POSTGRES_PASSWORD}

    catalog:
      rules:
        - allow: [Component, System, API, Resource, Location]
      locations:
        - type: url
          target: https://github.com/example/platform-templates/blob/main/catalog-info.yaml

    kubernetes:
      serviceLocatorMethod:
        type: 'multiTenant'
      clusterLocatorMethods:
        - type: 'config'
          clusters:
            - name: production
              url: https://k8s.prod.example.com
              authProvider: serviceAccount
              serviceAccountToken: ${K8S_TOKEN}

    techdocs:
      builder: 'local'
      generator:
        runIn: 'local'
      publisher:
        type: 'local'

    auth:
      environment: production
      providers:
        github:
          production:
            clientId: ${GITHUB_CLIENT_ID}
            clientSecret: ${GITHUB_CLIENT_SECRET}
    EOF

    # Run development server
    yarn dev

2. Software Templates
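The template in this section describes its form inputs as a JSON Schema under `parameters`: a name pattern, two enums, and a bounded replica count. As a rough illustration of what those constraints accept and reject, the Python sketch below re-implements them by hand. It is not Backstage code; the function name `validate_parameters` and the error strings are hypothetical, and only the constraint values are taken from the template.

```python
import re

# Constraints copied from the `parameters` schema of the nodejs-microservice template.
NAME_PATTERN = re.compile(r"^[a-z0-9-]+$")
ENVIRONMENTS = {"development", "staging", "production"}
DATABASES = {"postgresql", "mysql", "mongodb", "none"}
REQUIRED_FIELDS = ("name", "description", "owner", "environment", "database")
REPLICAS_DEFAULT, REPLICAS_MIN, REPLICAS_MAX = 2, 1, 10


def validate_parameters(params: dict) -> list[str]:
    """Return validation errors for a form submission (empty list if valid)."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if not params.get(f)]

    name = params.get("name")
    if name and not NAME_PATTERN.fullmatch(name):
        errors.append("name must match ^[a-z0-9-]+$")

    environment = params.get("environment")
    if environment and environment not in ENVIRONMENTS:
        errors.append(f"unknown environment: {environment}")

    database = params.get("database")
    if database and database not in DATABASES:
        errors.append(f"unknown database: {database}")

    # `replicas` is optional; the schema default applies when it is omitted.
    replicas = params.get("replicas", REPLICAS_DEFAULT)
    if isinstance(replicas, bool) or not isinstance(replicas, int):
        errors.append("replicas must be an integer")
    elif not REPLICAS_MIN <= replicas <= REPLICAS_MAX:
        errors.append(f"replicas must be between {REPLICAS_MIN} and {REPLICAS_MAX}")

    return errors


# Hypothetical submission used only for illustration.
valid = {
    "name": "user-service",
    "description": "Manages user accounts",
    "owner": "group:default/team-a",
    "environment": "production",
    "database": "postgresql",
}
print(validate_parameters(valid))  # []
print(validate_parameters({**valid, "name": "User_Service", "replicas": 11}))
```

In a real portal the scaffolder form performs this validation from the schema itself, so a check like this is mainly useful for a CLI or API that accepts the same inputs outside the portal.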
    # template.yaml - Microservice Template
    apiVersion: scaffolder.backstage.io/v1beta3
    kind: Template
    metadata:
      name: nodejs-microservice
      title: Node.js Microservice
      description: Create a new Node.js microservice with all best practices
      tags:
        - nodejs
        - microservice
        - recommended
    spec:
      owner: platform-team
      type: service

      parameters:
        - title: Service Information
          required:
            - name
            - description
            - owner
          properties:
            name:
              title: Name
              type: string
              description: Unique name of the service
              pattern: '^[a-z0-9-]+$'
            description:
              title: Description
              type: string
              description: What does this service do?
            owner:
              title: Owner
              type: string
              description: Team responsible for this service
              ui:field: OwnerPicker
              ui:options:
                catalogFilter:
                  kind: Group

        - title: Infrastructure
          required:
            - environment
            - database
          properties:
            environment:
              title: Environment
              type: string
              enum:
                - development
                - staging
                - production
            database:
              title: Database
              type: string
              enum:
                - postgresql
                - mysql
                - mongodb
                - none
            replicas:
              title: Replica Count
              type: integer
              default: 2
              minimum: 1
              maximum: 10

      steps:
        - id: fetch-template
          name: Fetch Application Template
          action: fetch:template
          input:
            url: ./skeleton
            values:
              name: ${{ parameters.name }}
              description: ${{ parameters.description }}
              owner: ${{ parameters.owner }}
              database: ${{ parameters.database }}

        - id: publish-github
          name: Publish to GitHub
          action: publish:github
          input:
            allowedHosts: ['github.com']
            description: ${{ parameters.description }}
            repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
            defaultBranch: main

        # Not a built-in scaffolder action: assumes a custom or
        # plugin-provided action registered under this name.
        - id: create-argocd-app
          name: Create ArgoCD Application
          action: argocd:create-app
          input:
            name: ${{ parameters.name }}
            namespace: ${{ parameters.environment }}
            repoUrl: https://github.com/myorg/${{ parameters.name }}
            path: kubernetes

        # Not a built-in scaffolder action: assumes a custom action that
        # maps the selected database to a Crossplane claim kind.
        - id: provision-database
          name: Provision Database
          if: ${{ parameters.database !== 'none' }}
          action: crossplane:provision
          input:
            apiVersion: database.example.com/v1alpha1
            kind: ${{ parameters.database }}
            metadata:
              name: ${{ parameters.name }}-db
            spec:
              # Nested under `parameters` to match the XRD schema in the next section
              parameters:
                size: small
                environment: ${{ parameters.environment }}

        - id: register-catalog
          name: Register in Catalog
          action: catalog:register
          input:
            repoContentsUrl: ${{ steps['publish-github'].output.repoContentsUrl }}
            catalogInfoPath: '/catalog-info.yaml'

      output:
        links:
          - title: Repository
            url: ${{ steps['publish-github'].output.remoteUrl }}
          - title: ArgoCD
            url: https://argocd.example.com/applications/${{ parameters.name }}
          - title: View in Catalog
            icon: catalog
            entityRef: ${{ steps['register-catalog'].output.entityRef }}

3. Infrastructure as Code (Crossplane)
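A Crossplane Composition patch copies a field from the composite resource into a managed resource, optionally passing it through a transform. The composition in this section uses a `map` transform to turn the abstract `size` parameter into an RDS instance class. The Python sketch below mimics that single patch to show the data flow. It is not how Crossplane is implemented; the helpers `get_path`, `set_path`, and `apply_map_patch` are hypothetical, and only the size mapping and field paths come from the composition.

```python
import copy

# Mapping taken from the composition's `map` transform.
SIZE_TO_INSTANCE_CLASS = {
    "small": "db.t3.micro",
    "medium": "db.t3.medium",
    "large": "db.m5.large",
}


def get_path(obj: dict, path: str):
    """Read a dotted field path such as 'spec.parameters.size'."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def set_path(obj: dict, path: str, value) -> None:
    """Write a dotted field path, creating intermediate dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def apply_map_patch(composite: dict, base: dict, from_path: str, to_path: str, mapping: dict) -> dict:
    """Return a copy of `base` with one field copied from `composite` through `mapping`."""
    rendered = copy.deepcopy(base)
    set_path(rendered, to_path, mapping[get_path(composite, from_path)])
    return rendered


# A claim as a developer would submit it, and the composition's base resource.
claim = {"spec": {"parameters": {"size": "medium", "environment": "staging"}}}
base = {"spec": {"forProvider": {"engine": "postgres", "instanceClass": "db.t3.micro"}}}

rendered = apply_map_patch(
    claim,
    base,
    from_path="spec.parameters.size",
    to_path="spec.forProvider.instanceClass",
    mapping=SIZE_TO_INSTANCE_CLASS,
)
print(rendered["spec"]["forProvider"]["instanceClass"])  # db.t3.medium
```

The point of the indirection is that developers choose from three sizes while the platform team keeps control of which instance classes those sizes resolve to.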
    apiVersion: apiextensions.crossplane.io/v1
    kind: CompositeResourceDefinition
    metadata:
      name: xpostgresqlinstances.database.example.com
    spec:
      group: database.example.com
      names:
        kind: XPostgreSQLInstance
        plural: xpostgresqlinstances
      claimNames:
        kind: PostgreSQLInstance
        plural: postgresqlinstances
      versions:
        - name: v1alpha1
          served: true
          referenceable: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    parameters:
                      type: object
                      properties:
                        size:
                          type: string
                          enum:
                            - small
                            - medium
                            - large
                        environment:
                          type: string
                          enum:
                            - development
                            - staging
                            - production
                        version:
                          type: string
                          default: "15"
                      required:
                        - size
                        - environment
                  required:
                    - parameters
    ---
    # composition.yaml
    apiVersion: apiextensions.crossplane.io/v1
    kind: Composition
    metadata:
      name: xpostgresqlinstances.aws.database.example.com
    spec:
      writeConnectionSecretsToNamespace: crossplane-system
      compositeTypeRef:
        apiVersion: database.example.com/v1alpha1
        kind: XPostgreSQLInstance

      resources:
        - name: rds-instance
          base:
            apiVersion: rds.aws.upbound.io/v1beta1
            kind: Instance
            spec:
              forProvider:
                region: us-east-1
                engine: postgres
                instanceClass: db.t3.micro
                allocatedStorage: 20
                storageEncrypted: true
                publiclyAccessible: false
                skipFinalSnapshot: true
          patches:
            - type: FromCompositeFieldPath
              fromFieldPath: spec.parameters.size
              toFieldPath: spec.forProvider.instanceClass
              transforms:
                - type: map
                  map:
                    small: db.t3.micro
                    medium: db.t3.medium
                    large: db.m5.large
            - type: FromCompositeFieldPath
              fromFieldPath: spec.parameters.version
              toFieldPath: spec.forProvider.engineVersion
            - type: FromCompositeFieldPath
              fromFieldPath: spec.parameters.environment
              toFieldPath: spec.forProvider.tags.Environment

        - name: security-group
          base:
            apiVersion: ec2.aws.upbound.io/v1beta1
            kind: SecurityGroup
            spec:
              forProvider:
                region: us-east-1
                description: PostgreSQL security group
                ingress:
                  - fromPort: 5432
                    toPort: 5432
                    protocol: tcp
                    cidrBlocks:
                      - 10.0.0.0/8

4. GitOps (ArgoCD)
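The Application manifest in this section configures sync retries with `limit: 5` and a backoff of `duration: 5s`, `factor: 2`, `maxDuration: 3m`. Assuming a plain exponential backoff, where each wait is the previous one multiplied by `factor` and capped at `maxDuration`, the waits can be worked out as in the sketch below. The function `backoff_schedule` is hypothetical and only illustrates the arithmetic; it is not Argo CD code.

```python
def backoff_schedule(limit: int, duration_s: float, factor: float, max_duration_s: float) -> list:
    """Return the wait in seconds before each retry, capped at max_duration_s."""
    waits = []
    wait = duration_s
    for _ in range(limit):
        waits.append(min(wait, max_duration_s))
        wait *= factor
    return waits


# Values from the manifest's retry block: 5s base, factor 2, 3m (180s) cap, 5 retries.
waits = backoff_schedule(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(waits)       # [5, 10, 20, 40, 80]
print(sum(waits))  # 155
```

Under that assumption the five retries spread over roughly two and a half minutes, and the three-minute cap would only start to matter if `limit` were raised beyond six.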
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: platform-app
      namespace: argocd
    spec:
      project: default

      source:
        repoURL: https://github.com/example/platform-apps
        targetRevision: HEAD
        path: apps/production

        # Helm
        helm:
          valueFiles:
            - values.yaml
            - values-production.yaml

      destination:
        server: https://kubernetes.default.svc
        namespace: production

      syncPolicy:
        automated:
          prune: true
          selfHeal: true
          allowEmpty: false
        syncOptions:
          - CreateNamespace=true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m

      ignoreDifferences:
        - group: apps
          kind: Deployment
          jsonPointers:
            - /spec/replicas
    ---
    # ApplicationSet for multi-environment
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: microservices
      namespace: argocd
    spec:
      generators:
        - git:
            repoURL: https://github.com/example/platform-apps
            revision: HEAD
            directories:
              - path: apps/*

      template:
        metadata:
          name: '{{path.basename}}'
        spec:
          project: default
          source:
            repoURL: https://github.com/example/platform-apps
            targetRevision: HEAD
            path: '{{path}}'
          destination:
            server: https://kubernetes.default.svc
            namespace: '{{path.basename}}'
          syncPolicy:
            automated:
              prune: true
              selfHeal: true

5. Policy as Code (OPA/Kyverno)
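The Kyverno policy in this section expresses three checks as patterns: required labels (`"?*"` means present and non-empty), required CPU and memory limits, and no image matching `*:latest`. As a rough mental model of what such rules accept or reject, the Python sketch below applies the same three checks to a Deployment manifest held as a dictionary. The function `check_deployment` is hypothetical and is not how Kyverno evaluates patterns; for simplicity it checks the pod template of a Deployment for all three rules, and the violation messages are the ones used in the policy.

```python
from fnmatch import fnmatchcase

REQUIRED_LABELS = ("app", "owner", "environment")


def check_deployment(manifest: dict) -> list[str]:
    """Return the policy messages a Deployment manifest would violate."""
    violations = []

    # require-labels: each label must be present and non-empty ("?*").
    labels = manifest.get("metadata", {}).get("labels", {})
    if any(not labels.get(key) for key in REQUIRED_LABELS):
        violations.append("Required labels: app, owner, environment")

    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        # require-resource-limits: both limits must be set.
        limits = container.get("resources", {}).get("limits", {})
        if not limits.get("memory") or not limits.get("cpu"):
            violations.append("CPU and memory limits are required")
        # disallow-latest-tag: the image must not match the wildcard "*:latest".
        if fnmatchcase(container.get("image", ""), "*:latest"):
            violations.append("Using 'latest' tag is not allowed")

    return violations


# Hypothetical manifest: missing the `owner` label, no CPU limit, and a `latest` image.
deployment = {
    "metadata": {"labels": {"app": "user-service", "environment": "production"}},
    "spec": {"template": {"spec": {"containers": [
        {"image": "registry.example.com/user-service:latest",
         "resources": {"limits": {"memory": "256Mi"}}},
    ]}}},
}
for message in check_deployment(deployment):
    print(message)
```

A check like this can run in CI to give developers feedback before admission control rejects the manifest, while the cluster-side policy remains the enforcement point.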
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: platform-standards
    spec:
      # Current Kyverno releases expect the capitalized value
      validationFailureAction: Enforce
      background: true
      rules:
        # Require resource limits
        - name: require-resource-limits
          match:
            any:
              - resources:
                  kinds:
                    - Deployment
                    - StatefulSet
          validate:
            message: "CPU and memory limits are required"
            pattern:
              spec:
                template:
                  spec:
                    containers:
                      - resources:
                          limits:
                            memory: "?*"
                            cpu: "?*"

        # Require labels
        - name: require-labels
          match:
            any:
              - resources:
                  kinds:
                    - Deployment
                    - Service
          validate:
            message: "Required labels: app, owner, environment"
            pattern:
              metadata:
                labels:
                  app: "?*"
                  owner: "?*"
                  environment: "?*"

        # Block latest tag
        - name: disallow-latest-tag
          match:
            any:
              - resources:
                  kinds:
                    - Pod
          validate:
            message: "Using 'latest' tag is not allowed"
            pattern:
              spec:
                containers:
                  - image: "!*:latest"
    ---
    # OPA Gatekeeper constraint (requires the K8sRequiredLabels ConstraintTemplate to be installed)
    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sRequiredLabels
    metadata:
      name: require-platform-labels
    spec:
      match:
        kinds:
          - apiGroups: ["apps"]
            kinds: ["Deployment", "StatefulSet"]
      parameters:
        labels:
          - key: "app.kubernetes.io/name"
          - key: "app.kubernetes.io/managed-by"
          - key: "platform.example.com/owner"
          - key: "platform.example.com/cost-center"

Golden Paths
Example: Deploy a New Microservice
    # Developer experience - Single command
    platform create microservice \
      --name user-service \
      --language nodejs \
      --database postgresql \
      --environment production

    # Behind the scenes:
    # 1. Create Git repository
    # 2. Scaffold application code
    # 3. Create CI/CD pipeline
    # 4. Provision database
    # 5. Create Kubernetes manifests
    # 6. Configure monitoring
    # 7. Set up logging
    # 8. Register in service catalog
    # 9. Create ArgoCD application
    # 10. Deploy to Kubernetes

    # Everything ready in 5 minutes vs 2 days

Example Golden Path Implementation
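The ten behind-the-scenes steps amount to an ordered pipeline in which each step depends on the ones before it. A minimal sketch of such a runner, assuming each step is a plain function that raises on failure, might look as follows. The step names are taken from the list; `run_golden_path` and the no-op handlers are hypothetical stand-ins for calls to Git hosting, CI, Crossplane, and Argo CD.

```python
from typing import Callable, Optional

# Step order taken from the "behind the scenes" list.
STEPS = [
    "create_git_repository",
    "scaffold_application_code",
    "create_ci_cd_pipeline",
    "provision_database",
    "create_kubernetes_manifests",
    "configure_monitoring",
    "set_up_logging",
    "register_in_service_catalog",
    "create_argocd_application",
    "deploy_to_kubernetes",
]


def run_golden_path(
    request: dict, handlers: dict[str, Callable[[dict], None]]
) -> tuple[list[str], Optional[str]]:
    """Run steps in order; return (completed steps, error message or None)."""
    completed = []
    for step in STEPS:
        # Skip database provisioning when the service does not need one.
        if step == "provision_database" and request.get("database") == "none":
            continue
        try:
            handlers[step](request)
        except Exception as exc:
            # Stop at the first failure so later steps never run on a broken base.
            return completed, f"{step}: {exc}"
        completed.append(step)
    return completed, None


# No-op handlers stand in for real integrations in this sketch.
handlers = {step: (lambda request: None) for step in STEPS}
request = {
    "name": "user-service",
    "language": "nodejs",
    "database": "postgresql",
    "environment": "production",
}

completed, error = run_golden_path(request, handlers)
print(len(completed), error)  # 10 None
```

Returning the list of completed steps alongside the error makes it possible to report exactly where a run stopped, which matters when a developer has to retry or the platform has to clean up partially created resources.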
    apiVersion: platform.example.com/v1
    kind: GoldenPath
    metadata:
      name: microservice-deployment
    spec:
      description: "Standard path for deploying microservices"

      steps:
        - name: code-repository
          type: github
          template: microservice-template

        - name: ci-pipeline
          type: github-actions
          workflow: |
            name: CI
            on: [push]
            jobs:
              build:
                runs-on: ubuntu-latest
                steps:
                  - uses: actions/checkout@v4
                  - name: Build and Test
                    run: |
                      npm ci
                      npm test
                      npm run build
                  # Tag the image with the registry prefix so the push below finds it
                  - name: Build Container
                    run: docker build -t registry.example.com/app:${{ github.sha }} .
                  - name: Push to Registry
                    run: docker push registry.example.com/app:${{ github.sha }}

        - name: database
          type: crossplane
          composition: postgresql-instance

        - name: kubernetes-deployment
          type: argocd
          sync-policy: automated

        - name: observability
          type: integration
          services:
            - prometheus
            - grafana
            - jaeger

        - name: documentation
          type: backstage-techdocs
          auto-generate: true

Platform Metrics
Key Performance Indicators
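Each KPI in this section pairs a target with a current value, so a dashboard or report can flag the ones that are off target mechanically. The Python sketch below does that for the figures listed here. It rests on assumptions that the listing itself does not state: targets such as "90%" and "10+ per day" are read as "at least", and "< 1 day" is converted to 24 hours so it can be compared with "4 hours". The function `unmet_kpis` is hypothetical.

```python
import operator

COMPARATORS = {"<": operator.lt, ">": operator.gt, ">=": operator.ge}

# (metric, comparator, target, current), using the values from the KPI listing.
KPIS = [
    ("deployment_frequency", ">=", 10, 8.5),         # per day
    ("lead_time_for_changes", "<", 60, 45),          # minutes
    ("mean_time_to_recovery", "<", 15, 12),          # minutes
    ("change_failure_rate", "<", 5, 3.2),            # percent
    ("services_using_golden_paths", ">=", 90, 78),   # percent
    ("self_service_adoption", ">=", 80, 72),         # percent
    ("platform_nps", ">", 50, 58),
    ("infrastructure_tickets", "<", 10, 6),          # per week
    ("onboarding_time", "<", 24, 4),                 # hours
    ("platform_uptime", ">=", 99.9, 99.95),          # percent
]


def unmet_kpis(kpis: list) -> list:
    """Return the names of metrics whose current value misses the target."""
    return [
        name
        for name, comparator, target, current in kpis
        if not COMPARATORS[comparator](current, target)
    ]


print(unmet_kpis(KPIS))
# ['deployment_frequency', 'services_using_golden_paths', 'self_service_adoption']
```

Read this way, the delivery and operations targets are largely met, while the adoption metrics are the ones still short of target.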
    # Platform KPIs
    developer_productivity:
      - metric: deployment_frequency
        target: "10+ per day"
        current: "8.5 per day"

      - metric: lead_time_for_changes
        target: "< 1 hour"
        current: "45 minutes"

      - metric: mean_time_to_recovery
        target: "< 15 minutes"
        current: "12 minutes"

      - metric: change_failure_rate
        target: "< 5%"
        current: "3.2%"

    platform_adoption:
      - metric: services_using_golden_paths
        target: "90%"
        current: "78%"

      - metric: self_service_adoption
        target: "80%"
        current: "72%"

      - metric: platform_nps
        target: "> 50"
        current: "58"

    operational_efficiency:
      - metric: infrastructure_tickets
        target: "< 10 per week"
        current: "6 per week"

      - metric: onboarding_time
        target: "< 1 day"
        current: "4 hours"

      - metric: platform_uptime
        target: "99.9%"
        current: "99.95%"

Platform Team Structure
Platform Team Organization:
    Product Management
    ├── Platform Product Manager
    │   ├── Roadmap planning
    │   ├── User research
    │   └── Metrics analysis

    Engineering
    ├── Platform Engineers (4-6)
    │   ├── Infrastructure automation
    │   ├── Golden paths development
    │   └── Integration work
    │
    ├── Developer Experience Engineers (2-3)
    │   ├── Portal development
    │   ├── CLI tools
    │   └── Documentation
    │
    └── SRE (2-3)
        ├── Platform reliability
        ├── Performance optimization
        └── Incident response

    Developer Advocacy
    └── Platform Evangelists (1-2)
        ├── Training
        ├── Documentation
        └── Community building

Best Practices
1. Treat Platform as a Product
- Have a product manager
- Collect user feedback regularly
- Maintain a public roadmap
- Measure satisfaction (NPS)
- Iterate based on data
- Provide excellent documentation

2. Start Small, Iterate
Phase 1: Core Services (3 months)
- Basic developer portal
- 1-2 golden paths
- Essential integrations

Phase 2: Expansion (6 months)
- More golden paths
- Advanced automation
- Self-service capabilities

Phase 3: Optimization (12 months)
- AI/ML integration
- Advanced observability
- Cost optimization

3. Developer Experience First
    # Good Platform Design
    time_to_first_deployment:
      without_platform: "2-3 days"
      with_platform: "15 minutes"

    cognitive_load:
      decisions_required:
        without: 50+
        with: 5

    documentation:
      - Interactive tutorials
      - Video walkthroughs
      - Runnable examples
      - Auto-generated from code

Conclusion
Platform Engineering transforms how organizations deliver software by creating self-service capabilities that empower developers while maintaining operational excellence. By treating the platform as a product, providing golden paths, and focusing relentlessly on developer experience, platform teams enable their organizations to move faster, more safely, and more efficiently.