Vladimir Chavkov

Platform Engineering: Complete Guide to Internal Developer Platforms


Platform Engineering is the discipline of designing and building Internal Developer Platforms (IDPs) that enable application developers to self-serve infrastructure and services. By providing golden paths, automation, and standardized workflows, platform engineering improves developer productivity, reduces cognitive load, and accelerates software delivery while maintaining security and operational excellence.

What is Platform Engineering?

Platform Engineering creates self-service capabilities with automated infrastructure operations, enabling development teams to manage their application lifecycle without requiring deep infrastructure knowledge.

Key Concepts

  1. Internal Developer Platform (IDP): Self-service layer between developers and infrastructure
  2. Golden Paths: Opinionated, supported ways to accomplish tasks
  3. Platform as a Product: Treating the platform as a product with internal customers
  4. Developer Experience (DevEx): Focus on removing friction from development workflows
  5. Self-Service: Developers provision resources without tickets
  6. Cognitive Load Reduction: Hiding complexity while maintaining control

Platform Engineering vs DevOps vs SRE

| Aspect | Platform Engineering | DevOps | SRE |
| --- | --- | --- | --- |
| Focus | Developer productivity | Culture & automation | Reliability & operations |
| Primary Goal | Self-service platform | Break down silos | Service reliability |
| Customers | Application developers | Entire organization | End users |
| Artifacts | IDP, golden paths | CI/CD pipelines | SLOs, error budgets |
| Ownership | Platform team | Shared | SRE team |
| Abstraction | High (hide complexity) | Medium | Low (close to infra) |

Platform Engineering Maturity Model

Level 1: Manual Operations
- Infrastructure as Code
- Basic automation
- Documentation-driven
Level 2: Self-Service Basics
- Portal/catalog
- Template-based provisioning
- Basic golden paths
Level 3: Platform as Product
- Developer portal
- Multiple golden paths
- Feedback loops
- Metrics tracking
Level 4: Advanced Platform
- AI-assisted workflows
- Policy as code
- Advanced observability
- Cost optimization
Level 5: Autonomous Platform
- Self-healing
- Predictive scaling
- Autonomous compliance
- Intelligent routing

IDP Architecture

Reference Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Developer Interface Layer                  │
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │    Portal    │   │     CLI      │   │     IDE      │     │
│  │ (Backstage)  │   │  (Platform)  │   │   Plugins    │     │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘     │
└─────────┼──────────────────┼──────────────────┼─────────────┘
          │                  │                  │
┌─────────┴──────────────────┴──────────────────┴─────────────┐
│               Service Catalog & Golden Paths                │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Software Templates (Scaffolding)                   │   │
│   │  • Microservice starter                             │   │
│   │  • Database provisioning                            │   │
│   │  • CI/CD pipeline                                   │   │
│   └─────────────────────────────────────────────────────┘   │
└───────────────────────────────┬─────────────────────────────┘
┌───────────────────────────────┴─────────────────────────────┐
│                Platform Orchestration Layer                 │
│                                                             │
│  ┌────────────┐   ┌────────────┐   ┌────────────────────┐   │
│  │ Crossplane │   │ Terraform  │   │    ArgoCD/Flux     │   │
│  │  (Control  │   │   (IaC)    │   │      (GitOps)      │   │
│  │   Plane)   │   │            │   │                    │   │
│  └────────────┘   └────────────┘   └────────────────────┘   │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Policy & Security Layer                            │   │
│   │  • OPA (Policy as Code)                             │   │
│   │  • Vault (Secrets Management)                       │   │
│   │  • Kyverno (K8s Policy)                             │   │
│   └─────────────────────────────────────────────────────┘   │
└───────────────────────────────┬─────────────────────────────┘
┌───────────────────────────────┴─────────────────────────────┐
│                    Infrastructure Layer                     │
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │  Kubernetes  │   │    Cloud     │   │  Databases   │     │
│  │   Clusters   │   │  (AWS/GCP)   │   │    (RDS)     │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Observability Platform                             │   │
│   │  • Metrics (Prometheus/Datadog)                     │   │
│   │  • Logs (Loki/Elasticsearch)                        │   │
│   │  • Traces (Tempo/Jaeger)                            │   │
│   │  • Cost (Kubecost/CloudHealth)                      │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Building an IDP

1. Developer Portal (Backstage)

# Install Backstage (the generator prompts for an app name; my-backstage-app is assumed here)
npx @backstage/create-app@latest
cd my-backstage-app
# Install plugins
yarn add --cwd packages/app @backstage/plugin-kubernetes
yarn add --cwd packages/app @backstage/plugin-tech-radar
yarn add --cwd packages/app @roadiehq/backstage-plugin-argo-cd
# Configure app-config.yaml
# The quoted 'EOF' keeps ${...} literal so Backstage resolves the variables at startup
cat > app-config.yaml << 'EOF'
app:
  title: Platform Portal
  baseUrl: https://platform.example.com
organization:
  name: My Company
backend:
  baseUrl: https://platform.example.com
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: postgres
      port: 5432
      user: backstage
      password: ${POSTGRES_PASSWORD}
catalog:
  rules:
    - allow: [Component, System, API, Resource, Location]
  locations:
    - type: url
      target: https://github.com/example/platform-templates/blob/main/catalog-info.yaml
kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - name: production
          url: https://k8s.prod.example.com
          authProvider: serviceAccount
          serviceAccountToken: ${K8S_TOKEN}
techdocs:
  builder: 'local'
  generator:
    runIn: 'local'
  publisher:
    type: 'local'
auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${GITHUB_CLIENT_ID}
        clientSecret: ${GITHUB_CLIENT_SECRET}
EOF
# Run development server (newer Backstage app templates may use `yarn start` instead)
yarn dev
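
The configuration above relies on several environment variables (`${POSTGRES_PASSWORD}`, `${K8S_TOKEN}`, and the GitHub OAuth credentials). A small pre-flight check can catch unset variables before the portal starts. The following is a minimal sketch, not part of Backstage itself; the helper name `missing_env_vars` and the sample text are illustrative assumptions.

```python
import re

# Matches ${NAME} placeholders such as ${POSTGRES_PASSWORD}.
PLACEHOLDER = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def missing_env_vars(config_text, environ):
    """Return the placeholder names in config_text that are unset or empty in environ."""
    referenced = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in referenced if not environ.get(name))


# Hypothetical excerpt of app-config.yaml used only for this demonstration.
sample = """
backend:
  database:
    connection:
      password: ${POSTGRES_PASSWORD}
auth:
  providers:
    github:
      production:
        clientId: ${GITHUB_CLIENT_ID}
        clientSecret: ${GITHUB_CLIENT_SECRET}
"""

print(missing_env_vars(sample, {"POSTGRES_PASSWORD": "s3cret"}))
# ['GITHUB_CLIENT_ID', 'GITHUB_CLIENT_SECRET']
```

In practice the second argument would be `os.environ`, and a non-empty result would abort the deployment script.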

2. Software Templates

# template.yaml - Microservice Template
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: nodejs-microservice
  title: Node.js Microservice
  description: Create a new Node.js microservice with all best practices
  tags:
    - nodejs
    - microservice
    - recommended
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Name
          type: string
          description: Unique name of the service
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
          description: What does this service do?
        owner:
          title: Owner
          type: string
          description: Team responsible for this service
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
    - title: Infrastructure
      required:
        - environment
        - database
      properties:
        environment:
          title: Environment
          type: string
          enum:
            - development
            - staging
            - production
        database:
          title: Database
          type: string
          enum:
            - postgresql
            - mysql
            - mongodb
            - none
        replicas:
          title: Replica Count
          type: integer
          default: 2
          minimum: 1
          maximum: 10
  steps:
    - id: fetch-template
      name: Fetch Application Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
    - id: publish-github
      name: Publish to GitHub
      action: publish:github
      input:
        allowedHosts: ['github.com']
        description: ${{ parameters.description }}
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        defaultBranch: main
    # argocd:create-app is not a built-in scaffolder action; it assumes a
    # custom or plugin-provided action registered in the Backstage backend
    - id: create-argocd-app
      name: Create ArgoCD Application
      action: argocd:create-app
      input:
        name: ${{ parameters.name }}
        namespace: ${{ parameters.environment }}
        repoUrl: https://github.com/myorg/${{ parameters.name }}
        path: kubernetes
    # crossplane:provision is likewise assumed to be a custom action
    - id: provision-database
      name: Provision Database
      if: ${{ parameters.database !== 'none' }}
      action: crossplane:provision
      input:
        apiVersion: database.example.com/v1alpha1
        kind: ${{ parameters.database }}
        metadata:
          name: ${{ parameters.name }}-db
        spec:
          size: small
          environment: ${{ parameters.environment }}
    - id: register-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish-github'].output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'
  output:
    links:
      - title: Repository
        url: ${{ steps['publish-github'].output.remoteUrl }}
      - title: ArgoCD
        url: https://argocd.example.com/applications/${{ parameters.name }}
      - title: View in Catalog
        icon: catalog
        entityRef: ${{ steps['register-catalog'].output.entityRef }}
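
The parameter schema in the template (name pattern, enumerated environments and databases, replica bounds) can also be checked outside the portal, for example in a CLI that calls the same golden path. The sketch below mirrors those constraints by hand; `validate_parameters` is a hypothetical helper, not a Backstage API, and the scaffolder itself performs this validation from the JSON schema.

```python
import re

# Constraints copied from the template's parameters section.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
ENVIRONMENTS = {"development", "staging", "production"}
DATABASES = {"postgresql", "mysql", "mongodb", "none"}
REQUIRED = ("name", "description", "owner", "environment", "database")


def validate_parameters(params):
    """Return a list of human-readable errors; an empty list means the input is valid."""
    errors = [f"{field} is required" for field in REQUIRED if not params.get(field)]

    name = params.get("name")
    if name and not NAME_PATTERN.fullmatch(name):
        errors.append("name must match ^[a-z0-9-]+$")

    environment = params.get("environment")
    if environment and environment not in ENVIRONMENTS:
        errors.append(f"unknown environment: {environment}")

    database = params.get("database")
    if database and database not in DATABASES:
        errors.append(f"unknown database: {database}")

    replicas = params.get("replicas", 2)  # the template defaults to 2
    if not isinstance(replicas, int) or not 1 <= replicas <= 10:
        errors.append("replicas must be an integer between 1 and 10")

    return errors


valid = {
    "name": "user-service",
    "description": "Manages user accounts",
    "owner": "team-identity",
    "environment": "staging",
    "database": "postgresql",
}
print(validate_parameters(valid))                              # []
print(validate_parameters({**valid, "name": "User_Service"}))  # name pattern error
```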

3. Infrastructure as Code (Crossplane)

# composite-resource-definition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresqlinstances.database.example.com
spec:
  group: database.example.com
  names:
    kind: XPostgreSQLInstance
    plural: xpostgresqlinstances
  claimNames:
    kind: PostgreSQLInstance
    plural: postgresqlinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    size:
                      type: string
                      enum:
                        - small
                        - medium
                        - large
                    environment:
                      type: string
                      enum:
                        - development
                        - staging
                        - production
                    version:
                      type: string
                      default: "15"
                  required:
                    - size
                    - environment
              required:
                - parameters
---
# composition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xpostgresqlinstances.aws.database.example.com
spec:
  writeConnectionSecretsToNamespace: crossplane-system
  compositeTypeRef:
    apiVersion: database.example.com/v1alpha1
    kind: XPostgreSQLInstance
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-east-1
            engine: postgres
            instanceClass: db.t3.micro
            allocatedStorage: 20
            storageEncrypted: true
            publiclyAccessible: false
            skipFinalSnapshot: true
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.medium
                large: db.m5.large
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.engineVersion
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.environment
          toFieldPath: spec.forProvider.tags.Environment
    - name: security-group
      base:
        apiVersion: ec2.aws.upbound.io/v1beta1
        kind: SecurityGroup
        spec:
          forProvider:
            region: us-east-1
            description: PostgreSQL security group
            ingress:
              - fromPort: 5432
                toPort: 5432
                protocol: tcp
                cidrBlocks:
                  - 10.0.0.0/8
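
To make the patching in the Composition concrete, the sketch below imitates what the three `FromCompositeFieldPath` patches do to the RDS instance's `forProvider` block: the `size` parameter goes through the map transform, while `version` and `environment` are copied across. This is an illustration in plain Python under the assumption that patches are applied on top of the base values; `render_for_provider` is a hypothetical name, not Crossplane code.

```python
# Map transform from the Composition: abstract size -> RDS instance class.
SIZE_TO_INSTANCE_CLASS = {
    "small": "db.t3.micro",
    "medium": "db.t3.medium",
    "large": "db.m5.large",
}

# Base values from the rds-instance resource in the Composition.
BASE_FOR_PROVIDER = {
    "region": "us-east-1",
    "engine": "postgres",
    "instanceClass": "db.t3.micro",
    "allocatedStorage": 20,
    "storageEncrypted": True,
    "publiclyAccessible": False,
    "skipFinalSnapshot": True,
}


def render_for_provider(parameters):
    """Apply the three patches to a copy of the base forProvider block."""
    for_provider = dict(BASE_FOR_PROVIDER)
    # An unknown size raises KeyError here; the XRD enum should prevent that case.
    for_provider["instanceClass"] = SIZE_TO_INSTANCE_CLASS[parameters["size"]]
    # The XRD schema defaults version to "15".
    for_provider["engineVersion"] = parameters.get("version", "15")
    for_provider["tags"] = {"Environment": parameters["environment"]}
    return for_provider


rendered = render_for_provider({"size": "medium", "environment": "staging"})
print(rendered["instanceClass"], rendered["engineVersion"], rendered["tags"])
# db.t3.medium 15 {'Environment': 'staging'}
```

Developers only choose `small`, `medium`, or `large`; the platform team keeps control of which instance classes those names resolve to.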

4. GitOps (ArgoCD)

# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/platform-apps
    targetRevision: HEAD
    path: apps/production
    # Helm
    helm:
      valueFiles:
        - values.yaml
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
---
# ApplicationSet for multi-environment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example/platform-apps
        revision: HEAD
        directories:
          - path: apps/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/platform-apps
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
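
The `retry` block in the Application above describes an exponential backoff. As a rough illustration, and assuming the delay starts at `duration`, is multiplied by `factor` after each failed attempt, and is capped at `maxDuration`, the wait times for the configured values (5s, factor 2, 3m cap, limit 5) work out as follows. `retry_delays` is an illustrative helper, not an Argo CD function.

```python
def retry_delays(limit, duration_s, factor, max_duration_s):
    """Return the wait before each retry, in seconds, under capped exponential backoff."""
    return [min(duration_s * factor ** attempt, max_duration_s) for attempt in range(limit)]


# Values from the syncPolicy.retry block: limit 5, duration 5s, factor 2, maxDuration 3m.
delays = retry_delays(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155
```

With these settings the cap is never reached; it would only matter from the seventh retry onward, where the uncapped delay (320s) exceeds three minutes.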

5. Policy as Code (OPA/Kyverno)

# kyverno-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: platform-standards
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    # Require resource limits
    - name: require-resource-limits
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: "CPU and memory limits are required"
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: "?*"
                        cpu: "?*"
    # Require labels
    - name: require-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - Service
      validate:
        message: "Required labels: app, owner, environment"
        pattern:
          metadata:
            labels:
              app: "?*"
              owner: "?*"
              environment: "?*"
    # Block latest tag
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using 'latest' tag is not allowed"
        pattern:
          spec:
            containers:
              - image: "!*:latest"
---
# OPA Gatekeeper constraint (assumes the K8sRequiredLabels ConstraintTemplate is installed)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-platform-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
  parameters:
    labels:
      - key: "app.kubernetes.io/name"
      - key: "app.kubernetes.io/managed-by"
      - key: "platform.example.com/owner"
      - key: "platform.example.com/cost-center"
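
The three Kyverno rules above can be approximated in a few lines of Python, which is handy for failing fast in CI before a manifest ever reaches the cluster. The sketch below is a simplified stand-in, not a reimplementation of Kyverno's pattern matching: `check_deployment` is a hypothetical helper, it applies the latest-tag rule to the Deployment's pod template, and, like the `!*:latest` pattern, it does not flag images that omit a tag altogether.

```python
REQUIRED_LABELS = ("app", "owner", "environment")


def check_deployment(manifest):
    """Return a list of violations of the platform standards for a Deployment dict."""
    violations = []

    # Rule: require-labels
    labels = manifest.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if not labels.get(label):
            violations.append(f"missing label: {label}")

    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Rule: require-resource-limits
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if not limits.get(resource):
                violations.append(f"{name}: missing {resource} limit")

        # Rule: disallow-latest-tag
        if container.get("image", "").endswith(":latest"):
            violations.append(f"{name}: image uses the latest tag")

    return violations


# Hypothetical manifest that breaks each rule at least once.
sample = {
    "kind": "Deployment",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "nginx:latest",
         "resources": {"limits": {"cpu": "500m"}}},
    ]}}},
}

for violation in check_deployment(sample):
    print(violation)
```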

Golden Paths

Example: Deploy a New Microservice

# Developer experience - Single command
platform create microservice \
  --name user-service \
  --language nodejs \
  --database postgresql \
  --environment production

# Behind the scenes:
#  1. Create Git repository
#  2. Scaffold application code
#  3. Create CI/CD pipeline
#  4. Provision database
#  5. Create Kubernetes manifests
#  6. Configure monitoring
#  7. Set up logging
#  8. Register in service catalog
#  9. Create ArgoCD application
# 10. Deploy to Kubernetes

# Everything ready in 5 minutes vs 2 days

Example Golden Path Implementation

# golden-path.yaml
apiVersion: platform.example.com/v1
kind: GoldenPath
metadata:
  name: microservice-deployment
spec:
  description: "Standard path for deploying microservices"
  steps:
    - name: code-repository
      type: github
      template: microservice-template
    - name: ci-pipeline
      type: github-actions
      workflow: |
        name: CI
        on: [push]
        jobs:
          build:
            runs-on: ubuntu-latest
            steps:
              - uses: actions/checkout@v4
              - name: Build and Test
                run: |
                  npm ci
                  npm test
                  npm run build
              # Tag the image with the registry prefix so the push below finds it
              - name: Build Container
                run: docker build -t registry.example.com/app:${{ github.sha }} .
              - name: Push to Registry
                run: docker push registry.example.com/app:${{ github.sha }}
    - name: database
      type: crossplane
      composition: postgresql-instance
    - name: kubernetes-deployment
      type: argocd
      sync-policy: automated
    - name: observability
      type: integration
      services:
        - prometheus
        - grafana
        - jaeger
    - name: documentation
      type: backstage-techdocs
      auto-generate: true

Platform Metrics

Key Performance Indicators

# Platform KPIs
developer_productivity:
  - metric: deployment_frequency
    target: "10+ per day"
    current: "8.5 per day"
  - metric: lead_time_for_changes
    target: "< 1 hour"
    current: "45 minutes"
  - metric: mean_time_to_recovery
    target: "< 15 minutes"
    current: "12 minutes"
  - metric: change_failure_rate
    target: "< 5%"
    current: "3.2%"
platform_adoption:
  - metric: services_using_golden_paths
    target: "90%"
    current: "78%"
  - metric: self_service_adoption
    target: "80%"
    current: "72%"
  - metric: platform_nps
    target: "> 50"
    current: "58"
operational_efficiency:
  - metric: infrastructure_tickets
    target: "< 10 per week"
    current: "6 per week"
  - metric: onboarding_time
    target: "< 1 day"
    current: "4 hours"
  - metric: platform_uptime
    target: "99.9%"
    current: "99.95%"
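
KPIs like these are most useful when the comparison against targets is automated. The sketch below encodes the example figures above as numbers and reports which metrics are off target. The tuple layout, the `off_target` helper, and the unit conversions (for instance reading "< 1 day" as 24 hours) are assumptions made for illustration.

```python
# (metric, current, target, higher_is_better), using the example figures above.
KPIS = [
    ("deployment_frequency", 8.5, 10, True),          # deployments per day
    ("lead_time_for_changes", 45, 60, False),         # minutes
    ("mean_time_to_recovery", 12, 15, False),         # minutes
    ("change_failure_rate", 3.2, 5, False),           # percent
    ("services_using_golden_paths", 78, 90, True),    # percent
    ("self_service_adoption", 72, 80, True),          # percent
    ("platform_nps", 58, 50, True),                   # score
    ("infrastructure_tickets", 6, 10, False),         # per week
    ("onboarding_time", 4, 24, False),                # hours, "< 1 day" read as 24h
    ("platform_uptime", 99.95, 99.9, True),           # percent
]


def off_target(kpis):
    """Return the names of metrics whose current value misses the target."""
    missed = []
    for name, current, target, higher_is_better in kpis:
        met = current >= target if higher_is_better else current < target
        if not met:
            missed.append(name)
    return missed


print(off_target(KPIS))
# ['deployment_frequency', 'services_using_golden_paths', 'self_service_adoption']
```

For the sample data, the three gaps are all adoption and throughput metrics, which is where a platform team would focus its next iteration.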

Platform Team Structure

Platform Team Organization:
Product Management
└── Platform Product Manager
    ├── Roadmap planning
    ├── User research
    └── Metrics analysis
Engineering
├── Platform Engineers (4-6)
│   ├── Infrastructure automation
│   ├── Golden paths development
│   └── Integration work
├── Developer Experience Engineers (2-3)
│   ├── Portal development
│   ├── CLI tools
│   └── Documentation
└── SRE (2-3)
    ├── Platform reliability
    ├── Performance optimization
    └── Incident response
Developer Advocacy
└── Platform Evangelists (1-2)
    ├── Training
    ├── Documentation
    └── Community building

Best Practices

1. Treat Platform as a Product

- Have a product manager
- Collect user feedback regularly
- Maintain a public roadmap
- Measure satisfaction (NPS)
- Iterate based on data
- Provide excellent documentation

2. Start Small, Iterate

Phase 1: Core Services (3 months)
- Basic developer portal
- 1-2 golden paths
- Essential integrations
Phase 2: Expansion (6 months)
- More golden paths
- Advanced automation
- Self-service capabilities
Phase 3: Optimization (12 months)
- AI/ML integration
- Advanced observability
- Cost optimization

3. Developer Experience First

# Good Platform Design
time_to_first_deployment:
  without_platform: "2-3 days"
  with_platform: "15 minutes"
cognitive_load:
  decisions_required:
    without: "50+"
    with: 5
documentation:
  - Interactive tutorials
  - Video walkthroughs
  - Runnable examples
  - Auto-generated from code

Conclusion

Platform Engineering transforms how organizations deliver software by creating self-service capabilities that empower developers while maintaining operational excellence. By treating the platform as a product, providing golden paths, and focusing relentlessly on developer experience, platform teams enable their organizations to move faster, more safely, and more efficiently.



