0.0 — DATA ENGINEER
Sam·Malik
Building data infrastructure that's honest about what it measures.
I'm a data engineer finishing my B.S. in Informatics (Data Science) at the University of Washington. Day to day I build and run ETL/ELT pipelines on Azure with Data Factory, Databricks, and Synapse, write the data-quality and governance checks that keep them trustworthy, and ship agentic RAG systems on the Claude and OpenAI APIs. Python and SQL do most of the work. Mostly the job is taking data that's scattered across a lot of systems and turning it into something teams can actually rely on.
0.1 — HOW I THINK ABOUT THE WORK
Data, technology, and culture are one loop.
I think of data, technology, and culture as one recursive feedback loop, each one constantly rewriting the next. Data engineering is how you step into that loop on purpose.
FIG. 0.1 — DATA → TECHNOLOGY → CULTURE → DATA
DATA → TECHNOLOGY. The data we collect ends up dictating the tools we build to handle it. Around 3400 BCE, Mesopotamian scribes were pressing grain tallies and labor records into clay, and the sheer volume of that accounting is what pushed them to invent cuneiform in the first place. The dataset created the technology.
TECHNOLOGY → CULTURE. Those tools then change how people live and what they treat as real. Double-entry bookkeeping in the 15th century was a data technology, and it helped make modern capitalism thinkable in the first place. The census, from Rome to the Domesday Book to today, turned populations into records, and in doing so it reshaped what it even meant to be a citizen of a state.
CULTURE → DATA. And culture decides what's worth recording at all, which closes the loop and changes what data exists in the first place. A society that decides something matters starts measuring it. A recommendation system turns your behavior into data, trains on it, and nudges the behavior it was measuring. Same loop the scribes started, just running faster.
None of this is new. Datafication is one of the constants of recorded human history. What's new is the speed and the stakes. Building data infrastructure means stepping into that loop, deciding what becomes legible and making sure the record stays honest about what it's actually measuring. That's the part I care about most.
Track record
Experience
Education, roles, tools, and measurable impact. →
0.4Built things
Projects
Pipelines, databases, models, and what I learned building them. →
0.5Notebook
Writing
Short essays on data systems, feedback, and governance. →
0.6The shelf
Reading
Last, current, next, and what I made of each. →
0.7Signal
Connect
Email, LinkedIn, GitHub. →