Prompt Engineering · LLM Evals · RAG Systems

Sarah
Al-Said.

I build production-grade AI systems — reliable, observable, and built to scale. Four years of software engineering across financial services and complex domains, combined with an accelerated CS degree, means I ship AI that's fast to deploy and hard to break.

Stack Python · LangChain · RAG · LLMOps
Location United States · Remote
Available Full-time
LangChain RAG Pipelines LangGraph OpenAI API Hugging Face AWS Bedrock Pinecone pgvector FastAPI Docker Kubernetes MLflow RAGAS Python LLM Evals GitHub Actions
// About me

Turning ideas into
reliable AI.

~/sarah — zsh
cat profile.json
 
{
  "name": "Sarah Al-Said",
  "role": "AI Engineer",
  "location": "United States",
  "stack": ["Python", "LLMs", "AWS"],
  "education": "BS+MS CS (accelerated)",
  "open_to": "great teams & hard problems",
  "status": "building..."
}
 

I'm a software developer and AI specialist with four years of engineering experience and a background that makes me better at the job. I'm completing an accelerated BS + MS in Computer Science with a focus on AI and LLM systems.

I started in operational management — owning complex workflows, working within compliance frameworks, making decisions where accuracy mattered. Then I became a Software Engineer at TD Bank, shipping software to millions of users. Each step added something the previous one couldn't.

Now I build AI systems that are reliable by design: eval-first, guardrails from day one, observability as a first-class concern.

01 Eval-First If you can't measure it, you can't ship it safely.
02 Production Over Demo PoC to prod is where most AI dies. I've made that journey.
03 Systems Thinker AI isn't a feature — it's infrastructure.
04 Nonlinear Path Operations → Engineering → AI. Each pivot compounded.
// Code showcase

How I write AI.

ContextRAG Hybrid retrieval pipeline with reranking + hallucination guardrail
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from .reranker import CohereReranker
from .guardrails import HallucinationGuard


def build_rag_pipeline(
    index_name: str,
    docs: list[str],
    model: str = "gpt-4o",
) -> HallucinationGuard:
    # Dense retriever via Pinecone embeddings
    dense = Pinecone.from_existing_index(
        index_name=index_name,
        embedding=OpenAIEmbeddings(),
    ).as_retriever(search_kwargs={"k": 20})

    # Sparse retriever for keyword precision
    sparse = BM25Retriever.from_texts(docs)
    sparse.k = 20

    # Hybrid: 60% dense, 40% sparse
    hybrid = EnsembleRetriever(
        retrievers=[dense, sparse],
        weights=[0.6, 0.4],
    )

    # Rerank top-20 down to top-5 for context window
    reranked = CohereReranker(base_retriever=hybrid, top_n=5)

    llm = ChatOpenAI(model=model, temperature=0)

    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        retriever=reranked,
        return_source_documents=True,
    )

    # Wrap with hallucination guard before returning
    return HallucinationGuard(chain=chain, threshold=0.85)
SafeLayer Real-time PII redaction middleware with risk scoring
import json

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from fastapi import Request, Response
from .telemetry import risk_counter, latency_hist


analyzer   = AnalyzerEngine()
anonymizer = AnonymizerEngine()

PII_ENTITIES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
    "US_SSN", "CREDIT_CARD", "MEDICAL_LICENSE",
]


async def guardrail_middleware(
    request: Request,
    call_next,
) -> Response:
    body = await request.json()
    prompt = body.get("prompt", "")

    # Detect PII in incoming prompt
    results = analyzer.analyze(
        text=prompt,
        entities=PII_ENTITIES,
        language="en",
    )

    if results:
        # Track risk event in OpenTelemetry
        risk_counter.add(1, {"entity_types": [r.entity_type for r in results]})
        # Anonymize before forwarding to LLM
        clean = anonymizer.anonymize(text=prompt, analyzer_results=results)
        body["prompt"] = clean.text

        # Rebuild the request with the sanitised body — reading the body
        # above consumed the original receive stream, so downstream would
        # otherwise still see the raw prompt
        sanitised = json.dumps(body).encode()

        async def receive():
            return {"type": "http.request", "body": sanitised}

        request = Request(request.scope, receive)

    # Forward sanitised request downstream
    with latency_hist.time():
        response = await call_next(request)

    return response
AgentOS LangGraph agentic workflow with human-in-the-loop checkpoint
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from .nodes import triage, route_task, execute, audit_log
from .state import WorkflowState


def build_agent_graph():
    graph = StateGraph(WorkflowState)

    # Register nodes
    graph.add_node("triage",  triage)
    graph.add_node("route",   route_task)
    graph.add_node("execute", execute)
    graph.add_node("audit",   audit_log)

    # Every run starts at triage
    graph.set_entry_point("triage")

    # Human-in-the-loop: pause before execute for sensitive tasks
    graph.add_edge("triage", "route")
    graph.add_conditional_edges(
        "route",
        lambda s: "human_review" if s["risk_level"] > 0.7 else "execute",
        {"human_review": END, "execute": "execute"},
    )
    graph.add_edge("execute", "audit")
    graph.add_edge("audit", END)

    # Persist state between turns for long-running workflows
    memory = MemorySaver()
    return graph.compile(checkpointer=memory)


# Usage
app = build_agent_graph()
result = app.invoke(
    {"task": "schedule_followup", "patient_id": "P-8821"},
    config={"configurable": {"thread_id": "wf-001"}},
)
// Stack

What I work with.

LLMs & Orchestration
LangChain LlamaIndex LangGraph OpenAI API Hugging Face AWS Bedrock RAG Pipelines Prompt Engineering
Vector Databases
Pinecone FAISS Weaviate pgvector Chroma Qdrant
Cloud & DevOps
AWS SageMaker AWS Lambda Docker Kubernetes GitHub Actions Terraform CI/CD
Languages
Python FastAPI SQL JavaScript TypeScript Bash REST APIs
Data & ML
Embeddings Fine-Tuning Feature Pipelines PostgreSQL Redis Kafka
LLMOps
MLflow RAGAS Prometheus OpenTelemetry LLM Evals Guardrails PII Redaction
// Selected work

Three things I've built.

01
ContextRAG
Modular RAG framework for any document corpus — contracts, wikis, support tickets, codebases. Hybrid BM25 + dense retrieval with reranking delivers 55% improvement in answer relevance over naive RAG. Pluggable LLM backend with citation tracing, structured output enforcement, and hallucination guardrails built in from day one.
Python LangChain Pinecone FastAPI AWS Bedrock
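The core of the hybrid retrieval idea can be sketched framework-free — blend min-max-normalized dense and sparse scores with the 60/40 weighting. The function name, weights, and doc IDs here are illustrative, not ContextRAG's actual API:

```python
def fuse_scores(dense, sparse, w_dense=0.6, w_sparse=0.4):
    """Weighted fusion of dense and sparse retrieval scores.

    dense / sparse: dicts mapping doc_id -> relevance score.
    Each retriever's scores are min-max normalized before blending,
    so neither retriever dominates purely by score scale.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    d, s = normalize(dense), normalize(sparse)
    fused = {doc: w_dense * d.get(doc, 0.0) + w_sparse * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Normalizing per retriever matters: BM25 scores are unbounded while cosine similarities sit in a narrow band, so raw blending would silently favor one side.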
02
EvalKit
Open-source LLMOps eval harness for prompt versioning and regression testing. Offline evals, online A/B experiments, and hallucination scoring via RAGAS and DeepEval. CI/CD gate blocks production deploys when faithfulness, coherence, or latency budgets regress — prompts treated like code, with the same quality bar.
Python RAGAS DeepEval MLflow GitHub Actions
LLM Evals GitHub ↗
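The CI/CD gate behaves roughly like this — metric names, thresholds, and the `gate` helper are illustrative; the real harness pulls scores from RAGAS/DeepEval runs:

```python
BUDGETS = {
    "faithfulness": 0.90,   # minimum acceptable score
    "coherence":    0.85,   # minimum acceptable score
    "latency_p95":  2.5,    # seconds; maximum acceptable
}

def gate(scores: dict) -> list:
    """Return a list of budget violations; empty means the deploy may proceed."""
    failures = []
    for metric, budget in BUDGETS.items():
        value = scores.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from eval run")
        elif metric.startswith("latency") and value > budget:
            failures.append(f"{metric}: {value:.2f}s > budget {budget:.2f}s")
        elif not metric.startswith("latency") and value < budget:
            failures.append(f"{metric}: {value:.2f} < budget {budget:.2f}")
    return failures
```

In the pipeline, a non-empty failure list fails the CI job — exactly how a unit-test suite gates a merge, which is the "prompts treated like code" point above.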
03
SafeLayer
Drop-in LLM safety proxy — real-time PII redaction, toxicity filtering, output schema validation, and cost tracking across all LLM I/O. 200+ adversarial red-team prompt templates across 12 risk categories. Configurable per-tenant policy engine with OpenTelemetry dashboard surfacing latency, token spend, and risk-score heatmaps.
Python FastAPI Presidio OpenTelemetry Docker
AI Safety GitHub ↗
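Per-request risk scoring can be sketched as a weighted sum over the PII entity types the analyzer detects. The weights and the cap at 1.0 are illustrative assumptions, not SafeLayer's actual policy table:

```python
ENTITY_WEIGHTS = {
    "US_SSN":          1.0,   # highest-risk identifiers
    "CREDIT_CARD":     1.0,
    "MEDICAL_LICENSE": 0.8,
    "EMAIL_ADDRESS":   0.4,
    "PHONE_NUMBER":    0.4,
    "PERSON":          0.2,
}

def risk_score(detected: list) -> float:
    """Aggregate detected PII entity types into a 0..1 risk score.

    Duplicate detections of the same type count once; unknown
    types get a small default weight rather than zero.
    """
    raw = sum(ENTITY_WEIGHTS.get(e, 0.1) for e in set(detected))
    return min(raw, 1.0)  # cap: a single SSN already maxes out the score
```

A per-tenant policy engine then just swaps in a tenant-specific weight table and threshold instead of the module-level defaults.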
// Experience

Where I've worked.

Oct 2023 – Present
Software Engineer
Nova Rise Marketing · Remote
Building and maintaining software systems in a fast-moving marketing technology environment. Developing APIs, automation tooling, and data pipelines while deepening specialization in AI and LLM integration — applying prompt engineering, RAG patterns, and LLM-powered features directly to production products.
Software Engineering LLM Integration APIs Python Remote
Nov 2022 – Sep 2023
Software Engineer
TD Bank
Built production applications in a tightly regulated financial environment. Developed REST APIs, owned CI/CD pipelines, collaborated with security teams on data handling controls, and shipped features to millions of users. The discipline of regulated-industry engineering — auditability, reliability, zero-tolerance for data bugs — carries into every AI system I build today.
REST APIs CI/CD Financial Services Security Compliance
May 2018 – Jan 2022
Administrative Supervisor
HCA Healthcare
Nearly four years managing healthcare operations at one of the largest hospital systems in the US. Owned complex clinical workflows, worked within strict HIPAA compliance frameworks, and made high-accuracy decisions where errors had real consequences. That background shapes how I build AI: reliability and compliance aren't afterthoughts — they're the design.
Healthcare Operations HIPAA Clinical Workflows HCA Healthcare
// Contact

Let's build something
great.

Open to AI Engineering roles across any industry. If you're building production LLM systems that need to be reliable, scalable, and genuinely useful — I'd love to talk.
