
The Chomsky Hierarchy and Security: Why Parsers Matter

A deep dive into formal language theory, automata, and Turing machines and their profound implications for cybersecurity. Learn why regex WAFs fail, how injection attacks exploit parser differentials, and how to apply grammar-based parsing to stealer logs and malware analysis.

December 28, 2025
21 min read
by Mohamed Habib Jaouadi
#LangSec
#Computer Science
#Blue Team
#Exploit Development
#Parsing
#Theory
#WAF Bypass
#Compilers

Who this is for: Security engineers, penetration testers, and software architects who want to understand why certain vulnerability classes exist rather than just memorizing exploitation techniques. Familiarity with basic programming concepts is assumed.

TL;DR: SQL Injection, XSS, and similar vulnerabilities are not random bugs; they are predictable consequences of using the wrong computational model to parse input. Regular expressions cannot recognize nested structures like HTML or SQL, so filtering context-free data with regex is mathematically impossible to do correctly. This post explains the theory behind this limit and shows how to apply it to real security problems.

Introduction

Security practitioners categorize vulnerabilities by their effects: Cross-Site Scripting, SQL Injection, Command Injection, HTTP Request Smuggling. These categories describe symptoms, but they share a common underlying cause rooted in computation theory. At their core, these vulnerabilities represent failures in how systems parse and interpret input data.

When we build software, we implicitly define a language that the software accepts. An SQL query follows a grammar. An HTTP request adheres to a specification. A JSON document has rules about structure and nesting. Problems arise when the language that developers think their software accepts differs from the language their software actually accepts.

This gap between intended and actual behavior is the domain of Language Theoretic Security (LangSec), a discipline that applies formal language theory to security. LangSec's central insight is captured in a single phrase:

"The road to hell is paved with ad-hoc parsers."

When developers treat complex structured data as simple strings and attempt to sanitize it with pattern matching, they are attempting the mathematically impossible. To understand why, we need to examine the foundations of computational theory.

Background: Formal Language Theory

Formal language theory provides the mathematical foundation for understanding what computers can and cannot process. Developed primarily by Noam Chomsky in the 1950s, this framework classifies languages into a hierarchy based on their computational complexity.

What is a Formal Language?

A formal language is simply a set of strings over some alphabet. The alphabet might be ASCII characters, binary digits, or any other finite set of symbols. The language defines which combinations of those symbols are "valid" according to some rules.

For example, the language of valid email addresses includes strings like user@example.com but excludes @@invalid. The language of valid JSON documents includes {"key": "value"} but excludes {key: value} (missing quotes). Every protocol, data format, and programming language implicitly defines a formal language.

Formal Grammars: Production Rules

Languages are specified using grammars: sets of rules that generate all valid strings. A grammar consists of:

  • Terminal symbols: The actual characters that appear in strings (like a, b, 0, 1)
  • Non-terminal symbols: Placeholders used during derivation (like S, A, B)
  • Production rules: Rules that replace non-terminals with other symbols
  • Start symbol: The initial non-terminal from which derivation begins

Consider a simple grammar for generating balanced parentheses:

S → (S)
S → SS
S → ε

Here, S is the start symbol, ( and ) are terminals, and ε represents the empty string. This grammar generates strings like (), (()), ()(), and ((())).

To derive (()):

S → (S) → ((S)) → (())
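
As a concrete illustration, here is a minimal Python sketch that performs this derivation mechanically, rewriting the leftmost non-terminal at each step (the derive helper is ours, purely for illustration):

def derive(form: str, production: tuple[str, str]) -> str:
    """Apply one production (lhs -> rhs) to the leftmost non-terminal."""
    lhs, rhs = production
    i = form.index(lhs)
    return form[:i] + rhs + form[i + 1:]

form = "S"
for rhs in ("(S)", "(S)", ""):   # S -> (S), then S -> (S), then S -> ε
    form = derive(form, ("S", rhs))
    print(form or "ε")
# (S)
# ((S))
# (())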

The Chomsky Hierarchy

The Chomsky Hierarchy classifies grammars into four levels based on the restrictions placed on production rules. Each level corresponds to a specific type of computational machine required for recognition.

[Figure: The Chomsky Hierarchy: Language Classes. Nested sets showing containment: Regular (a*b*, IP addresses) ⊂ Context-Free (JSON, SQL) ⊂ Context-Sensitive (aⁿbⁿcⁿ) ⊂ Recursively Enumerable (arbitrary programs).]
Key insight: Each inner class is a proper subset of the outer. A regex (Regular) cannot match balanced brackets (Context-Free). A PDA cannot verify aⁿbⁿcⁿ (Context-Sensitive).

The diagram above shows the key property of the hierarchy: proper containment. Every regular language is context-free. Every context-free language is context-sensitive. But the reverse is not true: there are context-free languages that are not regular, and context-sensitive languages that are not context-free.

Type 3: Regular Languages

Regular languages are the simplest class, recognized by Finite Automata (FA). Production rules in a regular grammar must have the form:

A → aB   (non-terminal produces terminal followed by non-terminal)
A → a    (non-terminal produces terminal)
A → ε    (non-terminal produces empty string)

A grammar for strings ending in "ab":

S → aS | bS | aA
A → b

This generates: ab, aab, bab, aaab, abab, etc.

The key property of regular languages is that they require no memory to recognize. A finite automaton reads input one symbol at a time, transitions between a finite number of states, and accepts or rejects based solely on the final state reached.

[Figure: A three-state DFA (q₀, q₁, q₂) over the alphabet {a, b} that accepts strings ending in "ab"; q₀ is the start state and q₂ (double circle) the accept state.]

The automaton above accepts strings ending in "ab". Tracing inputs like ab, bab, aab, or ba shows how the machine transitions. Notice that it has no way to "remember" what it saw earlier; only the current state matters.
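
A minimal Python simulation of this DFA (the transition table is a direct transcription of the three states; inputs are assumed to stay within the alphabet {a, b}):

def accepts_ending_ab(s: str) -> bool:
    """Simulate the three-state DFA: q0 (start), q1 (just saw 'a'),
    q2 (just saw 'ab', accepting). No stack, no counters: the entire
    memory of the machine is which of the three states it occupies."""
    transitions = {
        ("q0", "a"): "q1", ("q0", "b"): "q0",
        ("q1", "a"): "q1", ("q1", "b"): "q2",
        ("q2", "a"): "q1", ("q2", "b"): "q0",
    }
    state = "q0"
    for ch in s:
        state = transitions[(state, ch)]
    return state == "q2"

for s in ("ab", "bab", "aab", "ba"):
    print(s, accepts_ending_ab(s))   # True, True, True, False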

Security Implication: Regular languages are tractable to process safely, provided deterministic or linear-time automata are used. Recognition takes O(n) time with no recursion or stack exhaustion. However, note that many "regex" engines (PCRE, Python's re) include backtracking features that can cause catastrophic performance (ReDoS) on pathological inputs. True safety requires DFA-based matching or careful pattern design.

Practical Uses: Validating simple tokens: IP addresses, UUIDs, basic email patterns, and date formats.

A note on "regex" vs. regular languages: Modern regex engines often include backtracking, recursion, lookahead, and backreferences that go beyond regular languages. These features effectively embed a parser and reintroduce complexity, performance risks, and ambiguity. The LangSec argument still holds: if you need these features, you needed a parser, not a filter.

Type 2: Context-Free Languages

Context-free languages require a more powerful machine: the Pushdown Automaton (PDA), which adds a stack memory to the finite automaton. Production rules have the form:

A → α   (where α is any string of terminals and non-terminals)

A grammar for balanced brackets:

S → [S]
S → SS
S → ε

A grammar for the language aⁿbⁿ (equal a's followed by b's):

S → aSb
S → ε

This generates: ε, ab, aabb, aaabbb, etc. The key property is that this language cannot be recognized by any finite automaton. No matter how many states you add, a finite automaton cannot count arbitrarily many a's to match against the b's.

Solving the Balanced String Problem with a Stack

The language L = { aⁿbⁿ | n ≥ 0 } requires memory. Specifically, it requires a stack: a last-in, first-out data structure. Here is how a pushdown automaton solves this problem:

Algorithm:

  1. Start with an empty stack (containing only the bottom marker Z)
  2. For each a in the input:
    • Push symbol A onto the stack
  3. When we see the first b:
    • Switch to "matching mode"
  4. For each b in the input:
    • Pop one A from the stack
    • If stack is empty (only Z remains) and we see another b, reject (too many b's)
  5. At end of input:
    • If stack contains only Z, accept (balanced)
    • If stack contains unmatched As, reject (too many a's)

The stack serves as a counter. Each a adds to the count; each b decrements it. The string is balanced if and only if the count returns to zero at the end.

[Figure: A PDA with states q0, q1, q2 accepting L = { aⁿbⁿ | n ≥ 0 }, a Context-Free language that is not Regular. In q0, each a pushes an A onto the stack (a,ε/A); the first b pops an A and moves to q1 (b,A/ε), where each further b pops another A; with only the bottom marker left, ε,Z/Z moves to the accepting state q2.]

This pushdown automaton implements the algorithm above: aabb and aaabbb are accepted, aab is rejected (too many a's), and abb is rejected (too many b's). The stack grows while reading a's and shrinks while reading b's.
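
A minimal Python sketch of this PDA, written as a direct transcription of the algorithm above (the explicit bottom marker Z and the two-phase structure mirror the states q0 and q1):

def accepts_anbn(s: str) -> bool:
    """Simulate the PDA for L = { a^n b^n | n >= 0 }.
    'Z' is the bottom-of-stack marker; each 'a' pushes an 'A',
    each 'b' must pop one. Phase tracking rejects an 'a' after a 'b'."""
    stack = ["Z"]
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            stack.append("A")            # count the a's
        elif ch == "b" and stack[-1] == "A":
            seen_b = True
            stack.pop()                  # match one b against one a
        else:
            return False                 # out of order, or too many b's
    return stack == ["Z"]                # balanced iff only Z remains

for s in ("aabb", "aaabbb", "aab", "abb", ""):
    print(repr(s), accepts_anbn(s))
# 'aabb' True, 'aaabbb' True, 'aab' False, 'abb' False, '' True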

Security Implication: Most structured data formats (JSON, XML, HTML, SQL) and programming languages are context-free. They require proper parsers with stack-based memory. Regular expressions cannot recognize them correctly.

Type 1: Context-Sensitive Languages

Context-sensitive languages are recognized by Linear Bounded Automata (LBA): Turing machines whose tape length is limited to a linear function of the input length. Production rules have the form:

αAβ → αγβ   (A can only be replaced when surrounded by α and β)

The classic example is L = { aⁿbⁿcⁿ | n ≥ 0 }. This language requires matching three groups, which cannot be done with a single stack. You need more sophisticated memory structures.

Real-world examples in security:

  • Protocol length fields: HTTP's Content-Length header must match the actual body size. Verifying this correspondence requires memory proportional to the input (a minimal check is sketched after this list).
  • Cross-field validation: A certificate's issuer field in one structure must equal the subject field of another. These cross-references create context-sensitive constraints.
  • Type systems: Many programming languages have type systems that enforce context-sensitive properties (variable names must be declared before use, types must match across expressions).
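
As a minimal illustration of the length-field item, here is a sketch of such a cross-field check (check_content_length is a hypothetical helper, not tied to any HTTP library):

def check_content_length(headers: dict[str, str], body: bytes) -> bool:
    """Cross-field validation: the declared Content-Length must equal
    the actual body size. The constraint relates two distant parts of
    the message, which is what makes it context-sensitive rather than
    purely syntactic."""
    declared = headers.get("Content-Length")
    if declared is None or not declared.isdigit():
        return False
    return int(declared) == len(body)

print(check_content_length({"Content-Length": "4"}, b"abcd"))   # True
print(check_content_length({"Content-Length": "99"}, b"abcd"))  # False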

Security Implication: Context-sensitive constraints are decidable (unlike Type 0) but computationally expensive: membership is PSPACE-complete in the general case. Attackers can sometimes exploit the gap between what a fast-path validator checks and what the full context-sensitive grammar requires.

Type 0: Recursively Enumerable Languages and Turing Machines

The most powerful computational model is the Turing Machine. Understanding Turing machines is essential for understanding fundamental limits on what any algorithm, including security tools, can accomplish.

What is a Turing Machine?

A Turing Machine consists of:

  • An infinite tape divided into cells, each holding a symbol
  • A head that reads and writes symbols, moving left or right
  • A finite set of states
  • A transition function that determines the next action based on current state and symbol read
Tape:  ... □ □ [a] a b b □ □ ...
                ↑
              Head

State: q0

Transition: (q0, a) → (q1, X, R)
            "In state q0, reading 'a': write 'X', move Right, go to state q1"

The Turing Machine can simulate any algorithm. If a problem can be solved by any computational process, it can be solved by a Turing Machine. This is the Church-Turing Thesis.
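
To make the model concrete, here is a minimal sketch of a Turing machine simulator. The transition table implements the standard textbook machine for { aⁿbⁿ | n ≥ 0 } (repeatedly cross off the leftmost a and the leftmost b); the dictionary-as-tape representation is an implementation convenience, not part of the formal model:

def run_tm(tape_str: str, max_steps: int = 10_000) -> bool:
    """Simulate a Turing machine accepting { a^n b^n | n >= 0 } by
    marking the leftmost 'a' as X and the leftmost 'b' as Y, repeating.
    Returns True on accept, False on reject or step-limit exhaustion."""
    BLANK = "_"
    # (state, symbol) -> (new_state, symbol_to_write, head_move)
    delta = {
        ("q0", "a"): ("q1", "X", +1), ("q0", "Y"): ("q3", "Y", +1),
        ("q0", BLANK): ("accept", BLANK, +1),
        ("q1", "a"): ("q1", "a", +1), ("q1", "Y"): ("q1", "Y", +1),
        ("q1", "b"): ("q2", "Y", -1),
        ("q2", "a"): ("q2", "a", -1), ("q2", "Y"): ("q2", "Y", -1),
        ("q2", "X"): ("q0", "X", +1),
        ("q3", "Y"): ("q3", "Y", +1), ("q3", BLANK): ("accept", BLANK, +1),
    }
    tape = dict(enumerate(tape_str))      # sparse, effectively infinite tape
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == "accept":
            return True
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            return False                  # no transition defined: reject
        state, tape[head], move = delta[key]
        head += move
    return False                          # treat non-termination as reject

for s in ("", "ab", "aabb", "aab", "abb"):
    print(repr(s), run_tm(s))             # True, True, True, False, False

The same language needed only a stack earlier; the point here is the mechanism, not the language. The full read-write-move transition model generalizes to computations no PDA can perform.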

The Halting Problem

Here is the fundamental limit: no algorithm can determine whether an arbitrary Turing Machine will halt on a given input.

This is proven by contradiction. Suppose a "halt detector" H exists:

H(program, input) → "halts" or "loops forever"

Construct a program D that does the opposite of what H predicts:

def D(program):
    if H(program, program) == "halts":
        loop_forever()
    else:
        halt()

Now ask: does D(D) halt?

  • If H says D(D) halts, then D loops forever (contradiction)
  • If H says D(D) loops, then D halts (contradiction)

Therefore, H cannot exist.

Security Implications of Undecidability

The halting problem has profound implications for security:

You cannot build a perfect malware detector. No static analyzer can determine whether an arbitrary program is malicious without running it. Malware authors can always construct code that defeats any detection algorithm.

You cannot validate Turing-complete inputs. If your application accepts input that can express arbitrary computation (Python scripts, serialized objects, macros), no validator can guarantee safety. The only options are:

  1. Sandbox the execution
  2. Restrict the input language to a decidable subset
  3. Accept the risk

Antivirus is fundamentally limited. This is not a failure of implementation; it is a mathematical impossibility. Defenders must rely on heuristics, sandboxing, and behavior monitoring because perfect static detection is provably impossible.

Closure Properties

Formal languages have closure properties that determine how operations affect language class:

Operation       Regular    Context-Free
Union           Closed     Closed
Concatenation   Closed     Closed
Kleene Star     Closed     Closed
Intersection    Closed     Not closed
Complement      Closed     Not closed

The non-closure of CFLs under intersection has security implications: even if two filters each accept only "safe" context-free inputs, their combination might not be context-free at all. The classic example: { aⁿbⁿcᵐ } and { aᵐbⁿcⁿ } are each context-free, but their intersection is { aⁿbⁿcⁿ }, which is not.

Building Programming Languages: The Compiler Pipeline

Understanding language processing requires examining how compilers and interpreters work. The translation from source code to executable form proceeds through distinct phases, each operating on a different representation of the program.

Phase 1: Lexical Analysis (Tokenization)

The lexer (or scanner) reads raw source characters and groups them into tokens: meaningful units like keywords, identifiers, operators, and literals.

Consider this SQL fragment:

SELECT name FROM users WHERE age > 18

The lexer produces a token stream:

KEYWORD:SELECT  IDENTIFIER:name  KEYWORD:FROM  IDENTIFIER:users
KEYWORD:WHERE  IDENTIFIER:age  OPERATOR:>  INTEGER:18

Lexers are typically implemented as finite automata because individual tokens are regular languages. The following Python sketch demonstrates the concept:

class Token:
    def __init__(self, type: str, value: str):
        self.type = type
        self.value = value

The tokenizer scans through the input string character by character:

def tokenize(input_string: str) -> list[Token]:
    tokens = []
    i = 0
    
    while i < len(input_string):
        char = input_string[i]

Whitespace is skipped without producing tokens:

        if char.isspace():
            i += 1
            continue

Alphabetic characters begin either keywords or identifiers. The lexer reads the entire word and then checks against a keyword list:

        if char.isalpha():
            word = ""
            while i < len(input_string) and input_string[i].isalnum():
                word += input_string[i]
                i += 1
            
            if word.upper() in ("SELECT", "FROM", "WHERE", "AND", "OR"):
                tokens.append(Token("KEYWORD", word.upper()))
            else:
                tokens.append(Token("IDENTIFIER", word))
            continue

Digits begin numeric literals:

        if char.isdigit():
            num = ""
            while i < len(input_string) and input_string[i].isdigit():
                num += input_string[i]
                i += 1
            tokens.append(Token("INTEGER", num))
            continue

Single-character operators are recognized directly:

        if char in "><=!":
            tokens.append(Token("OPERATOR", char))
            i += 1
            continue
        
        i += 1
    
    return tokens
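
Running this tokenizer over the earlier fragment reproduces the token stream shown above:

for token in tokenize("SELECT name FROM users WHERE age > 18"):
    print(token.type, token.value)
# KEYWORD SELECT
# IDENTIFIER name
# KEYWORD FROM
# IDENTIFIER users
# KEYWORD WHERE
# IDENTIFIER age
# OPERATOR >
# INTEGER 18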

Security Relevance: The boundary between tokens is where injection attacks occur. When user input is concatenated into a query string before tokenization, the attacker can inject characters that the lexer interprets as token boundaries, transforming data into code.

Phase 2: Parsing (Syntactic Analysis)

The parser reads the token stream and builds an Abstract Syntax Tree (AST) representing the program's grammatical structure. Parsers implement context-free grammars.

A simplified SQL grammar:

Query      → SELECT Columns FROM Identifier WhereClause?
Columns    → Identifier | '*'
WhereClause → WHERE Condition
Condition  → Identifier Operator Value
Value      → Identifier | Integer | String

The parser verifies that tokens appear in valid sequences according to this grammar. If the input violates the grammar, parsing fails with a syntax error.
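
Here is a minimal recursive-descent sketch of this grammar, reusing the Token class and tokenize function from the lexer section. It is deliberately simplified: Columns is restricted to a single identifier and Value to an integer, just enough for the running example:

class Parser:
    """Minimal recursive-descent sketch of the simplified SQL grammar.
    Each grammar rule becomes one method call; the host language's call
    stack plays the role of the pushdown automaton's stack."""

    def __init__(self, tokens: list[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, type_: str, value: str | None = None) -> Token:
        tok = self.peek()
        if tok is None or tok.type != type_ or (value and tok.value != value):
            raise SyntaxError(f"expected {value or type_} at token {self.pos}")
        self.pos += 1
        return tok

    def parse_query(self) -> dict:
        # Query -> SELECT Columns FROM Identifier WhereClause?
        self.expect("KEYWORD", "SELECT")
        column = self.expect("IDENTIFIER").value
        self.expect("KEYWORD", "FROM")
        table = self.expect("IDENTIFIER").value
        where = None
        if self.peek() is not None and self.peek().value == "WHERE":
            # WhereClause -> WHERE Condition; Condition -> Id Op Value
            self.expect("KEYWORD", "WHERE")
            where = (self.expect("IDENTIFIER").value,
                     self.expect("OPERATOR").value,
                     self.expect("INTEGER").value)
        return {"select": column, "from": table, "where": where}

ast = Parser(tokenize("SELECT name FROM users WHERE age > 18")).parse_query()
print(ast)  # {'select': 'name', 'from': 'users', 'where': ('age', '>', '18')}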

Security Relevance: SQL injection succeeds when attacker input changes the shape of the AST. Parameterized queries prevent this by separating the query structure (parsed once) from the data values (substituted later).

Phase 3: Semantic Analysis

Semantic analysis verifies meaning beyond syntax: type checking, scope resolution, and constraint validation. A syntactically valid program might be semantically invalid:

SELECT age FROM users WHERE name > 18

This is syntactically correct but semantically problematic, comparing a string column to an integer.

Security Relevance: Many vulnerabilities arise from semantic gaps. A WAF might parse input syntactically but miss semantic violations that the backend interprets differently.

Phase 4: Code Generation and Execution

Finally, the compiler generates executable code or the interpreter directly executes the AST. At this stage, the abstract representation becomes concrete behavior.

Applications of Parsers in Information Security

The parsing pipeline is not merely academic; it has direct, practical applications in security operations. Teams that understand formal language theory can build more robust tools, detect more sophisticated attacks, and analyze complex malicious artifacts.

Parsing Stealer Logs

Information stealers (RedLine, Vidar, Raccoon, Lumma) exfiltrate credentials, cookies, and system data from compromised machines. This data follows predictable formats, making it amenable to grammar-based parsing.

A typical stealer log might contain:

=== Chrome Passwords ===
URL: https://bank.example.com/login
Username: victim@email.com
Password: p@ssw0rd123

=== System Info ===
OS: Windows 10 Pro
IP: 192.168.1.100
HWID: ABC123...

Each stealer family has its own format, but they share common structures:

StealerLog     → Section*
Section        → SectionHeader Entry*
SectionHeader  → '===' SectionName '===' NEWLINE
Entry          → Key ':' Value NEWLINE
Key            → IDENTIFIER
Value          → TEXT

A grammar-based parser provides several advantages over ad-hoc regex parsing:

Robustness: The parser handles variations in whitespace, encoding, and formatting that would break fragile regex patterns.

Extensibility: Adding support for a new stealer family requires defining its grammar, not rewriting extraction logic.

Structured output: The parser produces a typed AST that can be directly processed, enriched, and stored.

import re

class StealerLogParser:
    """
    CFG-based parser for stealer log formats.
    Produces structured output suitable for threat intelligence.
    """

    def __init__(self, log_content: str):
        self.content = log_content
        self.pos = 0

    def skip_whitespace(self):
        """Advance past whitespace between sections and entries."""
        while self.pos < len(self.content) and self.content[self.pos].isspace():
            self.pos += 1

The parser first tokenizes section headers:

    def parse_section_header(self) -> str | None:
        """Match section headers like '=== Chrome Passwords ==='"""
        match = re.match(r'===\s*(.+?)\s*===', self.content[self.pos:])
        if match:
            self.pos += match.end()
            self.skip_whitespace()
            return match.group(1)
        return None

Then extracts key-value entries:

    def parse_entry(self) -> tuple[str, str] | None:
        """Match entries like 'URL: https://...'"""
        match = re.match(r'(\w+):\s*(.+?)(?=\n|$)', self.content[self.pos:])
        if match:
            self.pos += match.end()
            self.skip_whitespace()
            return (match.group(1), match.group(2))
        return None

The main parsing loop applies production rules:

    def parse(self) -> dict:
        """Parse the entire log into structured sections."""
        result = {}
        
        while self.pos < len(self.content):
            header = self.parse_section_header()
            if header:
                entries = {}
                while True:
                    entry = self.parse_entry()
                    if entry is None:
                        break
                    entries[entry[0]] = entry[1]
                result[header] = entries
            else:
                self.pos += 1  # Skip unrecognized content
        
        return result

Note on regex usage: The re.match() calls above act as token recognizers (lexer-level helpers) within the grammar-driven parsing loop, not as validation logic for the entire format. This is analogous to how compilers use regex/DFA to identify tokens, then use a CFG parser for structure. The key architectural point is that the grammar defines what constitutes a valid log, and the regexes only identify individual terminals.

This approach scales to multiple stealer families by defining family-specific grammars and composing them.

Malware Configuration Extraction

Many malware families embed configuration data in structured formats: encrypted JSON, custom binary formats, or registry-style key-value stores. Grammar-based parsing enables systematic extraction.

Consider a hypothetical C2 beacon configuration:

[config]
server=https://c2.evil.com:443
sleep=60
jitter=0.2
[commands]
whoami=c2_cmd_1
download=c2_cmd_2

The grammar:

Config        → Section*
Section       → '[' SectionName ']' NEWLINE Setting*
SectionName   → IDENTIFIER
Setting       → Key '=' Value NEWLINE

A parser built from this grammar extracts configurations reliably, regardless of whitespace variations or ordering.
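
Because this hypothetical format is INI-shaped, Python's standard-library configparser can serve as the grammar-enforcing parser; a sketch against the sample configuration above:

import configparser

BEACON_CONFIG = """\
[config]
server=https://c2.evil.com:443
sleep=60
jitter=0.2
[commands]
whoami=c2_cmd_1
download=c2_cmd_2
"""

parser = configparser.ConfigParser()
parser.read_string(BEACON_CONFIG)   # raises on input violating the grammar

c2 = parser["config"]
print(c2["server"], c2.getint("sleep"), c2.getfloat("jitter"))
# https://c2.evil.com:443 60 0.2
print(dict(parser["commands"]))
# {'whoami': 'c2_cmd_1', 'download': 'c2_cmd_2'}

For custom binary formats, the same discipline applies with a hand-written or generated parser in place of configparser.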

Protocol Dissection

Network protocols follow grammars. HTTP, DNS, TLS, and custom C2 protocols all have formal or informal specifications that define valid message structures.

When analyzing unknown C2 protocols, security researchers:

  1. Capture network traffic
  2. Identify invariant structures (magic bytes, length fields)
  3. Hypothesize a grammar that generates observed messages
  4. Test the grammar against additional samples

This is essentially formal language inference applied to malware analysis.

Building Security Tools with Parser Generators

Parser generators like ANTLR, PLY (Python Lex-Yacc), or tree-sitter produce parsers from grammar specifications. Security teams can use these to build:

  • Custom log analyzers that handle domain-specific formats
  • Configuration validators that enforce schema compliance
  • Protocol fuzzers that generate grammar-conforming but adversarial inputs
  • Source code analyzers that detect vulnerable patterns

The grammar becomes documentation; the generated parser becomes the enforcement mechanism.

Why Regex Fails: A Formal Proof Sketch

The Pumping Lemma for Regular Languages proves that regular languages cannot express unbounded counting. For any regular language L, there exists a pumping length p such that any string s in L with |s| ≥ p can be split into xyz where:

  1. |y| > 0
  2. |xy| ≤ p
  3. For all n ≥ 0, xyⁿz is in L

The language of balanced parentheses { (ⁿ)ⁿ | n ≥ 0 } violates this lemma. Any attempt to pump (repeat) a middle section produces unbalanced strings. Therefore, no regex can recognize balanced parentheses.

Practical consequence: No regex-based WAF can correctly filter HTML, SQL, or any other context-free language. Attackers can always find inputs that the regex accepts but the backend interprets maliciously.
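
A concrete sketch of the gap: the pattern \(*\)* captures the "opens then closes" shape, which is as close as a truly regular pattern gets, while a single counter (the memory a DFA lacks) decides the language exactly:

import re

# "Some opens, then some closes" captures the shape of the language
# but cannot compare the two counts.
shape_only = re.compile(r"\(*\)*")

def balanced(s: str) -> bool:
    """One unbounded integer of memory: exactly what a DFA lacks."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

for s in ["(())", "((()", "())", "()()"]:
    print(repr(s), bool(shape_only.fullmatch(s)), balanced(s))
# '(())' True  True
# '((()' True  False   <- the pattern accepts an unbalanced string
# '())'  True  False   <- and another
# '()()' False True    <- and rejects a balanced one

The pattern is of course a strawman, but the pumping lemma guarantees that every regular pattern has analogous counterexamples.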

Practical Examples: Theory in Action

SQL Injection: Token-Level Attack

Consider the tokenization of a malicious input:

username = "admin' OR '1'='1"
query = f"SELECT * FROM users WHERE name = '{username}'"

The constructed query string:

SELECT * FROM users WHERE name = 'admin' OR '1'='1'

Token stream produced by the SQL lexer:

KEYWORD:SELECT  OPERATOR:*  KEYWORD:FROM  IDENTIFIER:users
KEYWORD:WHERE  IDENTIFIER:name  OPERATOR:=  STRING:'admin'
KEYWORD:OR  STRING:'1'  OPERATOR:=  STRING:'1'

The attacker's input (admin' OR '1'='1) was intended to be a single STRING token. Instead, the closing quote terminated the string early, and subsequent characters became KEYWORD, STRING, and OPERATOR tokens. The input escaped from the data plane into the control plane.

Parameterized queries solve this at the protocol level:

query = "SELECT * FROM users WHERE name = ?"
cursor.execute(query, [username])

The query structure is parsed once; the parameter is substituted after parsing. The parameter cannot affect token boundaries because the tokenization already happened.
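
A self-contained sketch using Python's standard-library sqlite3 (the in-memory database and table are invented for this demo) makes the difference observable:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('admin')")

malicious = "admin' OR '1'='1"

# String concatenation: the input reaches the SQL lexer, the quote
# closes the string early, and OR '1'='1' becomes live syntax.
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{malicious}'").fetchall()
print(rows)  # [('admin',)] -- the tautology matched every row

# Parameterized: the structure was tokenized before the value arrived,
# so the same bytes remain a single STRING value.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", [malicious]).fetchall()
print(rows)  # [] -- no user is literally named "admin' OR '1'='1"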

XSS: Context-Switching Attack

HTML parsing is more complex because it involves multiple nested languages (HTML → CSS → JavaScript). An XSS vulnerability exploits transitions between these contexts.

<div class="user-bio">{{user_input}}</div>

If user_input contains </div><script>alert(1)</script><div>, the browser's HTML parser sees:

<div class="user-bio"></div><script>alert(1)</script><div></div>

The </div> closes the bio div, <script> begins a script element, and the JavaScript engine executes alert(1).

A regex filter for <script> fails against:

<scr<script>ipt>alert(1)</script>

The regex removes <script>, leaving:

<script>alert(1)</script>

The HTML parser, being context-free, correctly recognizes the tag. The regex, being regular, cannot.
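
The failure is easy to reproduce; a sketch with Python's re module:

import re

payload = "<scr<script>ipt>alert(1)</script>"
filtered = re.sub(r"<script>", "", payload)
print(filtered)  # <script>alert(1)</script>

Removing the inner tag assembled the outer one. Looping until a fixpoint patches this particular payload, but attackers then move to case variants, attribute-based vectors, or encodings the pattern never anticipated.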

Defense Strategies Informed by Language Theory

1. Match Parser Power to Input Complexity

Input Type       Language Class    Appropriate Tool
UUID             Regular           Regex
IP Address       Regular           Regex
Email (basic)    Regular           Regex
JSON             Context-Free      json.parse()
XML/HTML         Context-Free      DOM Parser
SQL              Context-Free      Parameterized Queries
Python/JS/etc.   Turing-Complete   Sandboxed Execution

2. Validate Early, Parse Once

Parse input at the system boundary and pass the structured representation (AST, DOM, typed object) internally. Never re-parse string representations downstream.

# Bad: String passed through, re-parsed by each layer
def process_request(json_string: str):
    layer1_process(json_string)
    layer2_process(json_string)

# Good: Parse once, pass structure
def process_request(json_string: str):
    data = json.loads(json_string)  # Parse at boundary
    layer1_process(data)            # Receive typed dict
    layer2_process(data)            # Same typed dict

3. Minimize Input Language Power

If you're designing a configuration format, choose the simplest language class that meets your needs:

  • Need simple key-value pairs? Use a regular language.
  • Need nested structures? Use a context-free language (JSON, TOML).
  • Need computation? Reconsider your requirements.

Avoid formats that are accidentally Turing-complete (YAML with arbitrary object construction, XML with external entities).
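
For example, PyYAML's safe_load accepts nested structures but refuses the object-construction tags that make full YAML loading dangerous (this sketch assumes PyYAML is installed; the hostile document is illustrative):

import yaml  # PyYAML, assumed installed

# Illustrative hostile document: a tag requesting object construction
hostile = 'exploit: !!python/object/apply:os.system ["id"]'

print(yaml.safe_load("servers: [alpha, beta]"))  # nested data: fine
try:
    yaml.safe_load(hostile)                      # object construction: refused
except yaml.YAMLError as exc:
    print("rejected:", type(exc).__name__)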

4. Test Parser Boundary Cases

Systematic testing should include:

  • Inputs at the boundary between valid and invalid
  • Deeply nested structures (stack exhaustion; see the sketch after this list)
  • Unusual but technically valid encodings
  • Inputs that different parsers might interpret differently
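
A sketch of the nesting test, probing Python's json module with deeply nested arrays (the RecursionError shown is CPython's behavior; a parser that hangs or crashes instead has failed the test):

import json

def probe_depth(depth: int) -> str:
    """Feed a parser a document that is valid but pathologically deep."""
    doc = "[" * depth + "]" * depth
    try:
        json.loads(doc)
        return "parsed"
    except RecursionError:
        return "rejected (recursion limit)"

print(probe_depth(10))       # parsed
print(probe_depth(100_000))  # rejected (recursion limit) on CPython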

Conclusion

Formal language theory is not an academic curiosity; it is the mathematical foundation explaining why entire classes of vulnerabilities exist and why certain defenses work while others fail.

Key takeaways:

Regular expressions are regular. They recognize Type 3 languages and nothing more. Using regex to filter HTML, SQL, or JSON is mathematically impossible to do correctly.

Context-free languages require stacks. Parsers for JSON, HTML, and SQL must track nesting depth. This is why these formats can express structures that regular expressions cannot.

Turing machines are undecidable. No algorithm can perfectly determine whether arbitrary code is malicious. Sandboxing and behavior monitoring are necessary complements to static analysis.

Parameterized queries enforce grammar. They work because they separate the query structure from data values at the parsing layer, making injection impossible regardless of input content.

Parser differentials create vulnerabilities. When two systems interpret the same input differently, attackers can smuggle malicious content past security controls.

Parsers have broad security applications. From stealer log analysis to protocol dissection, grammar-based parsing provides robust, extensible solutions for structured data processing.

The goal of LangSec is precise: define the language your system should accept, implement a parser that recognizes exactly that language, and reject everything else. Within those boundaries, behavior is predictable. Security becomes engineering rather than guesswork.

References

  • LangSec.org - Language-theoretic Security Project
  • Chomsky, N. (1956). "Three models for the description of language." IRE Transactions on Information Theory.
  • Hopcroft, J. E., Motwani, R., & Ullman, J. D. (2006). Introduction to Automata Theory, Languages, and Computation.
  • Sipser, M. (2012). Introduction to the Theory of Computation. Cengage Learning.
  • OWASP SQL Injection Prevention Cheat Sheet
  • OWASP XSS Prevention Cheat Sheet
  • Sassaman, L., Patterson, M. L., Bratus, S., & Shubina, A. (2011). "The Halting Problems of Network Stack Insecurity." LangSec Workshop. Available at langsec.org
  • Rice, H. G. (1953). "Classes of Recursively Enumerable Sets and Their Decision Problems." Trans. Amer. Math. Soc.
