Regex

Pulling values out of text with regular expressions: capturing between fixed markers, non-greedy multiline capture with re.DOTALL, and collecting every match with findall.

Capture between fixed markers

When data sits between two fixed markers, a single capture group beats a full DOM parse.

No DOM walk is needed here. ([^<]*) grabs everything up to the next <, so it stops cleanly at the closing tag. re.search returns None when nothing matches, so test the match object before calling group(1), which holds the captured text. Because [^<] is a negated class it already matches newlines; re.DOTALL only changes what . matches, so it is needed only for patterns that capture with . (next snippet).

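import re
# r is assumed to be an HTTP response whose body is in r.text (e.g. a requests.Response)
m = re.search(r"Results:</b><br><br>([^<]*)</center>", r.text)
# m is None when nothing matches; also skip a whitespace-only capture
if m and m.group(1).strip():
    value = m.group(1).strip()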

Sample response fragment

<center>Results:</b><br><br>HTB{f1a9b2...c2}</center>

Example output

value -> "HTB{f1a9b2...c2}"
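
The guard can also be wrapped in a small helper. This is a minimal sketch, not the original author's code: the name extract_between and the sample strings are assumptions made for illustration. It shows that a failed match surfaces as None from re.search itself, and that [^<]* crosses a newline without re.DOTALL.

```python
import re

# Same pattern as the snippet above, compiled once for reuse.
PATTERN = re.compile(r"Results:</b><br><br>([^<]*)</center>")

def extract_between(body):
    """Return the stripped captured value, or None if absent or empty."""
    m = PATTERN.search(body)
    if m is None:          # no match at all
        return None
    value = m.group(1).strip()
    return value or None   # treat a whitespace-only capture as missing

# Illustrative inputs: one value spanning a newline, one page with no marker.
hit = extract_between("<center>Results:</b><br><br>alpha\nbeta</center>")
miss = extract_between("<center>No results</center>")
print(repr(hit))   # 'alpha\nbeta'
print(miss)        # None
```

Returning None for both "no match" and "empty capture" keeps the caller to a single check; split the two cases if the difference matters.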

_{Find by: regex, re.search, capture group, extract, between markers, parse response, scrape without bs4, group1, pattern, csrf, token, flag · Source: CWEE/XPath in-band}

Non-greedy capture across newlines (re.DOTALL)

re.DOTALL lets . match newlines so a capture group can span lines between two labels; .*? stays non-greedy so it stops at the first delimiter.

.*? is non-greedy, so it stops at the first following Vaccination Status: instead of running to the last one on the page. \s* absorbs the surrounding spaces and the newline between the two labels. re.DOTALL makes . match newline characters too, which matters when the captured value itself spans several lines; in the one-line sample here \s* already absorbs the line break, so the flag is a safeguard rather than a requirement. group(0) is the whole match including the Name: and Vaccination Status: labels; group(1), the parenthesised (.*?), is just the captured value. re.search returns None when nothing matches, so guard for that before calling .group().

import re
text = "Name: uid=0(root) gid=0(root) groups=0(root)\nVaccination Status: Complete"
match = re.search(r"Name:\s*(.*?)\s*Vaccination\s+Status:", text, re.DOTALL)
print(match.group(1).strip() if match else None)

Sample text (the two labels sit on different lines)

Name: uid=0(root) gid=0(root) groups=0(root)
Vaccination Status: Complete

group(1)

uid=0(root) gid=0(root) groups=0(root)
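
To see where the flag and the laziness each matter, here is a minimal sketch on an invented text in which the value spans two lines and the end label appears twice. The text and variable names are assumptions for illustration only.

```python
import re

# Illustrative text: a two-line value, and the end label repeated.
text = (
    "Name: line one\n"
    "line two\n"
    "Vaccination Status: Complete\n"
    "Vaccination Status: Pending"
)
pattern_lazy = r"Name:\s*(.*?)\s*Vaccination\s+Status:"
pattern_greedy = r"Name:\s*(.*)\s*Vaccination\s+Status:"

# Without DOTALL, . cannot cross the newline inside the value: no match.
no_flag = re.search(pattern_lazy, text)

# With DOTALL, the lazy group stops at the first end label.
lazy = re.search(pattern_lazy, text, re.DOTALL)

# With DOTALL, the greedy group runs on to the last end label.
greedy = re.search(pattern_greedy, text, re.DOTALL)

print(no_flag)                 # None
print(repr(lazy.group(1)))     # 'line one\nline two'
print("Complete" in greedy.group(1))  # True
```

The greedy capture swallows the first label and its value, which is why the lazy quantifier is the safer default when a delimiter can repeat.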

_{Find by: regex, re.search, DOTALL, non greedy, lazy, multiline, newline, dot matches newline, capture group, group0 group1, whitespace, extract from pdf text, command output · Source: CWEE/PDF RCE}

Every match at once (re.findall)

findall returns every non-overlapping match as a list; one group gives strings, many groups give tuples.

re.search returns only the first match; re.findall returns every non-overlapping match as a list. With no capture group each element is the whole match; with a single capture group each element is the captured string; with two or more groups each element is a tuple of the groups. Use it to pull every id, row, or token from a page in one pass, for example enumerating object ids before an IDOR sweep. A no-match returns an empty list, so unlike re.search it needs no None guard.

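import re
# Illustrative sample inputs (assumed, not taken from a real target)
html = '<a href="/user/1">a</a> <a href="/user/2">b</a> <a href="/user/3">c</a>'
cookie = "sid=abc; role=admin"
# ONE capture group -> list of strings; MANY groups -> list of tuples
ids = re.findall(r"/user/(\d+)", html)          # ['1', '2', '3']
kv  = re.findall(r"(\w+)=([^;]+)", cookie)       # [('sid', 'abc'), ('role', 'admin')]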

Example output

ids -> ['1', '2', '3']
kv  -> [('sid', 'abc'), ('role', 'admin')]
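
The list shapes described above can be checked with a short sketch. The cookie string and variable names are illustrative assumptions; the calls are plain re.findall and dict.

```python
import re

cookie = "sid=abc; role=admin"   # illustrative sample value

# Two groups -> list of 2-tuples, which dict() accepts directly.
pairs = dict(re.findall(r"(\w+)=([^;]+)", cookie))

# No capture groups -> each element is the whole match.
whole = re.findall(r"\w+=[^;]+", cookie)

# No match -> empty list, so a for-loop over it simply does nothing.
none_found = re.findall(r"/user/(\d+)", "no links here")

print(pairs)       # {'sid': 'abc', 'role': 'admin'}
print(whole)       # ['sid=abc', 'role=admin']
print(none_found)  # []
```

Converting two-group results to a dict is convenient for key/value data, but a repeated key silently keeps only its last value.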

_{Find by: regex, re.findall, all matches, list, multiple, tuples, groups, iterate matches, extract all ids, enumerate, scrape list, idor sweep, no none guard · Source: CWEE/many}

Parsing Encodings