Regex
Pulling values out of text with regular expressions: capturing between fixed markers, non-greedy multiline capture with re.DOTALL, and collecting every match with findall.
Capture between fixed markers
When data sits between two fixed markers, a single capture group beats a full DOM parse.
No DOM walk is needed here: ([^<]*) grabs everything up to the next <, so it stops cleanly at the closing tag. group(1) is the captured text; re.search returns None when nothing matches, so check the match object before calling .group(). A negated class such as [^<] already matches newlines, so this pattern needs no flag; re.DOTALL only matters when the pattern uses . to cross line breaks (next snippet).
import re
# r is assumed to be a requests.Response from an earlier request
m = re.search(r"Results:</b><br><br>([^<]*)</center>", r.text)
if m and m.group(1).strip():
    value = m.group(1).strip()
Sample response fragment
<center>Results:</b><br><br>HTB{f1a9b2...c2}</center>
Example output
value -> "HTB{f1a9b2...c2}"
Find by: regex, re.search, capture group, extract, between markers, parse response, scrape without bs4, group1, pattern, csrf, token, flag · Source: CWEE/XPath in-band
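The guard logic above can be wrapped in a small helper. This is a minimal sketch, not the source's own code: the function name between and the sample body strings are hypothetical, and body stands in for r.text. It also shows that the negated class [^<] crosses a line break without any flag.

```python
import re

# Hypothetical helper: returns the stripped first capture group, or None
# when the pattern does not match or the captured value is empty.
def between(pattern, text):
    m = re.search(pattern, text)
    if m is None:              # no match: .group() would raise AttributeError
        return None
    value = m.group(1).strip()
    return value or None       # treat a whitespace-only capture as "nothing found"

pattern = r"Results:</b><br><br>([^<]*)</center>"

# Hypothetical bodies standing in for r.text
body = "<center>Results:</b><br><br>HTB{f1a9b2...c2}</center>"
multiline_body = "<center>Results:</b><br><br>line one\nline two</center>"

found = between(pattern, body)
missing = between(pattern, "<p>no data</p>")
wrapped = between(pattern, multiline_body)   # [^<]* matches the newline without re.DOTALL

print(found, missing, repr(wrapped))
```

Returning None for both "no match" and "empty capture" keeps the caller down to a single check; split the two cases if the difference matters to you.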
Non-greedy capture across newlines (re.DOTALL)
re.DOTALL lets . match newlines so a capture group can span lines between two labels; .*? stays non-greedy so it stops at the first delimiter.
.*? is non-greedy, so it stops at the FIRST following Vaccination Status: instead of running to the last one on the page. \s* absorbs the surrounding spaces and the newline between the two labels. re.DOTALL makes . match newline characters too, which is required whenever the captured value itself spans several lines; without it, (.*?) cannot cross a line break and the match fails. In the single-line sample below the flag is only a safeguard, because the one newline sits between the value and the next label, where \s* already consumes it. group(0) is the whole match including the Name: and Vaccination Status: labels; group(1), the parenthesised (.*?), is just the captured value. re.search returns None when nothing matches, so check the match object before calling .group().
import re
text = "Name: uid=0(root) gid=0(root) groups=0(root)\nVaccination Status: Complete"
match = re.search(r"Name:\s*(.*?)\s*Vaccination\s+Status:", text, re.DOTALL)
print(match.group(1).strip() if match else None)
Sample text (the two labels sit on different lines)
Name: uid=0(root) gid=0(root) groups=0(root)
Vaccination Status: Complete
group(1)
uid=0(root) gid=0(root) groups=0(root)
Find by: regex, re.search, DOTALL, non greedy, lazy, multiline, newline, dot matches newline, capture group, group0 group1, whitespace, extract from pdf text, command output · Source: CWEE/PDF RCE
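To see what the flag and the laziness each contribute, it can help to run the same pattern three ways. This is an illustrative sketch under assumptions: the text below is a hypothetical variant of the sample in which the value wraps onto a second line and the closing label appears twice.

```python
import re

# Hypothetical text: the value spans two lines and the end label occurs twice.
text = (
    "Name: uid=0(root)\n"
    "gid=0(root)\n"
    "Vaccination Status: Complete\n"
    "Vaccination Status: Pending"
)

lazy = r"Name:\s*(.*?)\s*Vaccination\s+Status:"
greedy_pattern = r"Name:\s*(.*)\s*Vaccination\s+Status:"

no_flag = re.search(lazy, text)                       # . cannot cross the newline inside the value
with_flag = re.search(lazy, text, re.DOTALL)          # . matches newlines, lazy stops at first label
greedy = re.search(greedy_pattern, text, re.DOTALL)   # runs on to the LAST label

print(no_flag)                                        # no match object
print(repr(with_flag.group(1)) if with_flag else None)
print(repr(greedy.group(1)) if greedy else None)
```

Without the flag there is no match at all; with the flag and .*? the capture ends at the first label; with the greedy .* the capture swallows the first label and ends at the second.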
Every match at once (re.findall)
findall returns every non-overlapping match as a list; one group gives strings, many groups give tuples.
re.search returns only the first match; re.findall returns every non-overlapping match as a list. With a single capture group each element is the captured string; with two or more groups each element is a tuple of the groups. Use it to pull every id, row, or token from a page in one pass – for example enumerating object ids before an IDOR sweep. A no-match returns an empty list, so unlike re.search it needs no None guard.
import re
# ONE capture group -> list of strings; MANY groups -> list of tuples
# html and cookie are assumed to be strings fetched earlier
ids = re.findall(r"/user/(\d+)", html)  # ['1', '2', '3']
kv = re.findall(r"(\w+)=([^;]+)", cookie)  # [('sid', 'abc'), ('role', 'admin')]
Example output
ids -> ['1', '2', '3']
kv -> [('sid', 'abc'), ('role', 'admin')]
Find by: regex, re.findall, all matches, list, multiple, tuples, groups, iterate matches, extract all ids, enumerate, scrape list, idor sweep, no none guard · Source: CWEE/many
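A self-contained version of the findall calls, as a sketch with assumed inputs: the html and cookie strings below are hypothetical samples chosen to reproduce the output shown above. It also covers two cases the snippet does not show: a pattern with no capture group, and a pattern that matches nothing.

```python
import re

# Hypothetical sample inputs
html = '<a href="/user/1">a</a> <a href="/user/2">b</a> <a href="/user/3">c</a>'
cookie = "sid=abc; role=admin"

ids = re.findall(r"/user/(\d+)", html)        # one group   -> list of strings
pairs = re.findall(r"(\w+)=([^;]+)", cookie)  # two groups  -> list of tuples
whole = re.findall(r"/user/\d+", html)        # no group    -> list of whole matches
empty = re.findall(r"/admin/(\d+)", html)     # no match    -> empty list, never None

# Two-element tuples convert straight into a dict
cookies = dict(pairs)

# An empty list simply skips the loop, so no guard is needed
for uid in ids:
    print(f"/user/{uid}")

print(pairs, cookies, whole, empty)
```

The captured ids are strings; convert with int(uid) before doing arithmetic or building numeric ranges.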