BeautifulSoup
Focused BeautifulSoup recipes: reading attributes, element text, nested finds, find_all, CSS selectors, and stripped_strings.
BeautifulSoup — read a tag attribute (CSRF token, id, href)
Finds a tag by one attribute, then indexes it like a dict to read another attribute. The core technique for grabbing a CSRF token or object id before replaying a request.
find(tag, {attr: val}) returns the first element matching that attribute; indexing the result like a dict (["value"], ["href"]) reads any attribute off it. This is how a CSRF token, a hidden object id, or a download link is pulled out of a page before replaying it in the next request. Chaining .strip() / .split() cleans the value. When nothing matches, find() returns None, so the index raises TypeError.
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
csrf = soup.find("input", {"id": "csrf"})["value"] # by id
name = soup.find("input", {"name": "csrf"})["value"] # by name attribute
href = soup.find("a", {"target": "_blank"})["href"] # read the href
fid = soup.find("button", {"class": "delete-btn"})["value"].strip()
Markup these calls target
<input id="csrf" name="csrf" value="9f8a1c">
<a target="_blank" href="/files/report.pdf">Open</a>
<button class="delete-btn" value="42">Delete</button>
What each variable holds
csrf -> "9f8a1c"
name -> "9f8a1c"
href -> "/files/report.pdf"
fid -> "42"
Find by: beautifulsoup, bs4, attribute, value, csrf token, hidden input, href, find by id, find by name, find by class, scrape token, grab id · Source: PG/Monster, WSA, PG/Zipper, PG/WallpaperHub
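The attribute recipe above assumes every tag and attribute is present. A minimal sketch of a guarded variant follows; the helper name read_attr and the inline sample markup (standing in for r.text) are assumptions made for illustration, not part of the original recipe.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for r.text (assumed for illustration).
html = """
<input id="csrf" name="csrf" value="9f8a1c">
<a target="_blank" href="/files/report.pdf">Open</a>
<button class="delete-btn" value=" 42 ">Delete</button>
"""


def read_attr(soup, tag, attrs, attr_name):
    """Hypothetical helper: return an attribute value, or None if the
    tag or the attribute is missing, instead of raising."""
    element = soup.find(tag, attrs)
    if element is None:            # find() found nothing
        return None
    value = element.get(attr_name)  # .get() returns None for a missing attribute
    # Only plain string values are trimmed; multi-valued attributes
    # such as class come back as a list and are returned untouched.
    return value.strip() if isinstance(value, str) else value


soup = BeautifulSoup(html, "html.parser")
csrf = read_attr(soup, "input", {"id": "csrf"}, "value")
href = read_attr(soup, "a", {"target": "_blank"}, "href")
fid = read_attr(soup, "button", {"class": "delete-btn"}, "value")
missing = read_attr(soup, "input", {"id": "nope"}, "value")
```

Returning None lets the calling script decide whether a missing token is fatal, rather than failing with a TypeError in the middle of a request chain.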
BeautifulSoup — read element text (command output, reflected value)
Pulls the visible text out of an element with get_text().strip(); split/cast slices a value out of a label like “Balance: $250”.
find() returns the first matching element, and .get_text() returns all the text inside it, including text from nested tags; .strip() trims surrounding whitespace. This is how a command’s output, a reflected input, or a status message is read back out of the response. When the value is wrapped in a label (Balance: $250), .split(": ")[1].strip("$") drops the label and the dollar sign, and int(...) makes the result usable in arithmetic, which is what a balance/race loop needs.
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
output = soup.find("span").get_text() # raw text of first <span>
result = soup.find("div", {"class": "divmin"}).get_text().strip() # by class, trimmed
heading = soup.find("h4").get_text().strip()
# value embedded in a label -> split off the label, cast to a number
balance = int(soup.find("strong").get_text().split(": ")[1].strip("$"))
Markup these calls target
<span>uid=33(www-data) gid=33(www-data)</span>
<div class="divmin"> root:x:0:0:root:/root:/bin/bash </div>
<h4>config.php</h4>
<strong>Balance: $250</strong>
What each variable holds
output -> "uid=33(www-data) gid=33(www-data)"
result -> "root:x:0:0:root:/root:/bin/bash"
heading -> "config.php"
balance -> 250 (an int, ready for arithmetic)
Find by: beautifulsoup, bs4, get_text, element text, inner text, command output, reflected value, read response, strip, parse number, split value · Source: PG/XposedAPI, CWEE/Prototype Pollution, CWEE/Second Order, CWEE/Gift Card
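The split-and-cast chain in the text recipe above breaks if the label changes shape or the number gains a thousands separator. As a hedged sketch, a regular expression can be swapped in for the split; the helper name parse_amount and the sample markup are assumptions, and the regex is a substitute technique, not the original one.

```python
import re

from bs4 import BeautifulSoup

# Sample markup standing in for r.text (assumed for illustration).
html = "<p>Status: <strong>Balance: $1,250</strong></p>"
soup = BeautifulSoup(html, "html.parser")

# get_text() on a parent includes the text of nested tags.
whole = soup.find("p").get_text()
# Narrowing to the inner tag first keeps only the labelled value.
label = soup.find("strong").get_text().strip()


def parse_amount(text):
    """Hypothetical helper: pull the first dollar amount out of text
    and return it as an int, or None when no amount is present."""
    match = re.search(r"\$\s*(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


balance = parse_amount(label)
```

The same helper also handles the simpler "Balance: $250" label from the recipe, so it can stand in for the split chain when the page format is not fully known.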
BeautifulSoup — drill into a nested element (chained find)
Chains find() to narrow to a container, then searches inside it. The class_= keyword form applies when the target is only unique within a parent.
find() can be chained: the first call returns a container element, and calling .find() on that result searches only inside it. The class_= keyword is a Python-friendly alias for {"class": ...} (because class is a reserved word). This applies when the target element (<p>) is not unique on the page but is unique inside a known parent (div.card-content).
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
# first find() narrows to the container, the second searches inside it
content = soup.find("div", class_="card-content").find("p").get_text().strip()
Markup this targets
<div class="card-content">
<h5>report.txt</h5>
<p>SECRET{nested_value}</p>
</div>
Result
content -> "SECRET{nested_value}"
Find by: beautifulsoup, bs4, nested find, chained find, class_, find within, parent child, drill into, container, scoped search · Source: CWEE/Second Order LFI
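To illustrate why the chained find above is scoped, the sketch below adds an unrelated <p> ahead of the container (an assumption, not in the original markup) and checks each step for None before going deeper. It also shows the class_= keyword and the {"class": ...} dict selecting the same element.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for r.text; the leading <p> is an added
# assumption to show that an unscoped find("p") hits the wrong element.
html = """
<p>unrelated paragraph</p>
<div class="card-content">
  <h5>report.txt</h5>
  <p>SECRET{nested_value}</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Unscoped: the first <p> on the page, not the one inside the card.
page_first = soup.find("p").get_text(strip=True)

# The keyword form and the dict form select the same container.
card = soup.find("div", class_="card-content")
same_card = soup.find("div", {"class": "card-content"})

# Scoped and guarded: each find() may return None.
content = None
if card is not None:
    paragraph = card.find("p")
    if paragraph is not None:
        content = paragraph.get_text(strip=True)
```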
BeautifulSoup — find_all then pick by index
find_all returns every match as a list; indexing the needed occurrence ([1], [-1]) applies when only the Nth copy of a repeated tag holds the data.
find_all(tag) returns a list of every matching element (unlike find, which returns only the first). Indexing it ([1], [-1]) applies when a page repeats a tag and only one position carries the needed value — a common shape for an in-band XPath or SQLi dump where the injected row lands at a fixed offset. .get_text(strip=True) is then called on the chosen element.
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
second = soup.find_all("center")[1].get_text() # 2nd <center>
last = soup.find_all("td")[-1].get_text(strip=True) # last <td>
Markup this targets
<center>Header</center>
<center>[email protected]</center>
<table><tr><td>id</td><td>0042</td></tr></table>
What each variable holds
second -> "[email protected]"
last -> "0042"
Find by: beautifulsoup, bs4, find_all, index, nth match, second element, last element, list of elements, td, center, in band dump · Source: CWEE/XPath in-band
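Indexing a find_all result raises IndexError when the page returns fewer matches than expected, which is common when an injection fails. A small sketch of a bounds-safe lookup follows; the helper name nth_text and the placeholder text "injected row" are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for r.text (assumed for illustration).
html = """
<center>Header</center>
<center>injected row</center>
<table><tr><td>id</td><td>0042</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")


def nth_text(soup, tag, index):
    """Hypothetical helper: text of the match at `index` (negative
    indexes count from the end), or None if there is no such match."""
    matches = soup.find_all(tag)  # list-like result, possibly empty
    try:
        return matches[index].get_text(strip=True)
    except IndexError:
        return None


second = nth_text(soup, "center", 1)    # 2nd <center>
last = nth_text(soup, "td", -1)         # last <td>
too_far = nth_text(soup, "center", 5)   # no 6th <center> on this page
```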
BeautifulSoup — CSS selectors with select() / select_one()
select() takes any CSS selector and returns a list; select_one() returns the first. Applies when an attribute filter is not expressive enough (combinators, descendants, ids).
select() accepts a full CSS selector (a.btn, section.list > div, #main p) and returns a list of matches; select_one() returns just the first. The > combinator means direct child. This is the cleanest way to scrape a grid or table when the rows are identified by their position in the DOM rather than a single attribute.
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
tiles = soup.select("section.container-list-tiles > div") # direct children -> list
first = soup.select_one("section.container-list-tiles > div")
names = [d.get_text(strip=True) for d in tiles]
Markup this targets
<section class="container-list-tiles">
<div>Laptop</div>
<div>Mouse</div>
</section>
What each variable holds
tiles -> [<div>Laptop</div>, <div>Mouse</div>] (2 elements)
first -> <div>Laptop</div>
names -> ["Laptop", "Mouse"]
Find by: beautifulsoup, bs4, css selector, select, select_one, direct child, combinator, class selector, query, list of nodes, scrape grid · Source: WSA SQLi in-band
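The difference between the > (direct child) combinator and a plain space (any descendant) is easiest to see side by side. In this sketch the <aside> wrapper and its "Sponsored" tile are assumptions added to the sample markup so the two selectors return different results.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for r.text; the <aside> block is an added
# assumption that gives the section a nested, non-child <div>.
html = """
<section class="container-list-tiles">
  <div>Laptop</div>
  <div>Mouse</div>
  <aside><div>Sponsored</div></aside>
</section>
"""
soup = BeautifulSoup(html, "html.parser")

# ">" keeps only divs that are immediate children of the section.
direct = soup.select("section.container-list-tiles > div")
# A space matches divs at any depth below the section.
nested = soup.select("section.container-list-tiles div")

direct_names = [d.get_text(strip=True) for d in direct]
nested_names = [d.get_text(strip=True) for d in nested]

# select_one() returns None, not an empty list, when nothing matches.
no_match = soup.select_one("section.missing > div")
```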
BeautifulSoup — collect a whole column (find_all + comprehension)
find_all on the repeated tag with a get_text(strip=True) comprehension scrapes an entire column at once — the backbone of a boolean/diff oracle.
Scraping a whole column — every product name, every row label — uses find_all on the repeated tag and builds a list with a comprehension over get_text(strip=True). strip=True trims whitespace per element so comparisons are exact. Diffing this list between a baseline request and an injected one is the core of a boolean / content-based SQLi oracle.
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
# one-liner: every <h3>'s text as a clean list
titles = [t.get_text(strip=True) for t in soup.find_all("h3")]
# expanded equivalent
titles = []
for t in soup.find_all("h3"):
    titles.append(t.get_text(strip=True))
Markup this targets
<h3>Laptop</h3>
<h3>Mouse</h3>
<h3>Keyboard</h3>
Result
titles -> ["Laptop", "Mouse", "Keyboard"]
Find by: beautifulsoup, bs4, find_all, list comprehension, collect column, scrape all, get_text strip, all matches, loop, build list, oracle list, diff results · Source: WSA SQLi in-band
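The baseline-versus-injected comparison mentioned in the column recipe can be sketched as below. No requests are sent here: the three HTML strings are stand-ins for responses, and the names column and condition_held are hypothetical helpers, so this shows only the comparison step, not a full oracle.

```python
from bs4 import BeautifulSoup


def column(html, tag="h3"):
    """Hypothetical helper: text of every `tag` element, trimmed."""
    soup = BeautifulSoup(html, "html.parser")
    return [t.get_text(strip=True) for t in soup.find_all(tag)]


# Stand-ins for three response bodies (assumed for illustration).
baseline_html = "<h3>Laptop</h3><h3> Mouse </h3><h3>Keyboard</h3>"
true_html = "<h3>Laptop</h3>\n<h3>Mouse</h3>\n<h3>Keyboard</h3>"
false_html = "<h3>Laptop</h3>"

baseline = column(baseline_html)


def condition_held(html):
    """True when the response lists exactly the baseline column."""
    return column(html) == baseline


held = condition_held(true_html)       # same rows, different whitespace
dropped = condition_held(false_html)   # rows missing
```

Because strip=True normalises each cell, whitespace differences between responses do not register as a change; only added or missing rows do.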
BeautifulSoup — separate text fragments with stripped_strings
When one element holds several separate text fragments, stripped_strings yields each one trimmed — unlike get_text() which concatenates them. Filtering isolates the needed piece.
When a single element wraps several distinct text fragments (a name, a price, a label), .get_text() glues them into one string. .stripped_strings instead yields each fragment as its own whitespace-trimmed string, so the generator can be filtered (if t.startswith("$")) to pick out exactly the needed piece.
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
tile = soup.find("div", {"class": "tile"})
parts = [t for t in tile.stripped_strings] # each fragment, trimmed
price = [t for t in tile.stripped_strings if t.startswith("$")][0] # isolate one
Markup this targets
<div class="tile">
<h3>Laptop</h3>
<span>$1,299</span>
<small>in stock</small>
</div>
What each variable holds
parts -> ["Laptop", "$1,299", "in stock"]
price -> "$1,299"
Find by: beautifulsoup, bs4, stripped_strings, text nodes, fragments, multiple texts, generator, filter text, price, name and price, split element text · Source: WSA SQLi in-band
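For comparison with the stripped_strings recipe above, this sketch shows what get_text() produces on the same tile, with and without a separator, and uses next() with a default so a tile without a price yields None instead of an IndexError. The sample markup stands in for r.text and is an assumption.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for r.text (assumed for illustration).
html = """<div class="tile">
  <h3>Laptop</h3>
  <span>$1,299</span>
  <small>in stock</small>
</div>"""
soup = BeautifulSoup(html, "html.parser")
tile = soup.find("div", class_="tile")

# get_text() joins every fragment into one string.
glued = tile.get_text(strip=True)
# A separator keeps the fragments apart but still returns one string.
joined = tile.get_text(" | ", strip=True)

# stripped_strings yields each fragment separately, already trimmed.
parts = list(tile.stripped_strings)

# next() with a default avoids IndexError when no fragment matches.
price = next((t for t in parts if t.startswith("$")), None)
euro = next((t for t in parts if t.startswith("€")), None)
```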