What Is XPath?
XPath (XML Path Language) is a W3C standard query language for selecting nodes from XML documents. It uses a path-like syntax to navigate through the hierarchical structure of an XML tree, selecting elements, attributes, and text nodes based on their position, name, or value.
XPath is rarely used on its own; it is embedded in host technologies such as XSLT, XQuery, and the DOM API. Every modern browser supports XPath for querying HTML documents through document.evaluate(), and tools like Selenium, Scrapy, and lxml use XPath expressions extensively for web scraping and automated testing.
How XPath Queries Work
An XPath expression navigates the document tree using axes (directions), node tests (filters), and predicates (conditions). Together these three parts make up a location step, and most queries are simply chains of such steps separated by slashes.
- Axes and paths -- / selects from the root, // selects descendants anywhere, .. moves to the parent, and named axes like following-sibling:: navigate relative to the current node
- Predicates and filters -- square brackets [] add conditions: //book[price>30] selects books with a price child above 30, while //div[@class='main'] selects divs whose class attribute is exactly 'main' (a div with class='main wide' would not match)
- Functions and operators -- XPath provides built-in functions like contains(), starts-with(), normalize-space(), and count() for string manipulation, comparison, and node counting
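The three building blocks above can be tried out in a few lines of Python. This is a minimal sketch, assuming the third-party lxml package is installed; the catalog document and its ids are invented for illustration.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical sample document, invented for illustration.
XML = b"""
<catalog>
  <book id="b1"><title>Learning XML</title><price>39.95</price></book>
  <book id="b2"><title>XPath Basics</title><price>24.50</price></book>
  <book id="b3"><title>  Advanced   XSLT </title><price>45.00</price></book>
</catalog>
"""

root = etree.fromstring(XML)

# Axes and paths: an absolute path, a parent step, and a named axis.
all_titles = root.xpath("/catalog/book/title/text()")
first_price = root.xpath("//price")[0]
parent_id = first_price.xpath("../@id")  # .. moves up to the enclosing <book>
later_books = root.xpath("//book[@id='b1']/following-sibling::book/@id")

# Predicates: a numeric comparison on a child element.
expensive = root.xpath("//book[price>30]/@id")

# Functions: contains(), starts-with(), normalize-space(), count().
xpath_books = root.xpath("//book[contains(title, 'XPath')]/@id")
learning = root.xpath("//book[starts-with(title, 'Learning')]/@id")
clean_title = root.xpath("normalize-space(//book[@id='b3']/title)")
book_count = root.xpath("count(//book)")  # XPath numbers come back as floats

print(expensive, xpath_books, clean_title, book_count)
```

Note that xpath() returns a list for node-set expressions but a plain string or float when the expression itself evaluates to a string or number.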
When To Use XPath
XPath is essential whenever you need to extract specific data from XML or HTML documents programmatically.
- Web scraping -- extract product prices, article titles, or links from web pages using XPath expressions in tools like Scrapy, Puppeteer, or browser developer consoles
- XML configuration -- query and validate complex configuration files like Maven pom.xml, Android manifests, or Spring XML contexts to find specific settings or dependencies
- XSLT transformations -- select nodes for transformation in XSLT stylesheets, which rely entirely on XPath expressions to match and process XML elements into different output formats
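As an illustration of the web scraping case, the sketch below pulls names, prices, and links out of a product listing. It assumes lxml is installed; the HTML fragment, class names, and URLs are hypothetical.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical product listing, invented for illustration.
PAGE = """
<div id="products">
  <div class="product"><h2>Keyboard</h2><span class="price">$49.00</span><a href="/p/1">Details</a></div>
  <div class="product"><h2>Mouse</h2><span class="price">$19.50</span><a href="/p/2">Details</a></div>
</div>
"""

doc = html.fromstring(PAGE)

products = []
# Select each product container, then run relative queries inside it.
for node in doc.xpath("//div[@class='product']"):
    products.append({
        "name": node.xpath("string(h2)"),
        "price": float(node.xpath("string(span[@class='price'])").lstrip("$")),
        "link": node.xpath("a/@href")[0],
    })

print(products)
```

Selecting the container first and then querying relative to it keeps each field tied to the right product, which is usually more robust than zipping together three separate document-wide result lists.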
Frequently Asked Questions
When should I use XPath instead of CSS selectors?
CSS selectors are simpler and faster for basic element selection by tag, class, or ID. Choose XPath when you need to select by text content (//a[contains(text(),'Login')]), navigate upward to parent nodes, use complex predicates, or work with XML (not HTML) documents. XPath is also required in XSLT and XQuery contexts.
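The two cases where XPath is the usual choice, matching on text and walking upward, look like this in practice. A minimal sketch, assuming lxml is installed; the navigation markup is made up for the example.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical navigation menu, invented for illustration.
PAGE = """
<html><body>
  <nav>
    <ul class="menu">
      <li class="item"><a href="/home">Home</a></li>
      <li class="item auth"><a href="/login">Login here</a></li>
    </ul>
  </nav>
</body></html>
"""

doc = html.fromstring(PAGE)

# Select a link by its text content.
login_links = doc.xpath("//a[contains(text(), 'Login')]/@href")

# Navigate upward: from the matching link to its parent <li> ...
parent_classes = doc.xpath("//a[contains(text(), 'Login')]/../@class")

# ... or further up with the ancestor axis.
menu_classes = doc.xpath("//a[contains(text(), 'Login')]/ancestor::ul/@class")

print(login_links, parent_classes, menu_classes)
```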
What changed between XPath 1.0 and 2.0?
XPath 2.0 added a richer type system based on XML Schema (dates, durations, typed sequences), regular-expression functions such as matches() and replace(), conditional expressions (if/then/else), for expressions, quantified expressions (some/every), and range expressions (1 to 10). However, browsers and most scraping tools support only XPath 1.0. XPath 2.0 features are available in server-side processors like Saxon, typically alongside XSLT 2.0.
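One way to see the version gap is to hand a 2.0-only function to a 1.0 engine. The sketch below assumes lxml, which is built on libxml2 and implements XPath 1.0; try_xpath is a hypothetical helper written for this example, and the sample document is invented.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical sample document, invented for illustration.
root = etree.fromstring(
    b"<books><title>XPath Basics</title><title>Learning XML</title></books>"
)

def try_xpath(expr):
    """Return the query result, or None if the XPath 1.0 engine rejects it."""
    try:
        return root.xpath(expr)
    except etree.XPathError:
        return None

# matches() is an XPath 2.0 function, so a 1.0 engine does not know it.
v2_result = try_xpath("//title[matches(., '^XPath')]")

# An XPath 1.0 equivalent for this simple prefix test.
v1_result = try_xpath("//title[starts-with(., 'XPath')]/text()")

print(v2_result, v1_result)
```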
How do I handle XML namespaces in XPath?
Namespaced elements require a prefix mapping. In most APIs, you register a namespace prefix (e.g., ns='http://example.com') and then query using that prefix: //ns:book/ns:title. The prefix in the query does not have to match the one used in the document; only the namespace URI has to agree. Without a registered namespace, //book will not match elements in a namespace even if the local name is 'book'. As a workaround, the standard local-name() function compares names while ignoring the namespace: //*[local-name()='book'].
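The three behaviours described here can be checked side by side. This is a minimal sketch, assuming lxml is installed; the document is invented and uses a default namespace, so its elements carry no prefix in the source.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical document with a default namespace, invented for illustration.
XML = b"""
<library xmlns="http://example.com">
  <book><title>Namespaced Title</title></book>
</library>
"""

root = etree.fromstring(XML)

# Prefix mapping passed to the query; the prefix name "ns" is our own choice.
NS = {"ns": "http://example.com"}

# Without a prefix, //book looks for elements in no namespace and finds none.
unprefixed = root.xpath("//book")

# With the registered prefix, the same elements match.
titles = root.xpath("//ns:book/ns:title/text()", namespaces=NS)

# local-name() compares only the local part of the name.
by_local_name = root.xpath("//*[local-name()='book']")

print(len(unprefixed), titles, len(by_local_name))
```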