
Answer by Mathias Müller for xpath on xml rss feed does not work as expected

When you view the page source in a browser, you can see that the feed XML declares a default namespace:

<feed xmlns="http://www.w3.org/2005/Atom">

All descendant elements of feed also belong to this namespace, which is why your selectors do not return anything. The one exception is the selector that targets an attribute, as you noticed:

"It seems only attributes such as @href are accessible"

That works because attributes do not take on the default namespace; an unprefixed attribute remains in no namespace.
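To see this difference between element names and attribute names concretely, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than Scrapy, and the feed content is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature Atom feed, used only for illustration.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link href="http://example.com/"/>
</feed>"""

root = ET.fromstring(FEED)
link = root[1]  # second child of <feed>, i.e. the <link> element

# Element names are stored together with the default namespace URI ...
print(root.tag)
print(link.tag)

# ... while the unprefixed attribute name stays in no namespace.
print(link.attrib)

# A path without a namespace therefore matches no elements.
unqualified = root.findall(".//link")
print(unqualified)
```

ElementTree writes namespaced names as {uri}local-name, so the element tags print with the Atom URI in front while the attribute key is plain href.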


If you'd like to access elements that are in a namespace, you have to register that namespace first and choose a prefix for it:

xxs.register_namespace("atom", "http://www.w3.org/2005/Atom")

Then prefix the element names with atom: (or whichever prefix you registered):

xxs.select("//atom:link").extract()
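The same idea of binding a prefix to the namespace URI can be tried without Scrapy. The following is a rough sketch using the standard library's xml.etree.ElementTree, where the prefix mapping is passed as a dictionary instead of being registered on a selector. The feed content and the variable names are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature Atom feed, used only for illustration.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://example.com/"/>
  <link rel="self" href="http://example.com/feed.atom"/>
</feed>"""

# Prefix-to-URI mapping; the prefix is arbitrary, only the URI must match.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(FEED)

# Elements need the prefix; the href attribute is read without one.
links = root.findall(".//atom:link", ATOM_NS)
hrefs = [link.get("href") for link in links]
print(hrefs)

# Any other prefix bound to the same URI selects the same elements.
same_hrefs = [
    link.get("href")
    for link in root.findall(".//a:link", {"a": "http://www.w3.org/2005/Atom"})
]
print(same_hrefs)
```

The prefix used in the expression does not have to match anything in the feed itself, since the feed uses a default namespace with no prefix at all. Only the URI is compared.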

Find more information in the relevant section of the Scrapy documentation.

