All Questions (5 questions)
3 votes · 1 answer · 2k views
Scrapy crawl extracted links
I need to crawl a website and follow every URL on that site that matches a specific XPath.
For example:
I need to crawl "http://someurl.com/world/", which has 10 links in the container (xpath("//div[@class='...
2 votes · 2 answers · 3k views
How to get Python to load the right library (a .dylib, not a .so.3, on OS X)
I'm using the extractor module in Python 2.7, installed via pip install extractor. I'm on OS X using Homebrew, and I have previously run brew install libextractor. This creates files with extensions .a ...
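A sketch of the usual approach to this kind of mismatch (an assumption, since the question is truncated): rather than hardcoding a Linux-style soname such as `libextractor.so.3`, let `ctypes.util.find_library` resolve the platform-appropriate filename, which yields a `.dylib` path on OS X and falls back only when nothing is found.

```python
import ctypes.util

# find_library returns a platform-native name/path ("libextractor.dylib"
# on macOS, "libextractor.so.3" on Linux) or None if not installed.
# The fallback string here is only a placeholder for machines without it.
libname = ctypes.util.find_library("extractor") or "libextractor.dylib"

# lib = ctypes.CDLL(libname)  # uncomment where libextractor is installed
```

On Homebrew setups, the library may live under a prefix the loader does not search by default; `brew --prefix libextractor` shows where the `.dylib` actually landed.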
1 vote · 2 answers · 4k views
Wikipedia extractor problem ValueError: cannot find context for 'fork'
My aim is to get plain text (without links, tags, parameters, and other clutter; only the article text) from Wikipedia XML dumps (https://dumps.wikimedia.org/backup-index.html). I found the WikiExtractor Python ...
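The error `ValueError: cannot find context for 'fork'` comes from Python's `multiprocessing` module: it is raised when code requests the `fork` start method on a platform that does not offer it (notably Windows, where only `spawn` exists). A quick way to check what the current platform supports:

```python
import multiprocessing

# List the start methods this platform provides; 'fork' is absent on
# Windows, which is what triggers the ValueError in tools that request it.
methods = multiprocessing.get_all_start_methods()
```

If `'fork'` is missing from the list, the fix is to run the tool on a platform that supports it (Linux, macOS) or under an environment such as WSL, rather than patching the tool itself.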
0 votes · 0 answers · 14 views
Why can't my regex pick up the phone #s on the web page? [duplicate]
Hey guys, so I am building a phone and email extractor using Python regex, and while it works for emails, it won't work for phone numbers.
The code for finding phone number matches on the ...
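Since the question's own regex is truncated, here is a hedged sketch of a pattern that handles common North American phone-number formats; the separator class `[-.\s]?` is what usually fixes "matches emails but not phones" issues, because real pages mix dashes, dots, spaces, and parentheses.

```python
import re

# Matches e.g. "(415) 555-0123", "415.555.0199", "415 555 0123".
# An assumption: North American 3-3-4 digit grouping with optional
# parentheses around the area code and optional -, ., or space separators.
phone_re = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

text = "Call (415) 555-0123 or 415.555.0199 for details."
matches = phone_re.findall(text)
# matches == ["(415) 555-0123", "415.555.0199"]
```

Note that `findall` returns the whole match only when the pattern has no capturing groups; wrapping alternatives in `(...)` instead of `(?:...)` is another frequent reason phone extractors silently return fragments.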
0 votes · 1 answer · 192 views
How can I resolve "recursion depth exceeded" (Goose-extractor)?
I have a problem with goose-extractor.
This is my code:
for resultado in soup.find_all('a', href=True, text=re.compile(llave)):
    url = resultado['href']
    article = g.extract(url=url)
    ...
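A "maximum recursion depth exceeded" error from goose-extractor usually means its HTML parser recursed through deeply nested markup past Python's default limit (1000). A common workaround, sketched below under assumptions (the `safe_extract` wrapper is hypothetical; `goose` stands in for the question's `g`, a `goose.Goose()` instance):

```python
import sys

# Raise the interpreter's recursion ceiling so deeply nested pages
# can still be parsed. 10000 is an arbitrary but commonly used value.
sys.setrecursionlimit(10000)


def safe_extract(goose, url):
    # Hypothetical wrapper: skip pages that still blow the stack
    # instead of crashing the whole crawl loop.
    try:
        return goose.extract(url=url)
    except RuntimeError:  # RecursionError is a subclass of RuntimeError
        return None
```

Inside the loop from the question, `article = safe_extract(g, url)` then yields `None` for pathological pages rather than aborting; skipping such URLs is often preferable to raising the limit indefinitely, since each increase also raises the risk of a hard stack overflow.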