All Questions

3 votes · 1 answer · 2k views

Scrapy crawl extracted links

I need to crawl a website and follow every URL from that site found on a specific xpath. For example: I need to crawl "http://someurl.com/world/", which has 10 links in the container (xpath("//div[@class='...
asked by Nikola Niko
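The excerpt cuts off mid-xpath, so the container class below is a placeholder. This is only a sketch of how such a spider is usually structured in Scrapy: collect the links inside one container, follow each of them, and parse the target pages in a second callback.

```python
import scrapy


class WorldSpider(scrapy.Spider):
    # Spider name, start URL, and container class are placeholders based on
    # the truncated example in the question.
    name = "world"
    start_urls = ["http://someurl.com/world/"]

    def parse(self, response):
        # Collect every link inside the container div and follow each one.
        for href in response.xpath("//div[@class='container']//a/@href").getall():
            yield response.follow(href, callback=self.parse_item)

    def parse_item(self, response):
        # Extract whatever is needed from each followed page.
        yield {"url": response.url, "title": response.xpath("//title/text()").get()}
```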
2 votes · 2 answers · 3k views

How to get Python to load the right library (a .dylib, not .so.3) on OS X

I'm using the extractor module in python 2.7 via pip install extractor. I'm on OS X using homebrew, and I have previously run homebrew install libextractor. This creates files with extensions .a ...
asked by FrobberOfBits
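The usual symptom here is that the binding looks for a Linux-style shared-object name while Homebrew only installs a .dylib. As a rough diagnostic (not the extractor module's own loading code, and with an assumed Homebrew prefix), ctypes can confirm whether the library itself loads on macOS:

```python
import ctypes
import ctypes.util

# find_library searches the standard macOS locations for libextractor.dylib;
# if it returns None, fall back to an explicit path.  The /usr/local prefix is
# an assumption (the older Homebrew default); adjust it for your install.
path = ctypes.util.find_library("extractor") or "/usr/local/lib/libextractor.dylib"
lib = ctypes.CDLL(path)
print("loaded %s" % path)
```

If this loads fine but the pip-installed module still fails, the problem is in how the binding resolves the library name rather than in the library itself.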
1 vote · 2 answers · 4k views

Wikipedia extractor problem: ValueError: cannot find context for 'fork'

My aim is to get plain text (without links, tags, parameters and other trash, only the article text) from Wikipedia XML dumps (https://dumps.wikimedia.org/backup-index.html). I found the WikiExtractor Python ...
asked by Shurup
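The traceback text matches what multiprocessing raises when the 'fork' start method is requested on a platform that does not provide it, typically Windows. A small check, independent of WikiExtractor itself, makes the cause visible; the 'spawn' fallback is only a sketch of one workaround and assumes the calling script can use it.

```python
import multiprocessing

# 'fork' exists only on Unix-like systems; on Windows,
# multiprocessing.get_context('fork') raises
# "ValueError: cannot find context for 'fork'".
print(multiprocessing.get_all_start_methods())

try:
    ctx = multiprocessing.get_context("fork")
    print("fork is available")
except ValueError:
    # Typical on Windows: run the extractor under WSL, a Linux VM, or Docker,
    # or patch the script to use the 'spawn' start method instead.
    ctx = multiprocessing.get_context("spawn")
    print("fork is not available; falling back to spawn")
```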
0 votes · 0 answers · 14 views

Why can't my regex pick up the phone #s on the web page? [duplicate]

I am building a phone and email extractor using Python regex, and while it works for the emails, it won't work for the phone numbers. The code for finding phone number matches on the ...
asked by Marcelino Velasquez
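The question's code is cut off, so this is only a standalone sketch of a phone pattern that does match common US-style numbers; the sample text and number format are assumptions, not the asker's data.

```python
import re

# Made-up sample text for illustration.
text = "Call 415-555-1234 or (415) 555-9999 ext 42, or email foo@example.com"

# US-style phone pattern: optional area code, separator, three digits,
# separator, four digits, optional extension.  re.VERBOSE lets the pattern
# be spread over several commented lines.
phone_regex = re.compile(r"""(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})? # extension
)""", re.VERBOSE)

# findall returns one tuple per match because the pattern contains groups;
# the first element of each tuple is the whole matched number.
for groups in phone_regex.findall(text):
    print(groups[0])
```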
0 votes · 1 answer · 192 views

How can I resolve "recursion depth exceeded" (Goose-extractor)?

I have a problem with goose-extractor. This is my code: for resultado in soup.find_all('a', href=True, text=re.compile(llave)): url = resultado['href'] article = g.extract(url=url) ...
asked by papabomay
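Raising Python's recursion limit and skipping pages that still overflow is a common workaround for this error, though not necessarily the asker's eventual fix. The sketch below rebuilds the truncated loop around that idea; the target URL and the llave keyword are placeholders, and on Python 2 the overflow surfaces as a RuntimeError.

```python
import re
import sys

import requests
from bs4 import BeautifulSoup
from goose import Goose

# Placeholders: the question does not show how soup or llave were built.
llave = "noticias"
soup = BeautifulSoup(requests.get("http://example.com/noticias/").text, "html.parser")

# A higher limit helps with deeply nested pages; the try/except keeps one
# pathological page from aborting the whole crawl.
sys.setrecursionlimit(10000)
g = Goose()

for resultado in soup.find_all('a', href=True, text=re.compile(llave)):
    url = resultado['href']
    try:
        article = g.extract(url=url)
        print("%s -> %s" % (url, article.title))
    except RuntimeError as exc:  # "maximum recursion depth exceeded" on Python 2
        print("skipped %s: %s" % (url, exc))
```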