python - How to write a DownloadHandler for scrapy that makes requests through socksipy? -


I'm trying to use scrapy over Tor. I've been trying to get my head around how to write a DownloadHandler for scrapy that uses socksipy connections.

Scrapy's HTTP11DownloadHandler is here:

Here's an example of creating a custom download handler:

Here's the code for creating a SocksiPyConnection class:

    import httplib
    import socks  # SocksiPy

    class SocksiPyConnection(httplib.HTTPConnection):
        def __init__(self, proxytype, proxyaddr, proxyport=None, rdns=True,
                     username=None, password=None, *args, **kwargs):
            # Remember the proxy settings; the rest goes to HTTPConnection.
            self.proxyargs = (proxytype, proxyaddr, proxyport, rdns, username, password)
            httplib.HTTPConnection.__init__(self, *args, **kwargs)

        def connect(self):
            # Open the connection through a SOCKS socket instead of a plain one.
            self.sock = socks.socksocket()
            self.sock.setproxy(*self.proxyargs)
            if isinstance(self.timeout, float):
                self.sock.settimeout(self.timeout)
            self.sock.connect((self.host, self.port))
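For readers on Python 3, where httplib was renamed http.client, the same class could be sketched roughly as below. This is only an illustration and not part of the original question. It assumes the PySocks fork of SocksiPy is installed for the `socks` import, that `2` is the value of its `PROXY_TYPE_SOCKS5` constant, and that a Tor SOCKS listener sits on 127.0.0.1:9050. The connection is blocking, which is exactly why it does not drop straight into Scrapy's Twisted-based downloader.

```python
import http.client

# Assumption: 2 is the value of socks.PROXY_TYPE_SOCKS5 in PySocks;
# prefer the library constant in real code.
SOCKS5 = 2


class SocksiPyConnection(http.client.HTTPConnection):
    """HTTPConnection that opens its socket through a SOCKS proxy."""

    def __init__(self, proxytype, proxyaddr, proxyport=None, rdns=True,
                 username=None, password=None, *args, **kwargs):
        # Keep the proxy settings; everything else goes to HTTPConnection.
        self.proxyargs = (proxytype, proxyaddr, proxyport, rdns, username, password)
        super().__init__(*args, **kwargs)

    def connect(self):
        # Imported lazily so the class can be defined without PySocks present.
        import socks
        self.sock = socks.socksocket()
        self.sock.setproxy(*self.proxyargs)
        if isinstance(self.timeout, float):
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))


# Constructing the object does not touch the network.
conn = SocksiPyConnection(SOCKS5, "127.0.0.1", 9050, True, None, None,
                          "example.com", 80)
# conn.request("GET", "/")   # would go through the SOCKS proxy (blocking)
```

Nothing is sent until `request()` is called, so the proxy arguments can be inspected before any connection is made.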

With the complexity of the twisted reactors in the scrapy code, I can't figure out how to plug socksipy into it. Any thoughts?

Please do not answer with privoxy-like alternatives or post answers saying "scrapy doesn't work with socks proxies". I know that, which is why I'm trying to write a custom Downloader that makes requests using socksipy.

I was able to get this working.

After running pip install txsocksx, I needed to replace scrapy's ScrapyAgent with txsocksx.http.SOCKS5Agent.

I simply copied the code for HTTP11DownloadHandler and ScrapyAgent from scrapy/core/downloader/handlers/http.py, subclassed them, and wrote this code:

    from twisted.internet import reactor
    from twisted.internet.endpoints import TCP4ClientEndpoint
    from txsocksx.http import SOCKS5Agent
    # HTTP11DownloadHandler, ScrapyAgent and _parse come from Scrapy's own
    # download handler code; the import path depends on the Scrapy version.

    class TorProxyDownloadHandler(HTTP11DownloadHandler):

        def download_request(self, request, spider):
            """Return a deferred for the HTTP download"""
            agent = ScrapyTorAgent(contextFactory=self._contextFactory, pool=self._pool)
            return agent.download_request(request)


    class ScrapyTorAgent(ScrapyAgent):

        def _get_agent(self, request, timeout):
            bindaddress = request.meta.get('bindaddress') or self._bindAddress
            proxy = request.meta.get('proxy')
            if proxy:
                _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
                scheme = _parse(request.url)[0]
                omitConnectTunnel = proxyParams.find('noconnect') >= 0
                if scheme == 'https' and not omitConnectTunnel:
                    proxyConf = (proxyHost, proxyPort,
                                 request.headers.get('Proxy-Authorization', None))
                    return self._TunnelingAgent(reactor, proxyConf,
                        contextFactory=self._contextFactory, connectTimeout=timeout,
                        bindAddress=bindaddress, pool=self._pool)
                else:
                    _, _, host, port, proxyParams = _parse(request.url)
                    # Route the request through the SOCKS5 proxy.
                    proxyEndpoint = TCP4ClientEndpoint(reactor, proxyHost, proxyPort,
                        timeout=timeout, bindAddress=bindaddress)
                    agent = SOCKS5Agent(reactor, proxyEndpoint=proxyEndpoint)
                    return agent

            return self._Agent(reactor, contextFactory=self._contextFactory,
                connectTimeout=timeout, bindAddress=bindaddress, pool=self._pool)

In settings.py, something like this is needed:

    DOWNLOAD_HANDLERS = {
        'http': 'crawler.http.TorProxyDownloadHandler'
    }
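The branching inside `_get_agent` can be easier to follow when it is pulled out of Twisted. The sketch below is a standalone approximation, not Scrapy code. `choose_agent_kind` is a hypothetical helper, `urllib.parse.urlsplit` stands in for Scrapy's private `_parse`, and the proxy URL `http://127.0.0.1:9050` is an assumed local Tor SOCKS listener. It only reports which of the three agents a request would get.

```python
from urllib.parse import urlsplit


def choose_agent_kind(url, proxy=None):
    """Return (kind, proxy_endpoint) mirroring the branches of _get_agent.

    'direct' -> plain self._Agent, no proxy in request.meta
    'tunnel' -> self._TunnelingAgent (HTTP CONNECT) for https requests
    'socks5' -> SOCKS5Agent connected to the proxy endpoint
    """
    if not proxy:
        return "direct", None

    parts = urlsplit(proxy)
    endpoint = (parts.hostname, parts.port)

    # Approximates proxyParams.find('noconnect') >= 0: look for the flag
    # in the path/query portion of the proxy URL.
    omit_connect_tunnel = "noconnect" in (parts.path + parts.query)

    if urlsplit(url).scheme == "https" and not omit_connect_tunnel:
        return "tunnel", endpoint
    return "socks5", endpoint


proxy_url = "http://127.0.0.1:9050"  # assumed Tor SOCKS port on localhost
print(choose_agent_kind("http://example.com/"))
print(choose_agent_kind("http://example.com/", proxy_url))
print(choose_agent_kind("https://example.com/", proxy_url))
print(choose_agent_kind("https://example.com/", proxy_url + "?noconnect"))
```

Read this way, the SOCKS5 agent is used for plain http requests and for https requests whose proxy URL carries `noconnect`. Other https requests still take the CONNECT tunnelling branch. The settings shown register the handler for the 'http' scheme only.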

Proxying with scrapy now works through a SOCKS proxy such as Tor.

