python - scrapy selector string won't accept international characters -

May 15, 2011

I'm trying to get a Scrapy spider to crawl a website, but the elements I need are labelled in Spanish and contain a letter with an acute accent (í).

  titulo = title.select(u'.//["Título original:"]/text()').extract()

I have found similar questions here, but the accepted answers have not worked for me.

Adding u at the beginning of the string takes care of some problems, but gives me this error:

  UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 21: ordinal not in range(128)

Other answers suggest calling .decode('utf-8') on the string, but doing this, or using .encode('utf-8'), gives me this error:

  exceptions.ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
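Both errors can be illustrated without Scrapy. The sketch below is an illustration only, written so that it also runs on Python 3; the variable names are made up, and the element name is left out of the query just as in the snippet above. It shows that the query cannot be converted with the ASCII codec, which is what an implicit conversion attempts, and that .encode('utf-8') yields a byte string containing non-ASCII bytes, which is presumably what the XML layer then refuses.

```python
# Illustration only: reproduces the encoding behaviour, not the Scrapy call itself.
query = u'.//["T\u00edtulo original:"]/text()'

# An implicit conversion to a byte string uses the ASCII codec,
# which has no representation for the accented i (U+00ED).
try:
    query.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly as UTF-8 succeeds, but the result is bytes rather than
# text: the single character U+00ED becomes the two bytes 0xC3 0xAD.
utf8_query = query.encode('utf-8')

print(repr(ascii_error))
print(repr(utf8_query))
```

The position reported in the exception depends on where the accented character sits in the string, so it will differ from the 21 quoted above.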

Am I missing something, or is there another way to do this? Or would I be better off using a regex that matches every other part of my string except that letter?

Anyway, here is the code I have so far:

  def parse(self, response):
      # Change the response to an HtmlResponse to allow for a UTF-8 encoded body.
      response = HtmlResponse(url=response.url, status=response.status,
                              headers=response.headers, body=response.body)
      print '\n\nResponse encoding', response.encoding  ## Page is encoded in UTF-8
      hxs = HtmlXPathSelector(response)
      titles = hxs.select('//div[@class="datosespectaculo"]')
      items = []
      for title in titles:
          item = CarteleraItem()
          titulo = title.select(u'.//["Título original:"]/text()'.encode('utf-8')).extract()
          ano = title ...
 
 
 
 
 
 
 
 
 
 
 
  <div id="contgeneral">
    <div class="contyrasca">
      <div id="contfix">
        <div class="contespectaculo">
          <div class="callyzack">
            <div itemscope itemtype="http://schema.org/Movie">
              <h1 class="titulo" itemprop="name">15.361</h1>
              <img class="fef" src="http://www.cartelera.com.uy/imagenes_espectaculos/musicdetail13/14770.jpg" />
              <div class="datosespectaculo">
                <strong>Título original:</strong> <em>15.361</em><br />
                <strong>Año:</strong> <span itemprop="copyrightYear">2014</span><br />
                <strong>Género:</strong> <span itemprop="genre">Comedy / Drama</span><br />
                <strong>Duración:</strong> <span itemprop="duration">60&nbsp;</span><br />
                <strong>Calificación:</strong> +18 años<br />

If the # -*- coding: utf-8 -*- declaration is not working, you can use a Unicode string in which the non-ASCII characters are written with the \u escape sequence.
Then your XPath selector becomes:

  titulo = title.select(u'.//["T\u00edtulo original:"]/text()'.encode('utf-8')).extract()
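As a rough check that an escaped Unicode query matches the accented label, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of Scrapy. ElementTree supports only a subset of XPath, so this is not the same engine. The sample markup is a simplified, hypothetical fragment modelled on the page above. The strong element name and the [.='...'] predicate (Python 3.7+) are assumptions, since the element name is missing from the selector in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment modelled on the page's markup.
SAMPLE = (
    u'<div class="datosespectaculo">'
    u'<strong>T\u00edtulo original:</strong> <em>15.361</em><br />'
    u'<strong>A\u00f1o:</strong> <span itemprop="copyrightYear">2014</span><br />'
    u'</div>'
)

# Parse from UTF-8 bytes, the default encoding for XML without a declaration.
root = ET.fromstring(SAMPLE.encode('utf-8'))

# The query is a Unicode string; the accented i is written as \u00ed.
# Assumption: the label lives in a <strong> element.
XPATH = u".//strong[.='T\u00edtulo original:']"
label = root.find(XPATH)

# ElementTree has no following-sibling axis, so take the next child by position.
children = list(root)
value_element = children[children.index(label) + 1]
titulo = value_element.text

print(titulo)
```

The point is only that the query stays a Unicode string from start to finish; whether an extra .encode('utf-8') is needed depends on the selector backend in use.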
I usually use a simple Python shell session to check the escape sequence:

  paul@wheezy:~$ python
  Python 2.7.3 (default, Jan  2 2013, 13:56:14)
  [GCC 4.7.2] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> u'.//["Título original:"]/text()'
  u'.//["T\xedtulo original:"]/text()'
  >>> u'.//["T\u00edtulo original:"]/text()'
  u'.//["T\xedtulo original:"]/text()'
  >>>
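The same check can be scripted. This small sketch (variable names are illustrative) confirms that the \u00ed and \xed spellings denote the same character, looks up that character's Unicode name, and rebuilds the four-digit escape from its code point.

```python
import unicodedata

escaped_u = u'T\u00edtulo original:'  # four-digit \u escape
escaped_x = u'T\xedtulo original:'    # two-digit \x escape, as echoed by the Python 2 shell

# Both literals produce exactly the same text.
same_text = (escaped_u == escaped_x)

# Look up the official name of the second character to confirm it is the accented i.
char_name = unicodedata.name(escaped_u[1])

# Build the \uXXXX spelling for any character from its code point.
escape = u'\\u%04x' % ord(escaped_u[1])

print('%s | %s | %s' % (same_text, char_name, escape))
```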



















