java - Why do Hadoop jobs need so many threads?


My understanding of Hadoop is that parallelism on each worker node is achieved by starting separate JVMs, one per core.

I see that each JVM has dozens of threads, which leads to thousands of threads per node. I can't think of any reason to spawn that many threads. What's going on?

For example, here's a simple Pig script that parses and filters some JSON:

  /* get tweets with GPS */
  REGISTER $JAR;
  json_eb = LOAD '$IN_DIRS' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json: map[]);
  -- parse the json with twitter's library
  parsed0 = FOREACH json_eb GENERATE
      STRSPLIT(json#'id', ':').$2 AS tweetId:chararray,
      STRSPLIT(json#'actor'#'id', ':').$2 AS userId:chararray,
      json#'postedTime' AS postedTime:chararray,
      json#'geo'#'coordinates' AS gps:chararray;
  parsed1 = FILTER parsed0 BY (gps IS NOT NULL);
  STORE parsed1 INTO '$OUT_DIR' USING PigStorage();

When I run this script, mapred starts 33 Java processes on my node (I have 32 cores):

  rfcompton@node19 ~> ps -u mapred | grep -v PID | wc -l
  33

In top, they look like this:

    PID USER     PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
    484 mapred   39  16 1576m 362m  18m S 130.8  0.3   0:09.48 java
  32427 mapred   34  16 1664m 369m  18m S 122.2  0.3   0:08.67 java
  32694 mapred   36  16 1502m 239m  18m S 115.6  0.2   0:07.94 java
  32218 mapred   33  16 1641m 401m  18m S 114.6  0.3   0:10.29 java
  ...

The JVMs have approximately 40 threads each:

  rfcompton@node19 ~> cat /proc/484/status | grep Threads
  Threads: 43

All together, that's over a thousand mapred threads on the 32-core node:

  rfcompton@node19 ~> ps -u mapred | grep -v PID | awk '{system("cat /proc/" $1 "/status")}' | grep Threads | awk '{sum += $2} END {print sum}'
  1655
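If you want to see what those threads actually are, `jstack <pid>` works from outside the JVM; alternatively, a small diagnostic like this sketch (not from the original post) can be dropped into a task to list the live threads in its JVM:

  import java.util.Map;

  // Diagnostic sketch: dump the name, daemon flag, and state of every
  // live thread in the current JVM. Dropped into a task, it shows what
  // the ~40 threads per child JVM are doing.
  public class ThreadDump {
      public static void main(String[] args) {
          Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
          System.err.println("live threads: " + all.size());
          for (Thread t : all.keySet()) {
              System.err.printf("%-45s daemon=%-5s state=%s%n",
                      t.getName(), t.isDaemon(), t.getState());
          }
      }
  }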

EDIT: From Paul's answer, and after reading the relevant section of "Hadoop: The Definitive Guide", it seems that ~40 threads per JVM is what I should expect. They are there to serve map output to the reducers over HTTP during the shuffle:

"The output file's partitions are made available to the reducers over HTTP. The number of worker threads used to serve the file partitions is controlled by the tasktracker.http.threads property; this setting is per tasktracker, not per map task slot. The default of 40 may need to be increased for large clusters running large jobs."
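Since the setting is per tasktracker, raising it happens in the tasktracker's mapred-site.xml rather than in the job. A quick way to check the effective value, using the stock Configuration API (my sketch; the config path is an assumption about your install):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;

  // Sketch: print the number of threads a tasktracker will use to serve
  // map output over HTTP. The path below is an assumed install location.
  public class HttpThreadsCheck {
      public static void main(String[] args) {
          Configuration conf = new Configuration();
          conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
          int threads = conf.getInt("tasktracker.http.threads", 40); // 40 = documented default
          System.out.println("tasktracker.http.threads = " + threads);
      }
  }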

Answer:

All the concurrent implementations I've seen make liberal use of threads. In effect, the threads overlap the housekeeping done in the management processes with the map and reduce tasks themselves.
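To make that concrete, here's a toy version of the pattern (my sketch, not Hadoop code): a small "copier" pool fetches inputs in the background while one thread consumes them, so network waits overlap with CPU work.

  import java.util.concurrent.*;

  // Toy illustration of the overlap: five background "copier" threads
  // fetch partitions while the main thread merges whatever has arrived.
  public class OverlapSketch {
      public static void main(String[] args) throws Exception {
          ExecutorService copiers = Executors.newFixedThreadPool(5);
          BlockingQueue<String> fetched = new LinkedBlockingQueue<>();

          for (int i = 0; i < 20; i++) {
              final int id = i;
              // stand-in for an HTTP fetch of one map output partition
              copiers.submit(() -> fetched.add("map-output-" + id));
          }
          for (int i = 0; i < 20; i++) {
              // stand-in for merge/reduce work; runs while fetches are in flight
              System.out.println("merging " + fetched.take());
          }
          copiers.shutdown();
      }
  }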

Looking through "Hadoop: The Definitive Guide", the authors mention several processes that are multithreaded. These include:

  1. The reducer has a small pool of "copier" threads to fetch map outputs in parallel.
  2. Mappers can themselves be multithreaded (MultithreadedMapper); see the sketch after this list.
  3. Datanodes have threads for copying data into and out of HDFS.
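For item 2, here's a minimal sketch of wiring up the new-API MultithreadedMapper class (the job name, thread count, and the myMapper parameter are illustrative, not from the original post):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

  // Sketch: run a (thread-safe) mapper on 10 threads inside one task JVM,
  // instead of one map thread per JVM.
  public class MultithreadedJobSetup {
      public static Job configure(Configuration conf, Class<? extends Mapper> myMapper)
              throws Exception {
          Job job = new Job(conf, "multithreaded example"); // old-API-era constructor
          job.setMapperClass(MultithreadedMapper.class);       // the wrapper runs as "the" mapper
          MultithreadedMapper.setMapperClass(job, myMapper);   // the actual map logic
          MultithreadedMapper.setNumberOfThreads(job, 10);     // threads per map task
          return job;
      }
  }

Note that the wrapped map logic has to be thread-safe for this to be correct.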

Depending on how your cluster is configured, you can have DataNodes and TaskTrackers on the same machines, hence that many threads.

I suspect there are significant performance benefits to the heavy use of concurrency, and that is why the implementers went that route.

