When we implement an algorithm, we often face a decision. To match the fork-join flow shape of the algorithm, we can use either a single long thread or many short parallel threads that run the parallelizable computation on the available computing resources, with varying amounts of inter-thread communication.
Obviously this can be modelled as an optimization problem. Depending on the nature of our algorithm (or application) and the problem size, we end up with different code for execution, chosen to minimize the execution time while possibly satisfying certain constraints.
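As a rough illustration of this optimization, here is a minimal Python sketch built on a made-up cost model (an assumption, not something from the original). The work is split evenly across p threads, compute time stops improving once threads outnumber cores, and communication cost grows linearly with the number of threads. The function names and the linear communication term are hypothetical; a real model would be fitted to measurements of the actual algorithm.

```python
def exec_time(p, work, comm_cost, cores):
    """Hypothetical cost model for running `work` units on p threads."""
    # Threads beyond the core count time-share cores, so compute
    # time stops improving once p exceeds the number of cores.
    compute = work / min(p, cores)
    # Assumed: each extra thread adds a fixed amount of communication.
    communication = comm_cost * (p - 1)
    return compute + communication


def best_thread_count(work, comm_cost, cores, max_threads=1024):
    """Exhaustively search p = 1..max_threads for the lowest modelled time."""
    # min() returns the first p that reaches the minimum, i.e. the
    # smallest thread count among equally fast options.
    return min(range(1, max_threads + 1),
               key=lambda p: exec_time(p, work, comm_cost, cores))


if __name__ == "__main__":
    for comm in (0, 10, 1000):
        p = best_thread_count(work=1000, comm_cost=comm, cores=32)
        print(f"comm_cost={comm}: best p={p}")
```

With no communication cost the model picks one thread per core (p=32). As communication gets more expensive the best choice shrinks (p=10 at comm_cost=10) until a single long thread wins (p=1 at comm_cost=1000), which is the trade-off described above.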
It all sounds as if we have almost got the solution, but if we think about it carefully, we still face a big challenge: memory bandwidth. Although most of our computing resources (CPUs, GPUs, accelerators) have their own memory systems for caching data, we still have to deliver data and instructions to those devices in time.
For example, suppose we have 128 threads running on 32 cores. When the threads switch, they are likely to cause cache misses that require main memory accesses. If one core does this, it should be fine, but if all 32 cores access the same memory at once, we get congestion on the memory network, which reduces parallel performance.
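To make the congestion argument concrete, here is a toy Python model. It assumes (hypothetically) that main memory serves one cache-miss request at a time, each taking a fixed `service_time`, and that every core computes for a while and then misses once, all at the same moment. Real memory systems have several channels and banks, so the numbers only show the trend, not real hardware figures.

```python
def average_miss_latency(simultaneous_misses, service_time):
    """Mean completion time when misses are queued and served one by one."""
    # The i-th request in the queue (0-based) finishes at (i + 1) * service_time.
    total = sum((i + 1) * service_time for i in range(simultaneous_misses))
    return total / simultaneous_misses


def parallel_speedup(cores, compute_time, service_time):
    """Speedup of `cores` tasks run in parallel versus one after another.

    Each task computes for `compute_time`, then takes one cache miss.
    """
    # Serially, no miss ever waits behind another one.
    serial = cores * (compute_time + service_time)
    # In parallel, all misses reach memory together; the last one is served
    # after cores * service_time, which bounds the whole run.
    parallel = compute_time + cores * service_time
    return serial / parallel


if __name__ == "__main__":
    print(average_miss_latency(1, 100))    # one core missing alone
    print(average_miss_latency(32, 100))   # all 32 cores missing together
    print(round(parallel_speedup(32, 1000, 100), 2))
```

In this model a lone miss takes 100 time units, but when all 32 cores miss together the average miss takes 1650, and 32 cores deliver only about an 8.4x speedup instead of 32x.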
If we think about how the neurons in our brain communicate, we find a very different architecture. First, we have a dynamic physical communication network, and the dynamic connections evolve through some degree of competition and cooperation; one example is that the ion channels at the synapses change depending on how they are used.
But the real difference is possibly how memory is structured in our brain. A very good example is performing a calculation on an abacus versus in our head. Surely we can do it much faster on the abacus than in our head, unless we know some quick algorithm for large problems. The real point, though, is that we don't have much memory (possibly RAM-like memory) for tedious calculation, whereas our brain is much more capable of analysing visual information and generating muscle control signals. At the same time, very deep in our brain, there seems to be a lookup table for conditional branches, and I guess that may just be a configuration of our neuron connections.
So where is the memory, you may ask? Well, I think most of our memory is just a neural network pattern, which is a ROM-like thing (read-only, and slow to write), but the difference is that its information is read by using it, which is more like a net of FPGA LUTs.
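For readers less familiar with FPGAs, a k-input LUT is just 2^k stored bits. Configuring it is the slow "write", and evaluating it means using the inputs as an address into that memory, so the stored information is only ever read by computing with it. The minimal Python sketch below illustrates this; the `LUT` class and the half-adder wiring are illustrative only and do not model any specific FPGA family.

```python
class LUT:
    """A k-input lookup table: the stored truth table *is* the function."""

    def __init__(self, truth_table):
        n = len(truth_table)
        # A k-input LUT needs exactly 2**k entries.
        if n == 0 or n & (n - 1):
            raise ValueError("truth table length must be a power of two")
        self.k = n.bit_length() - 1
        self.table = tuple(truth_table)   # the "configuration" is the memory

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        # Reading the memory is the same act as computing: the inputs
        # form the address of the stored output bit.
        address = 0
        for bit in inputs:
            address = (address << 1) | (1 if bit else 0)
        return self.table[address]


# A tiny "LUT net": two 2-input LUTs wired as a half adder.
XOR = LUT([0, 1, 1, 0])
AND = LUT([0, 0, 0, 1])


def half_adder(a, b):
    return XOR(a, b), AND(a, b)   # (sum, carry)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, half_adder(a, b))
```

Changing what the net computes means rewriting the tables, which plays the role of the ROM-like "slow write" in the analogy.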
So from an architectural point of view, the neural network in our brain would not be very good at executing repetitive simple instructions, since we don't have the right memory structure (RAM) for them. Yet we seem to do vision tasks much better than a computer, which has very few computation units and a very large amount of RAM. What could be the issue here? Again, the real answer should be that a computer can do certain specific vision tasks better than the human brain, but in the general case (large data sets) the human brain will outperform the computer. One explanation could be that the algorithms in our brain have been optimised by long-term evolution, whereas the computer just executes what we think might be happening in our brain, in terms of numerical calculation.
But how does this relate to the memory architecture problem? We can see the trend of adding more computing resources to a single chip, but should we move towards a brain-like structure that dynamically routes the connections between millions of different resources? That might work if we gave up the digital format for computation and lost the robust, machine-like properties, but do we really want to do that? I guess that would just defeat the purpose of building machines: we want a high degree of certainty about what happens at each step. This behaviour is complementary to ours as humans, and that's why we need machines to be like that.
So if we have decided to go the machine way, what is the problem we need to solve? The network? The memory hierarchy? Or load balancing and scheduling on the computing resources? I think all these issues can be solved by a good communication protocol. With a good protocol, we can reduce global communication and help reduce main memory traffic; we can also make good use of the memory hierarchy and solve the resource-sharing problem automatically. This is more or less how humans communicate with each other: we have a good protocol that allows us to communicate in small groups, in a large lecture theatre, and in one-to-one talks.
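One classic way a protocol can keep communication local is hierarchical (tree) aggregation, instead of having every node talk to one central point. The Python sketch below, with a hypothetical `fan_in` parameter, counts the total messages, the peak number of messages any single node receives in one round, and the number of rounds when n nodes combine their results towards one root. It is only an illustration of local versus global communication, not the protocol the paragraph has in mind.

```python
import math


def tree_reduce_stats(n, fan_in):
    """Return (total messages, peak inbound messages per node per round, rounds)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    messages = peak_inbound = rounds = 0
    active = n
    while active > 1:
        # Group the active nodes; one leader per group receives from the rest.
        groups = math.ceil(active / fan_in)
        messages += active - groups
        # The busiest leader receives from a full group, or from all
        # remaining nodes if fewer than fan_in are left.
        peak_inbound = max(peak_inbound, min(fan_in, active) - 1)
        active = groups
        rounds += 1
    return messages, peak_inbound, rounds


if __name__ == "__main__":
    print("flat:", tree_reduce_stats(32, fan_in=32))
    print("tree:", tree_reduce_stats(32, fan_in=4))
```

Both variants send 31 messages in total, but with a fan-in of 4 no node receives more than 3 messages in a round, instead of one node absorbing all 31 at once; the price is three rounds instead of one. Choosing that trade-off for the situation at hand is the kind of decision a good protocol would make.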
So what is the secret of this protocol, then? Although I am not fully confident in my answer, I think it has a strong link with model predictive control, or MPC for short. In order to optimize our behaviour, we must know as much as possible about the parties we communicate with and build a model of them; then a dynamic optimization process runs, and we find the current best action towards the goal of better resource utilization. Obviously this is not a simple scenario when many nodes in the network are doing MPC, but with more in-depth research, I hope we can arrive at a more robust framework for this future intelligent communication protocol.
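As a minimal sketch of the MPC idea, under assumptions that are not the author's design: a single node models a shared link as a queue, assumes the other nodes' traffic `other_load` stays constant over the horizon (this is its "model of the communication objects"), searches every plan of send rates over a short horizon, and applies only the first action before re-planning. The cost trades queue build-up (congestion) against sending less than the node would like. All names, weights and the link model are hypothetical, and the harder many-node case mentioned above is deliberately left out.

```python
import itertools


def predict_queue(queue, rates, other_load, capacity):
    """Hypothetical link model: traffic beyond capacity piles up in a queue."""
    trajectory = []
    for rate in rates:
        queue = max(0.0, queue + rate + other_load - capacity)
        trajectory.append(queue)
    return trajectory


def mpc_step(queue, other_load, capacity, desired_rate,
             actions=(0, 1, 2, 3, 4), horizon=3, queue_weight=1.0):
    """Pick this step's send rate by searching every plan over the horizon."""
    best_plan, best_cost = None, float("inf")
    for plan in itertools.product(actions, repeat=horizon):
        queues = predict_queue(queue, plan, other_load, capacity)
        # Penalise congestion (queue length) and under-sending.
        cost = sum(queue_weight * q * q + (desired_rate - r) ** 2
                   for q, r in zip(queues, plan))
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    # Receding horizon: commit only to the first action, re-plan next step.
    return best_plan[0]


if __name__ == "__main__":
    # Light background traffic: the node can send at its desired rate.
    print(mpc_step(queue=0, other_load=2, capacity=5, desired_rate=3))
    # Heavy background traffic: the node backs off to avoid congestion.
    print(mpc_step(queue=0, other_load=4, capacity=5, desired_rate=3))
```

In a closed loop the node would call `mpc_step` every period with a fresh queue measurement and an updated estimate of `other_load`. Brute-force search is used only because the action set and horizon are tiny; a practical controller would use a proper optimizer.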