Simple question about (Xeon Phi) card extension

aazue · ‎10-19-2012

Hi
I have just one small, concrete question.
When this card is installed in the motherboard and its module is loaded and ready,
could you show what is printed on screen by the command
cat /proc/cpuinfo

In the exchange "Processor or coprocessor?" I read this
answer from James Reinders (Intel):

(The difference between a processor and coprocessor is this: a processor does not require
another compute device to be present in a working system.
A coprocessor requires a processor be in the system in addition to the coprocessor.)

I agree, but from this I also understand that a tree similar to that of the processor exists:
physical processor on the socket
.. physical cores below it
.. logical cores below those

Sorry to submit this question; the procedure given in your documentation is very good, but
it does not correspond to how we want to use the card.
More precisely, I plan to use a bridging process (TUN driver, bridging driver) on a network card that
has multiple pseudo (dummy) MAC addresses, and to dynamically drive the heterogeneous loads linked to each
pseudo MAC address. This only requires that a precise core be linked to a specific IP address used as an index.
It is a very simple, compact process that I already use (internally or externally) without this
Phi card added to the machine.

Regards

TimP · ‎10-19-2012

/proc/cpuinfo tells you nothing about whether an Intel(r) Xeon Phi(tm) co-processor is present or active. lspci tells you whether it is basically powered on. Only the additional utilities in the mpss installation give useful information.

aazue · ‎10-19-2012

Hi
lspci always answers when the hardware is present in a machine, whether or not it is ready to work, so you teach me nothing. Adding what lsmod reports would perhaps be more constructive. Does this card have a data sheet describing its implementation protocol? I do not work with imposed software over which I do not have complete control. Also, more basically, you should learn to respect the conventional forms of politeness in your exchanges.
Regards

robert-reed · ‎11-10-2012

You can run an lsmod (I just did to check) but there's not much more it will tell you: I get, among all the unrelated modules, one line describing the "mic" module and its size. Running lspci at least labels the coprocessor device as such, and when the mpss service is started there are protocols started that enable ssh and you can actually run the card and see /proc/cpuinfo there. I won't try to copy the result of that command to this post--it's very large and may contain some details that are embargoed until the imminent public announcement, but it all works and looks just like any other Linux machine but with a whole lotta processors. Also, I don't understand what offense you took from Tim's reply, which seems to me to be rather mild.
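
The socket / physical core / logical core tree asked about at the top of the thread can be recovered from any Linux /proc/cpuinfo listing. Here is a minimal sketch, assuming the usual x86 field names (processor, physical id, core id); the sample text is hand-written for illustration and is not output from a coprocessor, and the function name summarize_cpuinfo is hypothetical.

```python
import re

# Hand-written sample in /proc/cpuinfo style (illustrative, not real output).
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0

processor   : 2
physical id : 0
core id     : 1

processor   : 3
physical id : 0
core id     : 1
"""


def summarize_cpuinfo(text):
    """Count sockets, physical cores and logical processors in cpuinfo-style text."""
    sockets = set()   # distinct "physical id" values
    cores = set()     # distinct (physical id, core id) pairs
    logical = 0       # one "processor" entry per logical processor

    # Entries are separated by blank lines; each line is "key : value".
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Assumption: if the kernel omits the topology fields, treat the
        # machine as one socket with one core per logical processor.
        socket = fields.get("physical id", "0")
        sockets.add(socket)
        cores.add((socket, fields.get("core id", fields["processor"])))

    return {
        "sockets": len(sockets),
        "physical_cores": len(cores),
        "logical_processors": logical,
    }


if __name__ == "__main__":
    print(summarize_cpuinfo(SAMPLE))
```

On a live Linux system the same function could be fed the real file with summarize_cpuinfo(open("/proc/cpuinfo").read()). Run on the host, this describes only the host processors, consistent with Tim's remark above.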

aazue · ‎11-11-2012

Hi
About Tim: I agree that my answer showed some aggression, but it was nothing against Tim personally; it is the general attitude of your group toward the public developer community that annoys me. It is like a restaurant that gives you the menu to choose from but offers no real service to eat, because all the food is reserved for its cooks.
About my question: I no longer need this information. I have chosen another solution that could be more modern and more promising for the future. I can't risk wasting my time with your approach to development management (on the programming side), which appears to me doomed to produce, once more, a lead parachute.
Regards

robert-reed · ‎11-21-2012

I certainly sense your frustration but I'm having some trouble understanding the nature of that frustration or the particular development management issues that you find a waste of your time. We think we have some pretty good solutions here but if you think some other approach is better, it is certainly your choice to take that other path. But the language that separates us also frustrates us as we try to grasp your meaning.

aazue · ‎11-23-2012

Hi
About the frustration you suspect: do not worry, it does not exist. For me your card remains just one object among many, without real importance.
About the sense of my words, which you seem not to interpret clearly: I do not want to invest programming time in an object when, I think, the way you want to steer its functionality has no value to me, or rather to my customers. I think the functional technical approach you are trying to implement is not aligned with the new technology requested by the current market. If you sold this card blank, with its programming left in the charge of the customer, that would perhaps be better, I believe.
Regards

robert-reed · 11-26-2012

"If you sell this card blank": I suppose we could have chosen to provide the coprocessor without any programming, a blank piece of hardware where ALL the software is the customer's responsibility. I don't think it would be accepted as well as what we did provide: a platform with a full Linux image, capable of running applications either standalone or in one of several hybrid modes including MPI and what we call "offload programming," which makes it much easier for our customers to run their programming on the coprocessor. It is true that this is not a clone of the GPGPU-style architectures favored by our competitors. We think our approach, which does require programmers to consider caches and issues with blocking computation within those caches, will provide better solutions on both host and coprocessor and be much more energy-efficient as well. If you invest more time in understanding our architecture, hopefully you will come to the same conclusion.

aazue · ‎12-09-2012

Hi
To confirm the value of your arguments, could you compile the PostgreSQL 9.2.2 database sources in two ways to show the difference? After ./configure and make, execute the command: time make check
First with the card operational in the machine.
Second without the card in the machine (only the GNU gcc compiler).
This test executes its tasks in parallel and could confirm the value of your arguments more concretely. The xvidcap package lets you record and show a video of the two tests. Video is better for showing each step of the 131 tests. This test will easily demonstrate to customers the real value of your card, which is programmable only through your services.
About your supposition on the GPU side: I don't use GPUs and I don't plan to, but you have some tests here: http://wiki.postgresql.org/wiki/PGStrom
Regards

aazue · ‎12-17-2012

Hi
I have some customers waiting for the results of the test, and I see that you have not answered me... Making this test requires less than 15 minutes, including the time to download the sources. Without an answer, the customers ("frustrated", to reuse your word) could think that your product, which hosts your imposed in-house Linux, is not really as effective as you suppose.
Regards

robert-reed · ‎01-11-2013

You talkin' to me? Sorry, I was on vacation for most of the month of December. I have only just seen your replies.

I also think that you might not understand the basic nature of this device. It is not an "accelerator." It is a coprocessor. It will not speed the operations occurring on its host processor by its mere presence.

You can get the coprocessor to participate in host-side computations through the explicit modification of your source code to take advantage of the "offload" extensions available in the Intel C++ and Fortran compilers, for parts of the computation that are highly parallel and explicitly identified. Without those code changes to marshal data to the coprocessor and explicitly call functions compiled for both host and coprocessor, I would not expect to see any changes in performance. These code changes appear as pragmas or directives, as with OpenMP; they are easier than the changes necessary, say, to port to a GPU, but of a similar nature: they draw together similar computational work.

Or you can cross-compile programs on the host and run them natively on the coprocessor, whose main features are lots of cores, each with extra-wide vector units for high numerical throughput. Programs requiring configuration utilities such as ./configure can't generally be cross-compiled, but our cluster teams have been able to natively compile many packages on the coprocessor directly. If the 131 tests you describe are purely parallel loads, you might see a difference between running on the host and running natively on the coprocessor, but understand that many programs have a large serial component, which generally will run much faster on the big-core host than on one of the small-core coprocessor threads. So then it would likely turn into a test of the serial/parallel balance in the 131 tests.
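
The serial/parallel balance described above can be illustrated with an Amdahl's-law-style estimate. This is a rough sketch only: the thread counts and per-thread speeds are hypothetical values picked for illustration, not measurements of any host or coprocessor, and the model ignores vector units, memory bandwidth and data transfer.

```python
def estimated_runtime(serial_fraction, n_threads, per_thread_speed):
    """Estimate run time relative to one host thread running the whole job (1.0).

    serial_fraction   share of the work that cannot run in parallel (0..1)
    n_threads         threads available for the parallel share
    per_thread_speed  speed of one thread relative to one host thread
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    # The serial share runs on a single thread of the machine.
    serial_time = serial_fraction / per_thread_speed
    # The parallel share is assumed to scale perfectly over all threads.
    parallel_time = (1.0 - serial_fraction) / (n_threads * per_thread_speed)
    return serial_time + parallel_time


# Hypothetical machines, chosen only for illustration (not measured values).
HOST = {"n_threads": 16, "per_thread_speed": 1.0}
COPROCESSOR = {"n_threads": 240, "per_thread_speed": 0.1}

if __name__ == "__main__":
    for serial_fraction in (0.0, 0.01, 0.1, 0.5):
        host = estimated_runtime(serial_fraction, **HOST)
        card = estimated_runtime(serial_fraction, **COPROCESSOR)
        faster = "coprocessor" if card < host else "host"
        print(f"serial={serial_fraction:.2f} host={host:.4f} "
              f"card={card:.4f} faster={faster}")
```

Under these assumed numbers the many slow threads win only when the serial share is very small; as the serial share grows, the single-thread speed of the host dominates the total.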

Despite all that, I could make some effort to come as close to your proposed tests as I could given adequate time. Unfortunately, my boss has other plans, so I don't think I can accept your challenge. And given that you request video confirmation of the results, in line with your previous skepticism, I doubt I would get much satisfaction from you for the effort. I can only suggest that you learn more from press reports and such as they become available. Hold onto your skepticism but also try to keep an open mind. Perhaps someday soon these devices will become ubiquitous enough that you may get a chance to play with one directly.

aazue · ‎01-18-2013

Hi
I think it is you or your team who are closed in your approach to this card.
You want to impose a closed system stored in the NOR, but I see that you are not even able to improve just a database engine (written entirely in C/C++ with low-level backend access)...
I have almost 30 years of C/C++ programming and I understand perfectly how this card works.

You have added a ton of literature, but where is there really an innovation?
If I summarize the tools you propose:
The Intel compiler is a clone of the GNU compiler; it is not standalone and depends entirely on it.
Your MPI is a clone of MPICH2.
OpenMP is available to the public with the GNU compiler.
The majority of the utility tools that you use come from the GNU community.
And to add to the closure, you have also excluded Debian and its child Ubuntu.

What I know is that if we could install our own system in the NOR of this card, we could greatly improve the results of this database engine, with new programming added and not only literature.
I think that only the hardware side of this card is of interest to us.
Your imposed system and software are of no interest.
We are able to do much better than you, and more appropriately for our customers...

I agree that I am provoking you a little, but I do not seek to devalue your card; I am the first to defend Intel hardware, which is my work tool.

I am not ready to open a free or easy path to competitors in a potential new market of low-consumption servers.
I prefer to stay on my guard. If I remember... I have already seen your exploits when you managed the mobile side with Linux.

When I find time I could add some videos so that you better understand whether I am on Intel's side or not.
Maybe, at the same time, you and your team who manage this card could take a lesson in Unix and Linux
system programming.
(I can add a video with a sample OpenMP source built with your Intel compiler and with the GNU compiler, so you can compare the times...)

I forgot...
The video of the test I proposed is only to see every step, not to verify certified exact times; I trust your answer even without videos.

Regards

Complement added below.
To start, I add a first video (make check of PostgreSQL) with 1 fork and with 20 forks
(you can observe the difference as the size of the asynchronous test groups increases).

(Sorry, it is a very small 2-core machine; it is the only personal one I found free.)
Debian 7 does not have videocap in its software database by default;
I also had to compile several external library sources for it to work correctly.
I have several big customer machines, but I cannot compile on them (some libraries are at beta stage),
also because of the Intel compiler license (non-commercial) that I want to use in future videos.

I have used the SWF format, which I have tested as working on W8. I can make AVI or another format,
but, like a badger, I added a protected codec (a flag in the ./configure of some libraries),
which now requires VLC to be installed for it to work correctly on W8...

I hope it works for everyone; I am not very experienced in the video field.
Please confirm to me whether this first small test video works correctly for you.

I have added an Ogg version so that it works in the browser, but I think I must increase the frames per second.
Also, I found 3 old Xeon servers in the trash to build
a 24-processor cluster, but I don't know if they are all complete...

After testing the Ogg format (HTML5),
I see that only Firefox and Chrome can open my video directly in the browser (on Linux and Microsoft).
I will make all the other videos in Ogg; I think this is the best compromise.
(Searching for a better solution requires free time...)
(ffmpeg -h full) shows a plethora of options...
The list could fill one or two complete, brand-new rolls of toilet paper,
and still not be finished...

I have added a new video for you.
I am currently very busy elsewhere; I will add more explanation later.
Regards

robert-reed · ‎01-21-2013

Thank you for all the effort you have made to communicate with me. I read through your latest note several times, and watched the videos (well, I watched the small one and then watched the first part of the long one but I found it mostly incomprehensible manipulations of the user interface). I fear that I am only understanding about 30% of what you are trying to say. The short video appears to demonstrate what you've asked me to run on the coprocessor, but looking at that output, the tests look like they are measuring Conformance, rather than Performance. I have resisted taking the time to duplicate the experiment because I did not want to give a false impression of the performance of an application which has not been tuned for this architecture. I am less concerned about running a conformance test, though I still have the issue of finding some time to do the work.

However, I was not able to unpack your bzip2 archive, getting instead errors that started out like this:

$ tar xjf ~/Downloads/test-0.swf.bz2
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Archive base-256 value is out of off_t range
tar: Archive contains `\301\200\r\006\001\260\003\002' where numeric mode_t value expected
tar: Archive base-256 value is out of time_t range
tar: Archive contains `\030@\006\001\250\030\0\320' where numeric uid_t value expected

You ask whether our coprocessor is "able to improve" a "database engine." I am not sure that it would speed up such an operation. This coprocessor is really intended to accelerate numeric operations in its vector units, with just enough CPU associated with each of those vector units to handle control logic and marshal data to feed those vector ALUs. Object-oriented database management does not seem to me to be an application that uses a lot of vectors. I imagine that such an application would do a lot of individual object fetching and field extraction, and while the presence of hundreds of threads might mean that lots of such object manipulations could happen at the same time, the simplicity of the cores would likely not provide the expected speed increase. Even worse, that many simultaneous threads would probably not be well tuned for the thread locks and critical sections that must exist in postgres to provide thread-safe multi-core operation.

aazue · ‎01-22-2013

Hi
What I want you to understand from the video is that the new network programming is now set up with an interface (asynchronous, wrapped on service 80...).
I understand that it is difficult for you to follow this video when I observe the prehistoric way
you propose, using SSH with the command line...
With a simple browser I can control the whole system,
whether I am on W8, Android, AIX or an Apple system, from home or from outside; your card would represent for me only an insignificant part
of the complete system.
When I build a Linux system, I do not impose a downgrade to keep it aligned with a kernel persisted in the NOR...
when an upgrade from the external repositories is required.
As I see that you are not able to understand network derivations across several different database engines,
I am posting for you the sample OpenMP source used in the video.
Maybe you could run a timing test, with and without your card, with GNU and ICC.
The source is very simple; you can easily modify it to suit your card better.
(I wrote it a long time ago for my evaluation of OpenMP compared with the low-level approach I use.)
Maybe that way you could demonstrate a concrete result, something other than literature...
I have other samples, more complex and more interesting, but they need to be linked with a database engine.
I do not have time to change them to use mmap or ndbm so that they would work standalone...
On the Unix or Linux side, we mainly use fork (semaphores, FIFOs, IPC...); maybe we are too conservative on this side.
For PostgreSQL, do not worry: we are already able to optimize it greatly with the new Intel hardware, and
without your card being required...
I advise you to read the PostgreSQL source before adding speculations that are very, very doubtful,
and I am softening my words in choosing that term...
Regards

(Rename the file to .cc; your site does not have this MIME type.)

I forgot...
About tar, which you seem to want to use (tar xjf ~/Downloads/test-0.swf.bz2):
it is not a .tar archive...
You need only bzip2 -d test-0.swf.bz2, which produces the uncompressed test-0.swf;
that is more appropriate, I think...
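
The mix-up above (a plain .bz2 file handed to tar) can be handled by checking whether the compressed stream really wraps a tar archive before unpacking. Here is a minimal sketch using Python's standard bz2 and tarfile modules; the function name unpack_bz2 is hypothetical, and unlike bzip2 -d it leaves the original file in place.

```python
import bz2
import shutil
import tarfile
from pathlib import Path


def unpack_bz2(path, dest_dir):
    """Unpack a .bz2 file, whether or not it wraps a tar archive.

    Returns the directory for a tar archive, or the path of the
    decompressed file for a plain bzip2 stream.
    """
    path, dest_dir = Path(path), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # tarfile.is_tarfile() looks through the compression transparently,
    # so "archive.tar.bz2" passes and "test-0.swf.bz2" does not.
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            archive.extractall(dest_dir)
        return dest_dir

    # Plain bzip2 stream: decompress to the same name without ".bz2",
    # which is what "bzip2 -d" would produce.
    target = dest_dir / path.stem
    with bz2.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Called as unpack_bz2("test-0.swf.bz2", "out"), this would write out/test-0.swf instead of failing with tar header errors.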

aazue · ‎02-12-2013

Hi

About the derivations that use several database engines:
your reasoning is on the wrong side.
Open this link and read all the commands listed (for example PQsendQueryPrepared):
(http://www.postgresql.org/docs/9.2/static/libpq-async.html)
To reuse your expression... ("If you invest more time in understanding ... hopefully you will come to the same conclusion.")
Even with a single Intel Atom N570 (4 logical cores) connected over the network to help, you can get improved performance...
It suffices that you are able to program it correctly in a C/C++ backend...

I forgot...
About your remark ("Thank you for all the effort you have made to communicate with me."):
my communication effort is not aimed at you specifically but rather at your group.

Regards