Intro to Cell, Part 8

June 22nd, 2009

Hello again,

Time for another tutorial update. Some of you have noticed the frame-buffer code in lib-ppu.[ch], and now it’s time to cover it. This doesn’t go into a hell of a lot of detail, partly because I didn’t write it and partly because I didn’t spend a lot of time on it, but I do adapt the Mandelbrot generator to use it. It isn’t very robust (if the application fails you may need to log in remotely and reboot) and it has some bugs, but it is only intended for trying out code, not as a production interface!

Now, this more-or-less comes to the end of the tutorials I’ve completed so far. I had modified the code included here to create an animated zoomer, but it wasn’t really worth writing up (an exercise for the reader perhaps?).

I am still working on an IPC chapter, but progress has been slow due to other distractions, so I cannot be sure when it will be complete. Once that is up, I will probably upload a PDF and the TexiWEB source somewhere, after another proof-read and a few fixes and whatnot.

  - !Z

Read the rest of this entry »

Competition in Size? Prime Numbers

June 17th, 2009

Hi,

It has been a while since I’ve used assembly language. Sometime in 1996 there was a little competition. The goal was to create the smallest executable that would print all prime numbers between 1 and 1000. This was all at a DOS prompt, and in the end I managed to produce an executable of only 41 bytes which performed the given task.

Read the rest of this entry »

Intro to Cell, Part 7

June 15th, 2009

And now for the penultimate and ultimate versions of the Mandelbrot Set generator. Yes, you get two for the price of one this week.

I just try out a couple of different optimisation strategies and see what the results are. And you’ll see it’s definitely worth it.

Next week I’ll look at graphical output.

  - !Z

Read the rest of this entry »

Intro to Cell, Part 6

June 10th, 2009

Ok, time for another chapter. No code this time – I delve into the depths of assembly language and timing information and using the spu_timing tool.

It meanders a bit since I decided to look at the output of a whole function – including all the ugly set-up code – whereas you’d really only be interested in the time-critical parts. I guess I thought it was worth showing how messy C can translate to such a nice instruction set as the SPU has.

Enjoy

  - !Z

Read the rest of this entry »

Intro to Cell, Part 5

June 2nd, 2009

Here’s the next chapter. It’s just been sitting on my hdd, but I’ve been busy hacking on some non-Cell stuff so I never got around to posting it.

Oops. Pity, this is where things start to get interesting! I also discuss some unexpected results I got, caused by issues you might encounter with this type of micro-benchmark.

Enjoy.

   – !Z

Read the rest of this entry »

Intro to Cell, Part 4

April 23rd, 2009

Hello again,

A day late, but here’s the next chapter in the Mandelbrot saga. It’s time to introduce the atomic unit, and talk about SPU-directed load balancing.

The atomic unit is a very nifty device that lets you efficiently implement all sorts of synchronisation primitives in a multi-processor environment. Also, as here, it is quite simple to use it directly to implement some synchronisation algorithms.

Having the SPUs do their own load balancing and job scheduling would often be preferable, since they know when they’re idle, they’re faster, and there are more of them.

For reference, see Section 20.3 of the Cell BE Handbook (on page 585). Section 20.2 also covers the PPU side of things, although it is far less useful.
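As a rough illustration of workers scheduling themselves, here is a minimal sketch in portable C11. It is not the chapter's code and it does not use the Cell atomic unit: `atomic_fetch_add` from `<stdatomic.h>` stands in for it, and the names `claim_row` and `worker` are hypothetical. Each worker claims the next unprocessed row from a shared counter until none remain, so a worker that finishes early simply claims more rows.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically claim the next unprocessed row.
   Returns the row index, or -1 once all `height` rows have been claimed.
   On Cell the shared counter would sit in main memory and be updated
   through the atomic unit; C11 atomics are only a stand-in here. */
int claim_row(atomic_int *next, int height)
{
    int row = atomic_fetch_add(next, 1);
    return row < height ? row : -1;
}

/* Hypothetical worker loop: keep claiming rows until the counter runs
   past the end. Returns how many rows this worker processed. */
int worker(atomic_int *next, int height, void (*render_row)(int row))
{
    int done = 0;
    int row;

    while ((row = claim_row(next, height)) >= 0) {
        render_row(row);
        done++;
    }
    return done;
}
```

Each thread (or, by analogy, each SPU) would run `worker` against the same counter. Nothing central hands out the jobs, which is the point of having the workers do their own balancing.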

Read the rest of this entry »

New spumedia project

April 17th, 2009

Kristian Jerpetjøn (aka Unsolo) has just announced a restart of the old spu-medialib project, now called spumedia.

I just kicked off a new project of spu-medialib called spumedia where we will start again from the get-go and hopefully do things way better now that we have some experience in how to do things.

The main goal is to accelerate the following already known packages, and probably more, using a unified library:

EXA,
Xv,
SDL,
mplayer,
ffmpeg

Go check out the project on google code.

Intro to Cell, Part 3

April 16th, 2009

Hello again. After the last chapter, here’s some more Cell meatiness. It introduces multi-processor parallelism to the problem and touches on some of the issues that arise from this approach. This is a pretty light-weight chapter, but the next one will delve back into Cell-specific features.

  - !Z

4 Multiple SPU, SISD

The first obvious way to increase performance when you have multiple
processors is to use more than one of them at a time. Although the
previous example used two processors, the primary processor merely set
the other one up and had it do all the work.

Since we have more processors, let’s look at splitting the work amongst
them all.

4.1 Splitting a task

The difficulty is often in how to split the task. Since every point
in the Mandelbrot set is effectively calculated individually (ignoring
mirror symmetry) splitting up the task appears almost trivial. It was
chosen for this series for exactly that reason.
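
One simple way to split the image is to give each processor a contiguous band of rows. The sketch below is only an illustration of that idea, not the chapter's code; the function name `split_rows` and its parameters are hypothetical. It divides `height` rows among `nworkers` workers and hands any leftover rows to the lowest-numbered workers.

```c
/* Hypothetical helper: compute the half-open row range [*first, *last)
   for worker `id` (0-based) out of `nworkers`. When `height` does not
   divide evenly, the first `height % nworkers` workers get one extra
   row each. */
void split_rows(int height, int nworkers, int id, int *first, int *last)
{
    int base = height / nworkers;
    int extra = height % nworkers;

    *first = id * base + (id < extra ? id : extra);
    *last = *first + base + (id < extra ? 1 : 0);
}
```

For example, 10 rows over 3 workers gives the ranges [0, 4), [4, 7) and [7, 10). Equal-sized bands do not mean equal work for the Mandelbrot set, since some rows take far more iterations than others.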

Read the rest of this entry »

Intro to Cell, Part 2

April 9th, 2009

Since the last one was pretty unexciting and not very Cell related, here’s Chapter 3 (part II).

This moves the simple C code implementation to execute on a single SPU and discusses some background and important issues that come up as part of the process. It jumps straight in with double-buffered DMA as well since it is so easy to do.

It uses the Cell programming tools that come with YellowDog 6.0, which provides libspe 2.1. I do not know if there have been incompatible changes in newer tools.

As a reference, the Cell BE Programming Handbook, Chapter 3 covers the SPEs and Chapter 19 covers DMA in considerable detail.

  - !Z

As with the previous chapter, some of the internal links reference other parts of a larger document … so they will not work here.

The writing needs some work but should hopefully be readable enough. I use too many commas, often, in the wrong place. This is still a draft document so work is on-going.

3 Simple SPU Conversion

Ok, so from the previous chapter, we have a plain old bit of C that
runs ok but doesn’t set the world on fire. What did Sony, IBM and
Toshiba spend those billions on after all?

3.1 Introducing The SPU

Well, as you know, each CBE processor has a dual-threaded 64-bit Power processor (the PPU), which we have been using until now. It also includes 7 “Synergistic Processor Units”, or SPUs. These are where the real processing power is on the CBE.

Read the rest of this entry »

Intro to Cell, Part 1

April 6th, 2009

Hello everyone!

As mentioned in an earlier post, here’s the first part of a new series on Cell B.E. programming targeted at free software programmers. It is intended for those who want to hack software on this interesting architecture from their GNU/Linux “OtherOS” installation on their PS3, although much of it will apply to other hardware. It will focus on low-level ideas introduced through a process of shared discovery; we’re all learning from scratch here.

The only software required will be packages that come with any PS3-targeted distribution: spu-gcc, ppu-gcc and libspe2-devel. We’ll also use some utilities from the IBM SDK/developerWorks, but they are not necessary to build anything.

The basis of all examples will be a Mandelbrot Set fractal generator. This example was chosen because it is a very simple yet computationally intensive problem that is completely parallelisable and produces graphical output. This allows us to focus on various concepts and technologies independently as we increase the sophistication of the application. We’ll start with basic portable (and boring) C, convert it to run on 1 SPU, then multiple SPUs and so on. As we develop the series, various CBE-related technologies and ideas are introduced where appropriate, hopefully providing a gentle introduction to Cell hacking.
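
To give a flavour of the "basic portable (and boring) C" starting point, here is a minimal sketch of the usual escape-time calculation for a single point. It is an illustration under my own assumptions, not the code from the series; the name `mandel_iter` and its parameters are hypothetical.

```c
/* Hypothetical helper: count how many iterations of z = z*z + c are
   performed before |z| exceeds 2, capped at `max_iter`. Points that
   never escape within the cap are treated as inside the set.
   (cr, ci) are the real and imaginary parts of c. */
int mandel_iter(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i;

    for (i = 0; i < max_iter; i++) {
        double zr2 = zr * zr;
        double zi2 = zi * zi;

        if (zr2 + zi2 > 4.0)    /* |z|^2 > 4 means |z| > 2 */
            break;
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
    }
    return i;
}
```

Every pixel is calculated independently of every other pixel, which is what makes the problem so easy to spread over several processors.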

The source document is being authored using TexiWEB – a literate programming system based on Texinfo. The complete source code of the document, PDF versions and all files will be available at a later date, licensed under GFDL.

So without further delay, here’s the first part of the series … (which … yes … starts at chapter 2; the printed copy will include its own introduction).

  – !Z

Read the rest of this entry »