All posts by mcstro

Technical professional & Boss DJ

Green sky, blue grass

Greensky Bluegrass is a great name for a band, especially if you were trying to start a bluegrass band. Unfortunately, the name is already taken by a band called, yes, Greensky Bluegrass. In the opinion of some friends (Trace Friends Mucho), they’re “stealing all of our money [sic].” I don’t know what that means, but I can say they do a pretty imaginative version of Pink Floyd’s “Time.”

But enough about this band. Let’s go back to that name. Bluegrass is a genre, right? But grass isn’t blue, it’s green. You know what is blue? The sky:

blueskygreengrass
Blue sky, green grass

It’s fun to contemplate the chiasmus a bit longer than perhaps the band’s originators intended. After a while, you start daydreaming about what the world would look like with the colors swapped. Well, daydream no longer:

greenskybluegrass
Green sky, blue grass??

 

This is pretty easily done with MATLAB’s image processing toolbox. If you studied applied math or engineering, chances are good that a professor assigned you a problem involving MATLAB, likely without giving any instruction on it. If you don’t have a soul and thus listen to talk radio, you will have noticed the manufacturer, MathWorks, Inc., advertising on your favorite millennial-targeted talk show. But for the uninitiated, MATLAB is a set of programming tools that work extremely well for certain problem domains, but are grossly inadequate for others (though that doesn’t stop people from trying).

Despite its overuse — some might even say abuse — it turns out that MATLAB is particularly well suited for image processing applications. I don’t want to get into it here, but it has to do with the types of data structures that MATLAB is optimized for. The language’s name is a shortened version of MATrix LABoratory, so naturally, the data structure that it does best is a matrix. Given that a matrix is usually just a two dimensional array of numbers, it sort of makes sense that MATLAB would handle images well, since they can be represented as two dimensional arrays of numbers.

Please don't sue me, Randall.
Discussing the ins and outs of why MATLAB is inappropriately applied to certain problems would be about as nerdy as an xkcd comic.

 

The one-array picture is actually only true of a grey-scale image. A color image is three two-dimensional arrays, one each for R, G, and B:

rgbfigs
Blue is the only thing that really needed to change
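
To make the three-arrays picture concrete, here is a minimal sketch in Python rather than MATLAB, with plain nested lists standing in for matrices. The tiny 2x2 image and the helper name `split_channels` are made up for illustration and are not taken from the script behind these pictures.

```python
# A tiny 2x2 "image": each pixel is an (R, G, B) tuple with values 0-255.
# Top row is sky-ish, bottom row is grass-ish (made-up values).
image = [
    [(110, 180, 240), (120, 190, 250)],
    [(60, 160, 40), (70, 170, 50)],
]


def split_channels(img):
    """Return three two-dimensional arrays (lists of rows): R, G, and B."""
    red = [[pixel[0] for pixel in row] for row in img]
    green = [[pixel[1] for pixel in row] for row in img]
    blue = [[pixel[2] for pixel in row] for row in img]
    return red, green, blue


red, green, blue = split_channels(image)
print(blue)  # the blue plane on its own
```

Each of the three results has the same height and width as the image, which is why one matrix per color is enough.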

The technique I used was simple thresholding. This means that I looked at the parts of the image that were the most blue (above a threshold that I set myself) and deleted those values. Then I looked at the parts of the bottom half of the image that were above a (different) threshold for green, and added blue to those parts. Somewhat interestingly, the sky-blue color is actually quite saturated in both green and blue (we’re talking within additive color models, folks), so the recipe is:

  1. Green sky: remove blue from sky
  2. Blue grass: add blue to grass

which surprised me because I thought I was going to be adding green at some point.
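
The two-step recipe can be sketched roughly as follows. This is Python with nested lists, not the original MATLAB, and the threshold values, the `boost` amount, the function name `swap_sky_and_grass`, and the toy image are all assumptions chosen for illustration.

```python
def swap_sky_and_grass(img, blue_thresh=200, green_thresh=150, boost=200):
    """Apply the two-step recipe to a list-of-rows image of (R, G, B) tuples.

    The thresholds and boost are illustrative guesses, not the values
    used for the pictures in this post.
    """
    height = len(img)
    out = []
    for y, row in enumerate(img):
        new_row = []
        for r, g, b in row:
            # Step 1, green sky: delete blue wherever it is above the threshold.
            if b > blue_thresh:
                b = 0
            # Step 2, blue grass: add blue to green pixels in the bottom half.
            if y >= height // 2 and g > green_thresh:
                b = min(255, b + boost)
            new_row.append((r, g, b))
        out.append(new_row)
    return out


toy = [
    [(110, 180, 240), (120, 190, 250)],  # sky: lots of green AND blue
    [(60, 160, 40), (70, 170, 50)],      # grass: mostly green
]
result = swap_sky_and_grass(toy)
print(result)
```

Note that green is never touched: the sky pixels already carry plenty of it, so taking away the blue is all it takes to turn them green.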

In conclusion, you should definitely check out Greensky’s actual music, which turns out to be impressively expansive and sprawling given their traditional bluegrass instruments, and also happens to be quite good. No, they’re not paying me to say this, and no, I haven’t asked. I like to think that I’m tactful about the difficulties of making a living as a professional musician, unlike some others I know (I’m looking at you, Trace Friends Mucho). On the other hand, MathWorks should definitely be paying me for promoting their image processing toolbox. I’m sure they have some pretty deep pockets, what with the large scale racket they’ve built for themselves in the engineering community. I practically already have the jingle written:

violetsareredrosesareblue
Violets are red, roses are blue. MATLAB’s image processing toolbox is fast, efficient, and easy to use.

OH, and if you ever want to jazz up your MATLAB icon, I also have the tools to make the icon look way silly:

logorainbow
Pay me, MathWorks, Inc.

 


Here’s where all the funk music is


Feel like gettin’ down? Can’t quite figure out how to scratch that itch for the dirty grooves? Let me take you right on up to the Mothership. Do not attempt to adjust your radio. We got some interplanetary extraterrestrial FAWNK for you right here, enough grooves to get you to Alpha Centauri and back, conveniently all in one location.

This post has been a long time comin’.

  1. CRUCIAL ALBUMS
    • Mothership Connection by Parliament. Classic, uncut funk. AMG gives this one five stars.
    • Maggot Brain by Funkadelic. “Mother Earth is pregnant for the third time…” So begins another classic album conceived by funkmaster George Clinton. The solemn guitar solo on the first track eventually gives way to some of P-funk’s most kickin’ grooves.
    • Songs In the Key of Life by Stevie Wonder. Stevie knows a thing or two about life. Pay attention and heed his words. I usually put this album on when life gets tough, and it always helps.
    • Donny Hathaway Live. This sucka met a pretty dark fate, but his soul was immaculate.
    • There’s A Riot Goin’ On by Sly & the Family Stone. In an age that was defined by hit singles, Sly et al. couldn’t help but put together entire concept albums with repeated themes and motifs.
  2. PLAYLISTS I MADE
    • Bmore Fawnk — Made this last March to DJ for a party. Strictly speaking, this isn’t exclusively funk, but it is all awesome.
    • Celestial Soul — Specifically a playlist of the outer-space funk stuff. Obscure Stevie Wonder tunes, Bootsy spacebass, a lot of George Clinton, and just about anything else I could hear echoing through the cosmos.
    • F-U-N-K — The original funk playlist
  3. PLAYLISTS OTHER PEOPLE MADE
    • Anthony’s Funk Playlist — I copied this off some loonnng Facebook post made by Anthony Ant, trumpet player and emcee from Oakland, CA. He is very much about live music, so a lot of the tracks he listed noted that they had to be viewed live on YouTube. I did not add these tracks to the playlist. If you want to dig their live show The Glory Band (James Small on drums), go out in Oakland/South Berkeley any time of the week. You’ll see them at e.g. Starry Plough, The Layover, the Legionnaire, opening for PFunk at the Fox Theatre, the BART station, et cetera, ad infinitum. Talented-ass mo’ fockazzz…
    • Stro ‘N B — I debated posting this one because there’s kind of an awkward story behind it. I got it from a Boston sports commentator/admin we call AMac. The awkward part is that AMac is married to my sister… Yeah, I told you it was awkward… But here’s the thing: Just because he’s married to my sister doesn’t mean he can’t make the most legit 90’s R&B playlist out of anyone I know. Which he can, and has done here. Welcome to the family, bro…
    • 90 hrs @ Ben’s Chili Bowl — If you travel much, you’ve surely seen those touch-screen Jukeboxes Of The Future, either in an airport or billiards bar, or in your run-of-the-mill tourist-trap tequila saloon. These cool High-Tech devices stream from the internet, which means they are playing Taylor Swift and Katy Perry for about 98% of their functioning lifetime. Not that there is anything wrong with either of these two talented individuals… But whatever happened to curated playlists?? As the links in this post should make clear, I am a fan of music streaming services from the perspective of a music consumer (though not necessarily music creator), but I definitely think it’s important to still maintain libraries curated by humans. For this reason, I love Ben’s Chili Bowl on U St. in Washington, D.C. I got most of the albums in their jukebox, except for the Prince album, because, as is obvious to anyone who knows about Prince, Prince is not on Spotify. RIP Prince.

Enjoy this. Also, if you ever want to hire me as a DJ, I have access to 2 kW of sound and a subwoofer. My rates are modest! Send me a message. ALSO, I used Spotify’s Web API to write this post, so if you ever want me to do software development for you, I can do this as well. (My rates for engineering work are somewhat higher than what I charge as a DJ.)

HW 0


1. BIO AND INTERESTS

My name is Mark Strother, and I’m a 4th year physics graduate student here at UC Berkeley. So far, CS267 has already vastly improved my understanding of computation strategies I’ve encountered in my research. The course is titled “Applications of Parallel Computing,” and for me the emphasis is really on “applications.” My research involves physics at the nuclear-subnuclear length scale, that is, at length scales $L \lesssim 10$ fm. The physics here can be split roughly into three different groups:

  • Nuclei: The positively charged core of an atom, which is composed of
  • Nucleons: The collective term for protons and neutrons, which themselves are composed of
  • Quarks and gluons: Supposedly* fundamental particles appearing in the theory of quantum chromodynamics (QCD), the accepted theory of strong interactions.

The process of isolating different length scales and studying them individually is known as effective theory, and is related to the idea of the renormalization group, which dictates how a given physical theory changes when you zoom in or out. In an effective theory, it is possible to “bury” ignorance of physics occurring at ultra-short or ultra-long distances into a small set of parameters, giving you control over your theoretical description (i.e., predictive power). For a reference on these techniques applied in nuclear physics, see [1].

While nuclear physics can be divided this way, it is not always the case that physical processes can be organized in such a hierarchy. For instance, in molecular dynamics there are many different phenomena all occurring on the same length scale, and such a separation of scales is impossible. So the fact that nuclear physics admits one is the good news. The problem is that, even if you can use effective theory for the problem, you still need a few basic input parameters. These parameters might be calculated within another theory, or they could be measured. In nuclear physics, however, it’s often the case that the calculation is impossible and the experimental data is not precise enough.

And this is the jumping-off point for “OHNOES COMPUTER PLZ HALP”.

While the applications of high-performance computing in nuclear physics abound, I’ve decided to spend the rest of the post discussing one in particular, called lattice QCD. The reason is that I’ve been fumbling around with lattice QCD software for about a year now and I’m still not entirely sure how it works, and this assignment is a great opportunity to see what’s going on.

First of all, what is lattice QCD? Without going into a whole song and dance, lattice QCD is a method for numerically computing predictions of QCD that are otherwise incalculable. The difficulty stems from the fact that the strong force is…I mean it’s strong, right? When an interaction is weak, we can just pretend like we’re looking at a free theory (non-interacting), and then add the interactions as an afterthought. This perhaps sounds a bit implausible (unless you’re a physicist, in which case you’ll recognize it by the name of “perturbation theory”), but it happens to work very well. In fact, it’s about the only way of calculating things without resorting to numerics. Since QCD does not lend itself to perturbative calculations, we’re basically forced into putting it on a discretized lattice and calculating everything numerically.

The idea of putting QCD on a lattice has been around since the mid 70s. Like so many fields of research, outsiders have been dissing its results, criticizing its slow progress, and questioning its overall usefulness since the beginning, whilst true believers have been championing it — “just a few more decades!” It has only been in the last 5-10 years that the field has reached maturity. Behold, the hadron spectrum, from a 2008 Science publication [2]:

The hadron spectrum, calculated from LQCD, in comparison with experiment.

There are a lot of other really interesting things I could talk about here, but I’ll save them for another post. Right now, I am going to talk about a small subset of computational tasks involved in a lattice QCD simulation and call it a night.


2. LATTICE QCD IN PARALLEL

The software I mentioned is called Chroma, and was developed by a collaboration called USQCD, as part of the Department of Energy’s SciDAC initiative (“Scientific Discovery through Advanced Computing”). As of this writing, Chroma consists of over 400,000 lines of C++ code. It is a high-level interface that sits on top of various libraries that perform all the tasks needed for a lattice computation. The hierarchy is as follows [3]:

USQCD’s software hierarchy

Some of these guys are not relevant from an HPC standpoint, but they appear [4] to cover the variety of programming models discussed in class:

  • QDP++ : Stands for QCD Data Parallel (by the way, Q will always stand for QCD [like the G in GNU, but without the recursion]). QDP++ specifically deals with operations that are implemented at every space-time point on the lattice.
  • QMP : QCD Message Passing, which to me is completely confusing — shouldn’t they have gone with QMPI, to avoid conflating this with multi-threading?
  • QMT : Aha, here’s the multi-threading.
  • QUDA :  QCD implementation of CUDA — lattice calculations also take advantage of general purpose GPUs.

To name a few places where these processes are running, we have

  • NERSC’s Edison and Hopper
  • LLNL’s Oslic and Edge (Edge was retired recently in fact)

I mentioned above that LQCD has reached maturity. There are two main reasons this is happening now. First, it is a highly non-trivial matter to put a theory like QCD on a lattice. Lattice theorists have spent years trying to make sense of various technical issues. For example, the discretization and the finite volume can lead to what are known as “lattice artifacts”: new effects that aren’t present in the continuous, infinite-volume theory. Furthermore, interpreting the results of an LQCD simulation requires an extrapolation procedure, and it is not a priori clear how to carry one out. Plenty more examples abound; while some of the difficulties were to be expected, others were a complete surprise.

The second reason LQCD is finally hitting its stride is simply due to the co-development of computing resources. Computers have only recently progressed to the point where it is feasible to do a calculation that is not totally overwhelmed by lattice artifacts. The majority of the computing resources go into inverting the “D-slash operator.” It’s a matrix of dimension on the order of $\sim 10 \times N^3 \times T$, with $N$ being the number of spatial sites in one dimension and $T$ being the number of sites in the time dimension. Clearly, using a $2^4$-sized lattice is going to result in some atrocious artifacts, yet the matrix is already past 100 x 100. The state-of-the-art calculations (like [2]) typically use somewhere between 24 and 64 lattice points per spatial dimension. Such computational demands were exorbitant until only recently (and they still ain’t cheap).
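
To get a feel for how quickly that dimension grows, here is a back-of-the-envelope sketch in Python. The function name `dslash_dimension` is made up, the factor of 10 is just the rough prefactor quoted above, and setting T equal to N for the larger lattices is purely an assumption for illustration.

```python
def dslash_dimension(n_spatial, n_time, per_site=10):
    """Rough matrix dimension ~ per_site * N^3 * T, following the estimate above."""
    return per_site * n_spatial**3 * n_time


# The 2^4 toy lattice (N = 2, T = 2).
toy_dim = dslash_dimension(2, 2)
print(f"2^4 lattice: {toy_dim} x {toy_dim}")

# Lattices in the 24-64 range; T = N here is an assumption, not a quoted value.
for n in (24, 32, 64):
    dim = dslash_dimension(n, n)
    print(f"N = {n}: dimension ~ {dim:.2e}")
```

The dimension scales with the fourth power of the lattice extent when T grows along with N, so doubling the number of sites per direction multiplies the matrix dimension by 16.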

While people can come up with clever algorithms for inverting a matrix, there’s no changing the necessity of doing so, at least for an LQCD calculation. In an ideal world, the vision would be to first calculate properties of hadrons, then extend the programme to calculating hadron-hadron scattering. Indeed, this is an ongoing project, with some results already published by other collaborations [5]. But before we start trying to calculate Pb–>Au on a lattice, I will close this post on sort of a down note: there’s an Amdahl’s Law lurking here. The reason for the bottleneck actually comes from physics. As one begins adding more hadrons on the lattice, the work grows combinatorially, due to the diagrammatic procedure in quantum mechanics known as Wick’s Theorem. For proton-proton scattering, there are on the order of a thousand diagrams for a given channel. That means on the order of a thousand D-slash matrices to invert.

Without getting too technical, what you measure on the lattice are objects called correlators. A correlator has a source and a sink. If you want to measure something like the proton mass, you need three sources and three sinks — three because of the three quarks to get the quantum numbers of the proton correct. What Wick’s Theorem tells you is that you need to sum over all the different ways to connect the various sources to sinks. This is fine for three quark operators, because that’s a total of 3 x 2 x 1 = 6 diagrams to sum over. But two protons? That’s six quark operators, or 6! = 720 diagrams, each with a D-slash matrix to invert. By the time you’re on helium-3, you’re looking at 9! = 362,880. Bottom line: if we are ever going to be able to continue lattice research into many body states, we’re going to need more than the speedup afforded by parallelization!
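
The counting above is easy to reproduce. The sketch below is Python with a made-up helper name, `naive_contractions`, and it uses the same naive count as the text: every one of the n sources may connect to any of the n sinks, giving n! diagrams.

```python
from math import factorial


def naive_contractions(n_quarks):
    """Count the ways to connect n sources to n sinks: n! (the naive count above)."""
    return factorial(n_quarks)


systems = [("proton", 3), ("two protons", 6), ("helium-3", 9)]
for name, n_quarks in systems:
    count = naive_contractions(n_quarks)
    print(f"{name}: {n_quarks} quarks -> {count:,} diagrams")

# Growth factor from a single proton to helium-3.
growth = naive_contractions(9) // naive_contractions(3)
print(f"helium-3 needs {growth:,} times as many diagrams as one proton")
```

Adding just two more nucleons multiplies the work by a factor in the tens of thousands, while adding processors buys at best a speedup proportional to the processor count.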

So, LQCD will have its moment in the sun. It is finally achieving its potential thanks to today’s supercomputers. Though it’s somewhat discouraging to know that its reach is finite, I am still very fortunate to be involved in a lattice collaboration while the field is in its prime.

[1] R. J. Furnstahl and K. Hebeler, Rep. Prog. Phys. 76, 126301 (2013)

[2] S. Dürr et al. (Budapest-Marseille-Wuppertal Collaboration), Science 322, 1224 (2008)

[3] http://usqcd-software.github.io/

[4] R. G. Edwards and B. Joó, Nucl. Phys. Proc. Suppl. 140, 832 (2005)

[5] NPLQCD Collaboration, Phys. Rev. D 81, 074506 (2010); CalLat Collaboration (publication in progress)