This is an introduction to SQL for people who don't know anything about database programming, a boat I just recently climbed out of, I think.

I wanted to start off my recently approved authorship in Whitenoise with an introduction to SQL, a programming language I've worked with pretty much every day for the last couple years. I graduated from college with only a basic knowledge of how to program in VBA, but my first job heavily involved using SQL to manipulate databases. These are two wildly different languages, so it felt like I was being thrown to the sharks. However, I gradually learned my way around and realized how powerful a tool it is. Any company interested in using and manipulating "big data" (yay buzzwords!) is going to be interested in people who know how to code in SQL, so if you're looking at those types of jobs, but you're not really sure how to prepare yourself for them, hopefully this will give you a better idea.

SQL is a declarative language, which means that instead of telling the program how to do things, I tell it what to do, and let the program worry about figuring out the best way to do it. In this respect, it's different from Java or HTML or Python or any of the more popular languages out there. For example, a declarative language would be able to create a program that grabs every single red item from each room of your house. A procedural language would create a program that goes into each room of your house, looks at every item, identifies whether it's red or not, and grabs it. So while procedural languages are very powerful, SQL is very useful and more efficient in the right contexts, namely working through large data sets.

A single piece of SQL code will be structured according to the following pattern:
- Select ...
- From ...
- Where ...

The "From" piece identifies where you are looking. "Select" identifies what you are grabbing, and "Where" identifies what conditions you are placing on the items you want to grab. As such, only "Select" and "From" are essential to your SQL code. Everything else is just used to further refine everything.

As such, the core of your SQL code is built from your data. Your data comes from your database, and specifically, a schema within the database that houses your data. These are usually created and named by your database administrator, but if there's any rhyme or reason to it, the data sets within a schema will be related. Ideally, you could think of each database as a neighborhood. A schema would be each of the individual houses, and a data contained in the schema would all relate to that house, e.g. "Kitchen", "Brad's Toy Room", "Master Bathroom", etc.

Given your dataset, SQL then identifies the characteristics that you're interested in. A single point of data or an observation can have thousands or millions of variables—humidity, dew point, wind speed, temperature, air pressure, latitude, stadium, arena, make, model, year, etc. If we use your kicthen as our data set though, these variables might be "Name", "Microwave-ability", "Flatness", and "Likelihood of losing finger if wielded by two-year-old". These characteristics will be present for each observation in your dataset, and can be called by SQL according to however you specify.

From this basic set of parameters, we can identify objects in the dataset already. "Give me the microwavability of every item in the kitchen" would look like

  • Select Name, Microwavability
  • From Kitchen.

"Give me name of every item in the kitchen" would look like

  • Select Name
  • From Kitchen

Pretty intuitive right? The next step is to limit what you're looking at.

Without specifying a "Where" condition, you are telling SQL to return the characteristics of every single item in your dataset. By identifying conditions in your "Where" statement, you can focus your attention on only the relevant items. This is where you could specify that you only want to look at the red items, or the plates, or the items with a higher than 90% likelihood of cutting through your finger if wielded by a two-year-old. SQL automatically looks at the characteristics of each observation in your dataset, and only returns those that qualify under your conditions.

For example, I could tell the program that I want to see how microwaveable every red item in the kitchen is. This would look like

  • Select Name, Microwaveability
  • From Kitchen
  • Where Color=Red

I don't even need to see the characteristic I'm selecting on if I don't want to. I can specify multiple conditions, too, such as

  • Select Name, Location, "Is it Bloody?"
  • From Kitchen
  • Where Color = Red and "Likelihood of Slicing Finger Off Cleanly if Wielded by a two-year-old" > 90%

which will tell me what items that are sharp and red are actually bloody, and where they are.

As you can hopefully begin to see, SQL is extremely useful at whittling down large data sets into manageable and palatable chunks. It's not the best tool for compiling raw data (e.g. I would still use Python if I wanted to scrape data from a website or something), but once you have your data, SQL is arguably one of the best tools for parsing it out and getting some utility out of it.

There's quite a bit more to SQL code that I haven't even begun to touch on, but if ya'll think this is useful, I'll talk about Joins and how to look for bloodied finger-cutting implements in your whole house/neighborhood.

For practice and learning on your own, I like sqlzoo.net and w3schools.com/sql

Advertisement