You Should Know SQL

I like being a “full stack” developer. Having the ability to think up a new feature and personally see it go from just an idea all the way to a finished product gives me not only a great sense of pride but also the ability to work more autonomously. I don’t need to wait for another team to build out an API or for the front end to finalize a UI mockup. I can just sit down and make things happen.

But.

Specialization isn’t a bad thing either. For example: I’m just not great at creating beautiful user interfaces. The ones that I create work well and they’re not necessarily ugly, but there are people on my team who can do them much better than I can. So they do. Duh. It’s hard to argue that the UI person on the team should know the intricacies of an ETL system if their day to day work never needs them to — they should focus on what’s providing the most value for their team.

The biggest place I personally see this going afoul is when it comes to interacting with data. I think that just about everyone should be familiar enough with SQL to probe around in a database without any help. It’s not a hard skill to learn the basics of and will really up your game as a developer.

No Waiting for Information

Whether you’re a product manager or a UI/UX Designer, knowing how to write a query to answer your questions is invaluable and will increase your productivity.

For example: let’s say you’re working on refactoring a portion of an application and you realize that if you make a little tweak user’s experience will be much nicer overall. The only problem is that, for some reason, it’s not backwards compatible. You could just shrug it off and ignore the tweak — after all, you can’t just break something that tens of thousands of people use, right? You could also just ignore the problem and forge ahead with the new tweak — there’s an argument that if it’s easier than before it’ll be used more anyway, so let’s move fast and break things, right? Alternatively, you could make an informed decision by figuring out the exact number of users that would be affected. The quickest way to do that is likely by writing a SQL query to check something in your database. Instead of having to ask someone else for help, or even just punting to your boss, wouldn’t it be great to be able to fetch the information yourself and go from there?

It’s Like Riding A Bike

It’s worth saying again that I’m not saying every developer needs to be able to give a deep dive into PostgreSQL’s internals. Knowing how to write a select statement and how to join tables is enough to figure out the vast majority of things you’ll realistically encounter. The beauty of this is that, for the most part, this basic knowledge is extremely portable.

I can write pretty much the same query for querying a PostgreSQL database, a MySQL database, and CloudWatch logs via Amazon Athena. Just about any system that stores data is going to use a syntax familiar enough that it won’t be a huge hurdle to figure out. The point here is that once you’ve learned enough SQL to be dangerous, you’ve learned enough to take that knowledge and apply it across a myriad of different applications.

Write Better Code

Okay, so maybe you never need to actually delve into any data and you don’t foresee that ever coming up in your life. You should still learn SQL because it will help you write better code. How?

The ability to know that what you’re working on is slow because the underlying query is filtering by columns without an index is a big deal. Instead of either ignoring the problem and having slow code or writing some sort of caching or abstraction to make things seem faster, you can just fix the real problem. The important part is that you knew enough about the database to realize that the problem may lie there.

Knowing SQL helps me understand what the heck my ORM is doing, why it’s doing that, and ways I may be able to improve my code’s performance. For example, let’s say I want to find every Contact (which belongs to a Student) a User has access to based off of the students in that user’s scope.

students = Student.scope_to_user(user_id).map(&:id)

Contact.where(:student_id => students).all

The above code is doing two queries, and on one of those queries we’re also looping all the way through the dataset.

-- first query, which we then loop over via .map(&:id)
select *
from students
where id in user_student_scope('user_id');

-- second query
select *
from contacts
where student_id in ('list', 'of', 'student', 'ids');

Because we know SQL we know that we can do better than that, though.

students = Student.scope_to_user(user_id).select(:id)

Contact.where(:student_id => students).all

This doesn’t look very different, but what we’ve done is write code that only makes one query and avoids looping over anything. The generated SQL is something like:

select *
from contacts
where student_id in (
	select id
	from students
	where id in user_student_scope('user_id')
);

Now obviously you’ve got to know the ORM a little bit for this particular example but the point stands: without knowing SQL you wouldn’t know why the first example is less performant code than the second.

I’m Not Alone

Who cares what I think anyway, right?

If that's still not enough, here's a little checklist that I think sums it up well:

Server performance checklist: 1) its usually SQL 2) you are using SQL inefficiently 3) too much SQL 4) your ORM is shit 5) its usually SQL
— Jeff Atwood (@codinghorror) February 1, 2014