146

I've seen a number of people claim that you should specifically name each column you want in your select query.

Assuming I'm going to use all of the columns anyway, why would I not use SELECT *?

Even considering the question *SQL query - Select * from view or Select col1, col2, … colN from view*, I don't think this is an exact duplicate as I'm approaching the issue from a slightly different perspective.

One of our principles is to not optimize before it's time. With that in mind, it seems like using SELECT * should be the preferred method until it is proven to be a resource issue or the schema is pretty much set in stone. Which, as we know, won't occur until development is completely done.

That said, is there an overriding issue to not use SELECT *?

0

20 Answers 20

183

The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.


So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention if you use ordinal column access you should never use select *
  5. If your SQL statement joins tables, select * gives you all columns from all tables in the join

The corollary is that using select * ...

  1. The columns used by the application is opaque
  2. DBA's and their query profilers are unable to help your application's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).

Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.

10
  • 2
    In your first section, point #5 should read "select * gives you all columns from all tables in the join". In your second section, points #2 and #5 are not necessarily true, and should not be listed as reasons to not use "select *".
    – jimmyorr
    Commented Feb 18, 2009 at 14:50
  • 1
    @uglysmurf - thanks for the correction, but in regards to 2 & 5 - while they may not necessarily be true for all databases / dba's in all cases, I feel they are important and valid for the majority of cases and will leave them in. Using 'select *' never made a dba's job easier. Commented Feb 18, 2009 at 20:45
  • 11
    I'd Argue that #3(Brittle code) isn't really true. Depending on the implementation, Select * might make it LESS brittle, but I don't see how it could be more so.
    – JohnFx
    Commented Feb 18, 2009 at 22:45
  • 3
    @JohnFx, I guess you define brittle differently. Brittle is normally defined as 'breaks easily'. Having unknown or hard-to-find dependencies because each piece of code will use different columns means I can't easily change anything at the data level without full regression .. which seems brittle. Commented Feb 18, 2009 at 23:10
  • 9
    @mavnn, w.r.t. brittleness, I fear this is a devolving into a semantics issue on my choice of the word brittle. My last word is to say it makes little difference anyways. The only scenario is renamed / removed columns. You are just moving the break from when the sql is executed (explicit) versus breaking when the results are consumed. The way in which the query result is consumed can vary, and the code may or may not silently fail, but the sql execution engine will definitely fail with invalid sql. So did select * help you? IMO explicit failure closer to the DB for a DB issue is better. Thx Commented May 12, 2009 at 21:56
44

If your code depends on the columns being in a specific order, your code will break when there are changes to the table. Also, you may be fetching too much from the table when you select *, especially if there is a binary field in the table.

Just because you are using all the columns now, it doesn't mean someone else isn't going to add an extra column to the table.

It also adds overhead to the plan execution caching since it has to fetch the meta data about the table to know what columns are in *.

4
  • 4
    Good answer, but I'd change the "code will break" to "code MAY break." That's the real trouble here, the "select *" use doesn't ALWAYS produce a breaking change. And when the break does happen is usually highly decoupled from the use that ends up broken.
    – BQ.
    Commented Nov 26, 2008 at 16:32
  • 4
    If someone is referencing columns ordinally in their code, they're in trouble regardless of whether they use SELECT * or not. The plan execution overhead is trivial, and wouldn't matter anyway once the plan is cached. Commented Nov 26, 2008 at 16:41
  • 2
    Then the programmer error lies in writing code that depends on the sequence of the columns. You never need to do that.
    – dkretz
    Commented Nov 26, 2008 at 19:54
  • 1
    @doofledorfer - never say never. It's faster to access ordinal columns, and it is practical at times. It's a bigger error to use select * than there is to use ordinal access. Commented Nov 26, 2008 at 21:28
24

One major reason is that if you ever add/remove columns from your table, any query/procedure that is making a SELECT * call will now be getting more or less columns of data than expected.

2
  • 3
    You should never write code that depends on the number of columns returned anyway.
    – dkretz
    Commented Nov 26, 2008 at 19:55
  • 4
    But everyone is writing code that requires that the programmers know which data is coming back. You can't Ctrl+F your column name if it's hidden in a SELECT *. Commented May 6, 2011 at 21:07
18
  1. In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.

  2. Even if you now need every column in the table, more could be added later which will be pulled down every time you run the query and could hurt performance. It hurts performance because

    • You are pulling more data over the wire; and
    • Because you might defeat the optimizer's ability to pull the data right out of the index (for queries on columns that are all part of an index.) rather than doing a lookup in the table itself

When TO use select *

When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if were writing an DB management app that needed to display the entire contents of the table (whatever they happened to be) you might use that approach.

3
  • 1
    Another time to use SELECT * would be when you're doing test queries using the db client.
    – cdmckay
    Commented Feb 18, 2009 at 20:51
  • That seems like a strange exception given the context of the question. Other than saving some typing, what is the advantage of doing this for test queries?
    – JohnFx
    Commented Feb 18, 2009 at 22:43
  • 1
    Also SELECT * FROM (SELECT a, b, c FROM table) is OK.
    – kmkaplan
    Commented Jun 19, 2018 at 10:37
13

There are a few reasons:

  1. If the number of columns in a database changes and your application expects there to be a certain number...
  2. If the order of columns in a database changes and your application expects them to be in a certain order...
  3. Memory overhead. 8 unnecessary INTEGER columns would add 32 bytes of wasted memory. That doesn't sound like a lot, but this is for each query and INTEGER is one of the small column types... the extra columns are more likely to be VARCHAR or TEXT columns, which adds up quicker.
  4. Network overhead. Related to memory overhead: if I issue 30,000 queries and have 8 unnecessary INTEGER columns, I've wasted 960kB of bandwidth. VARCHAR and TEXT columns are likely to be considerably larger.

Note: I chose INTEGER in the above example because they have a fixed size of 4 bytes.

1
  • 1
    1 and 2 would be a code smell and 3 and 4 sound like premature optimization
    – NikkyD
    Commented Dec 19, 2016 at 16:47
7

If your application gets data with SELECT * and the table structure in the database is changed (say a column is removed), your application will fail in every place that you reference the missing field. If you instead include all the columns in your query, you application will break in the (hopefully) one place where you initially get the data, making the fix easier.

That being said, there are a number of situations in which SELECT * is desirable. One is a situation that I encounter all the time, where I need to replicate an entire table into another database (like SQL Server to DB2, for example). Another is an application written to display tables generically (i.e. without any knowledge of any particular table).

4
  • The question isn't 'is select * ever desirable', so the 2nd part of your answer is irrelevant. The question states that using 'select *' should be preferable, which of course is complete bollocks. Commented Nov 26, 2008 at 21:31
  • Yes, my 2nd part is irrelevant. OQ changed the question to state SELECT * is preferable, and yeah that's kind of bollocksy. Commented Nov 26, 2008 at 22:15
  • Ah yeah sorry - question changed it's direction after your answer. Commented Nov 27, 2008 at 19:37
  • That's alright. Even Mozart was an editor (stackoverflow.com/questions/292682/…). My original post suggested that use of SELECT * led to cannibalism. :) Commented Nov 27, 2008 at 20:30
3

I actually noticed a strange behaviour when I used select * in views in SQL Server 2005.

Run the following query and you will see what I mean.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [A] [varchar](50) NULL,
    [B] [varchar](50) NULL,
    [C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go


select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicitTest

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [A] [varchar](50) NULL,
    [B] [varchar](50) NULL,
    [D] [varchar](50) NULL,
    [C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'

select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest

Compare the results of last 2 select statements. I believe what you will see is a result of Select * referencing columns by index instead of name.

If you rebuild the view it will work fine again.

EDIT

I have added a separate question, *“select * from table” vs “select colA, colB, etc. from table” interesting behaviour in SQL Server 2005* to look into that behaviour in more details.

2

You might join two tables and use column A from the second table. If you later add column A to the first table (with same name but possibly different meaning) you'll most likely get the values from the first table and not the second one as earlier. That won't happen if you explicitly specify the columns you want to select.

Of course specifying the columns also sometimes causes bugs if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.

2

I understand where you're going regarding premature optimization, but that really only goes to a point. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?

As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).

2

When you're specifying columns, you're also tying yourself into a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, whereever he is. Just a thought.

2
  • 1
    I have absolutely no idea who Feuerstein is. Tried googling and found a psychologist, a television character and a blogger so the best I could come up with was a joke.
    – NotMe
    Commented Feb 14, 2012 at 22:52
  • Author of the O'Reilly books on PL/SQL. Try googling "feuerstein sql" instead of just "feuerstein".
    – orbfish
    Commented Feb 21, 2012 at 18:58
2

SELECT * is not always evil. In my opinion, at least. I use it quite often for dynamic queries returning a whole table, plus some computed fields.

For instance, I want to compute geographical geometries from a "normal" table, that is a table without any geometry field, but with fields containing coordinates. I use postgresql, and its spatial extension postgis. But the principle applies for many other cases.

An example:

  • a table of places, with coordinates stored in fields labeled x, y, z:

    CREATE TABLE places (place_id integer, x numeric(10, 3), y numeric(10, 3), z numeric(10, 3), description varchar);

  • let's feed it with a few example values:

    INSERT INTO places (place_id, x, y, z, description) VALUES
    (1, 2.295, 48.863, 64, 'Paris, Place de l\'Étoile'),
    (2, 2.945, 48.858, 40, 'Paris, Tour Eiffel'),
    (3, 0.373, 43.958, 90, 'Condom, Cathédrale St-Pierre');

  • I want to be able to map the contents of this table, using some GIS client. The normal way is to add a geometry field to the table, and build the geometry, based on the coordinates. But I would prefer to get a dynamic query: this way, when I change coordinates (corrections, more accuracy, etc.), the objects mapped actually move, dynamically. So here is the query with the SELECT *:

    CREATE OR REPLACE VIEW places_points AS
    SELECT *,
    GeomFromewkt('SRID=4326; POINT ('|| x || ' ' || y || ' ' || z || ')')
    FROM places;

    Refer to postgis, for GeomFromewkt() function use.

  • Here is the result:

    SELECT * FROM places_points;

 place_id |   x   |   y    |   z    |         description          |                            geomfromewkt                            
----------+-------+--------+--------+------------------------------+--------------------------------------------------------------------  
        1 | 2.295 | 48.863 | 64.000 | Paris, Place de l'Étoile     | 01010000A0E61000005C8FC2F5285C02405839B4C8766E48400000000000005040  
        2 | 2.945 | 48.858 | 40.000 | Paris, Tour Eiffel           | 01010000A0E61000008FC2F5285C8F0740E7FBA9F1D26D48400000000000004440
        3 | 0.373 | 43.958 | 90.000 | Condom, Cathédrale St-Pierre | 01010000A0E6100000AC1C5A643BDFD73FB4C876BE9FFA45400000000000805640
(3 lignes)

The rightmost column can now be used by any GIS program to properly map the points.

  • If, in the future, some fields get added to the table: no worries, I just have to run again the same VIEW definition.

I wish the definition of the VIEW could be kept "as is", with the *, but hélas it is not the case: this is how it is internally stored by postgresql:

SELECT places.place_id, places.x, places.y, places.z, places.description, geomfromewkt(((((('SRID=4326; POINT ('::text || places.x) || ' '::text) || places.y) || ' '::text) || places.z) || ')'::text) AS geomfromewkt FROM places;

1

Even if you use every column but address the row array by numeric index you will have problems if you add another row later on.

So basically it is a question of maintainability! If you don't use the * selector you will not have to worry about your queries.

1

Selecting only the columns you need keeps the dataset in memory smaller and therefor keeps your application faster.

Also, a lot of tools (e.g. stored procedures) cache query execution plans too. If you later add or remove a column (particularly easy if you're selecting off a view), the tool will often error when it doesn't get back results that it expects.

1

It makes your code more ambiguous and more difficult to maintain; because you're adding extra unused data to the domain, and it's not clear which you've intended and which not. (It also suggests that you might not know, or care.)

1

To answer you question directly: Do not use "SELECT *" when it makes your code more fragle to changes to the underlying tables. Your code should break only when a change is made to the table that directly affects requirments of your program.

Your application should take advantage of the abstraction layer that Relational access provides.

1

I don't use SELECT * simply because it is nice to see and know what fields I am retrieving.

1

Generally bad to use 'select *' inside of views because you will be forced to recompile the view in the event of a table column change. Changing the underlying table columns of a view you will get an error for non-existant columns until you go back and recompile.

1

It's ok when you're doing exists(select * ...) since it never gets expanded. Otherwise it's really only useful when exploring tables with temporary select statments or if you had a CTE defined above and you want every column without typing them all out again.

0
1

Just to add one thing that no one else has mentioned. Select * returns all the columns, someone may add a column later that you don't necessarily want the users to be able to see such as who last updated the data or a timestamp or notes that only managers should see not all users, etc.

Further, when adding a column, the impact on existing code should be reviewed and considered to see if changes are needed based on what information is stored in the column. By using select *, that review will often be skipped because the developer will assume that nothing will break. And in fact nothing may explicitly appear to break but queries may now start returning the wrong thing. Just because nothing explicitly breaks, doesn't mean that there should not have been changes to the queries.

0

because "select * " will waste memory when you don't need all the fields.But for sql server, their performence are the same.

Not the answer you're looking for? Browse other questions tagged or ask your own question.