I've read many posts arguing that **SELECT *** is bad and should be avoided, but most of them don't clearly explain why. Today, let's discuss why using **SELECT *** is not recommended, especially when working with large tables that contain millions of rows or many wide columns.
1. Unnecessary I/O
You don't always need every column. If you fetch all columns but use only a few of them, you waste resources, because the database reads unnecessary data from every page it touches (we'll discuss pages in future posts). **SELECT *** can also prevent index-only scans: when every column you need is covered by an index, the database can answer the query from the index alone, but asking for every column forces it to fetch the full row from the table as well.
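Here is a minimal sketch of that effect. The `users` table, its columns, and the covering index are made up for illustration, and the plan comments follow PostgreSQL's `EXPLAIN` conventions; the plan your database actually chooses depends on its statistics and visibility information.

```sql
-- Hypothetical table and covering index (names are assumptions for this sketch)
CREATE TABLE users (
    id         bigint PRIMARY KEY,
    email      text NOT NULL,
    full_name  text,
    bio        text,          -- wide column we rarely need
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX users_id_email_idx ON users (id, email);

-- Selecting only indexed columns lets the planner answer from the index alone
EXPLAIN SELECT id, email FROM users WHERE id < 1000;
-- Typically: Index Only Scan using users_id_email_idx on users

-- Selecting every column forces a visit to the table (heap) for the extra data
EXPLAIN SELECT * FROM users WHERE id < 1000;
-- Typically: Index Scan (or Seq Scan) on users
```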
2. Increased Network Traffic
Fetching columns you don't need also consumes extra network bandwidth. The more data you transfer, the greater the network load, which can slow down your application, especially with large result sets.
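One rough way to see how much each row carries over the wire is the estimated row width in the query plan. Continuing the hypothetical `users` table from the sketch above (PostgreSQL-style `EXPLAIN`; the widths shown are placeholder numbers and will differ on your data):

```sql
-- Estimated row width reflects how much data each row carries to the client
EXPLAIN SELECT * FROM users;
-- e.g. Seq Scan on users  (cost=... rows=... width=120)

EXPLAIN SELECT id, email FROM users;
-- e.g. Seq Scan on users  (cost=... rows=... width=40)
```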
3. Increased Application Memory Usage
As your result sets grow, your application needs more memory to hold data it never uses. That extra pressure on the application's memory can lead to performance issues.
4. Serialization and Deserialization Overhead
Before the server sends a result set, it serializes each row according to the wire protocol, so the more data you send, the more serialization work the server's CPU has to do. On the other side, the client has to deserialize everything it receives, which consumes CPU there as well, so unnecessary columns add overhead on both ends.
Conclusion
This post doesn't claim that **SELECT *** is inherently bad; rather, the real issue is not knowing when and where to use it. It's fine to experiment with it while learning, but in production environments, using **SELECT *** indiscriminately can be costly in terms of performance.
Telegram: otabekswe