Let's take a well-known CNN, say VGG16, and look in detail at how its memory is spent. You can print a summary of the model with Keras:
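from keras.applications import VGG16
model = VGG16()  # default arguments load the ImageNet weights (downloaded on first use)
model.summary()  # summary() prints the table itself and returns None, so no print() is needed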
The network consists of 13 2D-convolutional layers (with 3×3 filters, stride 1 and padding 1) and 3 fully connected ("Dense") layers. In addition, there is an input layer, 5 max-pooling layers and a flatten layer, none of which hold parameters.
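The numbers in the table below can be reproduced by hand. The following is a minimal sketch in plain Python (no Keras needed); the helper names `conv_output_size`, `conv2d_params` and `activation_count` are made up for this illustration, and the 2×2 pooling window with stride 2 is an assumption taken from the standard VGG16 architecture rather than from the text above.

```python
def conv_output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size - kernel + 2 * pad) // stride + 1


def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights (kh * kw * c_in * c_out) plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def activation_count(height, width, channels):
    """Number of values in one layer output for a single input image."""
    return height * width * channels


# A 3x3 convolution with stride 1 and padding 1 keeps the spatial size.
side = conv_output_size(224, kernel=3, stride=1, pad=1)

# Assumed 2x2 max pooling with stride 2 (standard VGG16) halves it.
pooled = conv_output_size(side, kernel=2, stride=2)

print(side, pooled)                          # 224 112
print(conv2d_params(3, 3, 3, 64))            # 1792
print(conv2d_params(3, 3, 64, 64))           # 36928
print(activation_count(224, 224, 3))         # 150528
print(activation_count(side, side, 64))      # 3211264
print(activation_count(pooled, pooled, 64))  # 802816
```

Note that the "Data memory" figures are counts of values per image, not bytes; the size in bytes depends on the data type used to store them.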
| Layer | Output shape | Data memory | Parameters | Number of parameters |
|---|---|---|---|---|
| InputLayer | 224×224×3 | 150528 | 0 | 0 |
| Conv2D | 224×224×64 | 3211264 | 3×3×3×64+64 | 1792 |
| Conv2D | 224×224×64 | 3211264 | 3×3×64×64+64 | 36928 |
| MaxPool2D | 112×112×64 | 802816 | 0 | 0 |
| Conv2D... |