Click Technology

Linux, Windows, Mac it's all good

AWS Cost reckoning tips

February 3

Welp, it’s FAANG redundancy time and so companies are looking to save a few quid.  Here’s a couple of cheap and easy fixes that should save enough to pay for half your devops team.

Correct sizing of EC2 instances

This is so often overlooked by DevOps when it comes to saving money.  Over- or under-spec’d instances cost extra cash, and leaving them switched on when they’re not being used is bad news for the company’s wallet.

To size instances correctly, check out my favourite site, https://ec2instances.info, which gives you a comprehensive search/filter view of every EC2 instance type.  It’s on GitHub too, so give it a star if you get time. Then, head over to AWS Compute Optimizer, which learns what your instances do and then tells you (after about a month) whether they’re too big or too small. Resize accordingly.

Your choices may be limited by things like needing local instance-store SSDs (which are still ephemeral), but you should be able to use Compute Optimizer to make rational decisions about resource allocation.
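
If you’d rather pull those findings out programmatically than click around the console, a few lines of boto3 will do it. This is just a sketch; the region and the choice to print rather than act on the results are my own assumptions.

import boto3

# Sketch only: list Compute Optimizer's EC2 findings so you can see which
# instances it thinks are over- or under-provisioned. Region is an assumption.
co = boto3.client("compute-optimizer", region_name="eu-west-1")

resp = co.get_ec2_instance_recommendations()
for rec in resp.get("instanceRecommendations", []):
    current = rec["currentInstanceType"]
    finding = rec["finding"]   # OVER_PROVISIONED, UNDER_PROVISIONED or OPTIMIZED
    options = [o["instanceType"] for o in rec.get("recommendationOptions", [])]
    print(f"{rec['instanceArn']}: {current} is {finding}; suggestions: {options}")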

Once you’re happy the performance is right and the size is correct, it’s time to start locking in the value for the company.

Get Reserved Instances

Next, ffs, buy Reserved Instances.  Don’t pay the On Demand sticker price for EC2 instances; that’s for amateurs.  Reserved Instances are precisely that: you commit to one or three years of usage and pay partially or completely up front in exchange for a much lower rate.

Always buy RIs if you have a fairly reasonable idea of future demand.  You don’t have to be clairvoyant either; you can always start with one-year contracts, which is pretty standard for business.  I used to live in Germany and there everything is done in one-year chunks, so estimate your usage and commit to Reserved Instances that meet your needs, saving real costs and real money.  Again, without RIs you are just bleeding out costs unnecessarily.  If you can commit to three years, and that’s not that improbable, you can save up to a whopping 70% on your costs.
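
If you want to see the actual up-front numbers before committing, the EC2 API will list the Reserved Instance offerings for you. A rough boto3 sketch; the instance type, platform and region are placeholders, not a recommendation.

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")   # assumed region

# List standard, all-upfront RI offerings for a given instance type.
offerings = ec2.describe_reserved_instances_offerings(
    InstanceType="t3.micro",
    ProductDescription="Linux/UNIX",
    OfferingClass="standard",
    OfferingType="All Upfront",
)["ReservedInstancesOfferings"]

for o in offerings:
    years = o["Duration"] / (365 * 24 * 3600)   # Duration is in seconds
    print(f"{o['ReservedInstancesOfferingId']}: {years:.0f} yr, ${o['FixedPrice']:.2f} up front")

# Buying is then a single call with the offering id you've picked, e.g.
# ec2.purchase_reserved_instances_offering(
#     ReservedInstancesOfferingId=offerings[0]["ReservedInstancesOfferingId"],
#     InstanceCount=1,
# )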

Let’s table up some real-life costs, then, for a few samples on an annual basis..

Type          vCPU   RAM (GB)   On Demand    Spot         1 Yr Reserved   3 Yr Reserved   Saving (%)
t3.micro      2      1          $91.10       $28.91       $56.94          $39.42          56.7
r5a.2xlarge   8      64         $3,959.52    $1,841.35    $2,496.60       $1,708.20       56.9
r5.16xlarge   64     512        $35,320      $13,714      $22,250         $15,259         56.8
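
The last column is simply the 3 Yr Reserved price measured against On Demand; quick to check yourself:

# Sanity check of the Saving (%) column using the annual prices above.
prices = {
    "t3.micro":    (91.10,    39.42),
    "r5a.2xlarge": (3959.52,  1708.20),
    "r5.16xlarge": (35320.00, 15259.00),
}
for itype, (on_demand, reserved_3yr) in prices.items():
    saving = (1 - reserved_3yr / on_demand) * 100
    print(f"{itype}: {saving:.1f}% cheaper than On Demand")
# t3.micro: 56.7%, r5a.2xlarge: 56.9%, r5.16xlarge: 56.8%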

Develop your software for spot instances

Now you can see the difference in costs, it’s time to start breaking out the orchestration technologies, like Docker Swarm, Kubernetes and so on, that let you run software on goal-oriented clusters which can take advantage of Spot Instances. Spot Instances are basically spare capacity that isn’t currently being used. Like a last-minute holiday, or buying flowers just before the market closes, you get a significant reduction in price and AWS gets some $$$ for an otherwise unused instance.

Of course, not everything can be swarmed, and plenty of bioinformatics work falls into that category, so it applies more to the nodejs / Angular / Svelte / React space and modern containerised apps, but if you can, Spot pricing might be the option for you. It is by far the cheapest option and the lowest commitment. On the downside, the more esoteric the instance type you need, the harder it may be to get Spot capacity at times of heavy demand, and your instances can be reclaimed at short notice.
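
For a feel of how little is involved, here’s a rough boto3 sketch of asking for a Spot Instance directly instead of On Demand. The AMI id, instance type and region are placeholders; in practice you’d normally let an Auto Scaling group or your Kubernetes node provisioner handle this for you.

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")    # assumed region

# Request a single one-time Spot Instance; terminate it if AWS reclaims the capacity.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(resp["Instances"][0]["InstanceId"])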

Convert all your Buckets to S3 Intelligent Tiering

You use S3 Buckets? Great. S3 is perfect for petabytes of data and is bigger than all the storage you’ll ever own, that’s for sure. However, you absolutely MUST switch the storage class to Intelligent Tiering to really save a fortune.

Typically, people tend not to configure the S3 storage class at all, so everything is left in Standard and costs about $0.022 per GB per month. That’s not that much of course, until you put terabytes of data onto S3 and pay full price for it.

With, say, about 700TB on S3 in Standard mode, you’ll be spending in the region of $16,000 USD on storage a month. That’s 100% unsustainable, and if you’re also paying for support at about $4k and you’re not being warned about it each and every day, you’re an absolute mug.

So, using the pricing calculator ( https://calculator.aws ), let’s do the maths for this model..

S3 Standard storage

700 TB x 1024 GB per TB = 716,800 GB stored per month

Capacity      Note                 Price (USD/GB)   Total
51,200 GB     first 50TB           $0.024           $1,228.80
460,800 GB    subsequent 450TB     $0.023           $10,598.40
204,800 GB    remaining 200TB      $0.022           $4,505.60
                                   Subtotal         $16,332.80 / month
                                   UK VAT @ 20%     $3,266.56
                                   Total            $19,599.36 or £16,254.33 / month

Pricing per month for S3 storage in Standard mode.
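
The same sum in a few lines of Python, if you want to plug your own volume into the tiers:

# Reproduce the S3 Standard maths from the table above: tiered per-GB pricing
# for 700 TB, plus UK VAT. The per-GB prices are the ones used in the table.
total_gb = 700 * 1024                # 716,800 GB
tiers = [                            # (tier size in GB, price per GB per month)
    (50 * 1024,    0.024),           # first 50 TB
    (450 * 1024,   0.023),           # next 450 TB
    (float("inf"), 0.022),           # everything above that
]

remaining, cost = total_gb, 0.0
for size, price in tiers:
    used = min(remaining, size)
    cost += used * price
    remaining -= used

vat = cost * 0.20
print(f"Storage ${cost:,.2f}/month + VAT ${vat:,.2f} = ${cost + vat:,.2f}/month")
# Storage $16,332.80/month + VAT $3,266.56 = $19,599.36/month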

So what realistically can be achieved?

Well, based on my own experience, a 60% reduction in cost is perfectly possible, and probably up to 70%, so that would be a drop from $12,716.39 a month to $5,093.04 for around 700TB. Nearly 60% savings, or $111k USD in a year!

Some notes..

You will, as always, need to know your environment well and use the pricing calculator carefully to understand the costs. You will also need to have your average object size info and other details to get the analysis as close to real-world conditions as possible. One useful question I was often asked was “Will this mean if we have to recover data, we get charged for it if it’s in the Infrequent Access Tier?” No. That would only apply if you had moved your data out of S3 Intelligent Tiering into one of the S3 Glacier Tiers, namely S3 Glacier and S3 Glacier Deep Archive. Then, charges apply for recovery of data from the archive.

The HOWTO is here and you can get all your buckets converted in one go using this AWS article and the Python script therein, which is here in the AWS repo. Naturally, test first, as I did, and see how it looks. It worked fine and scaled fine. In my case, Ansible handled it just as easily.
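
To give you a flavour of what the conversion boils down to (this is not the script from the AWS article, just a minimal sketch of the same idea), a lifecycle rule that transitions objects to Intelligent Tiering can be pushed to every bucket with boto3. Note that putting a lifecycle configuration replaces whatever lifecycle rules a bucket already has, so check first on a real estate.

import boto3

s3 = boto3.client("s3")

# One rule: move every object to INTELLIGENT_TIERING straight away.
rule = {
    "ID": "move-to-intelligent-tiering",          # assumed rule name
    "Status": "Enabled",
    "Filter": {"Prefix": ""},                     # applies to all objects
    "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
}

for bucket in s3.list_buckets()["Buckets"]:
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket["Name"],
        LifecycleConfiguration={"Rules": [rule]},
    )
    print(f"Lifecycle rule applied to {bucket['Name']}")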

You may well be wondering whether there is a performance hit. No. None. Nobody will notice anything. Run the script and everything keeps working as normal.

The only disadvantage is that it will take time to transition everything. If you run the code today, day 0, you’ll wait 30 days for unused data to transition from the Frequent Access tier to the Infrequent Access tier. Then you have to wait until day 90 for the data to move from Infrequent Access to the Archive Instant Access tier, so there will be a delay. That said, you can expect costs to reduce step-wise: by about 25% in the first month, then to 45% in the second, and to 60%+ after the third month.

So, if you start say in August, you should feel the full effect of your action by Christmas, so the sooner you start, the sooner you save. It’s the lowest-risk, least political, most effective move you can make at work and will realize really large savings for the business. That is perhaps the most important part of using cloud tech: you absolutely must work with the mindset that all costs are going on your own personal credit card and you will have to pay the bill next month.

Articles about this often mention that there is a ‘small charge‘ for setting up Intelligent Tiering, as AWS has to inventory all your files and monitor every object over 128KB in size. How small is small? Are we talking 5/10k small? A couple of hundred bucks? What? The answer is about $25 USD for 700TB with tens of millions of objects, so no need to lose sleep over these initial charges. They will be recouped almost immediately.
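
For what it’s worth, that figure stacks up if you assume the commonly quoted Intelligent Tiering monitoring rate of roughly $0.0025 per 1,000 monitored objects per month (objects under 128KB aren’t monitored at all):

# Back-of-envelope check on the monitoring charge. The per-object rate here is
# an assumption (~$0.0025 per 1,000 monitored objects per month).
objects = 10_000_000                      # "tens of millions of objects"
monitoring = objects / 1_000 * 0.0025
print(f"~${monitoring:,.2f} per month")   # ~$25.00 per month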

Clap! Clap! Clap!
CFO looking at me (off camera) cutting him a cheque for 110 grand.


Neat AWS feature

January 19

So I’m studying for my AWS Solutions Architect Associate ticket at the moment and, in the coursework I’m doing, I came across this really neat feature in AWS.

If you’re on an EC2 instance (a VM), and you do..

 

$ curl http://169.254.169.254/latest/meta-data

 

You get a whole load of metadata back, like this..

 

ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
events/
hostname
iam/
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
ipv6
local-hostname
local-ipv4
mac
managed-ssh-keys/
metrics/
network/
placement/
profile
public-hostname
public-ipv4
reservation-id
security-groups
services/

 

All you need to do now is pick any of these sub-topics and query it again…

 

curl http://169.254.169.254/latest/meta-data/local-hostname

 

and you get

 

ip-10-16-57-144.ec2.internal

 

or query

 

curl http://169.254.169.254/latest/meta-data/instance-type

 

and you get

 

t2.micro

 

Nice huh?  Because the feature is just a URL, you can query it from a Flask or NodeJS app directly, so an app can be aware of what kind of hardware it’s running on. You could even have an app report telemetry to a central server to let developers know about the characteristics of the host in relation to app performance.  Quite a neat piece of internal architecture.
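
As a quick sketch of doing that from app code rather than the shell, here’s the same lookup in Python. One caveat worth flagging: newer instances may enforce IMDSv2, which needs a session token first, so that step is included.

import urllib.request

BASE = "http://169.254.169.254/latest"

# IMDSv2: fetch a short-lived session token, then send it with each request.
token_req = urllib.request.Request(
    f"{BASE}/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
)
token = urllib.request.urlopen(token_req).read().decode()

def metadata(path):
    req = urllib.request.Request(
        f"{BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urllib.request.urlopen(req).read().decode()

print(metadata("instance-type"))     # e.g. t2.micro
print(metadata("local-hostname"))    # e.g. ip-10-16-57-144.ec2.internal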

"Nodding Guy" Meme: Robert Redford

Nice.


      

This is my website for short blog posts and interesting materials that are noteworthy or have some handy-tip value.