I'm building a 1 Petabyte storage system, my budget is <$100K USD. /g/ could you give me some advice?
What would you do if you were tasked with this?
No dice, double the money.
One solution would be normal consumer gear, i.e. mATX mobos with 4-6 SATA connectors, switches, 500 * 2TB HDs, and pooling the storage with something like Hadoop.
But even that, with PSUs and all that extra cruft, would cost you more than 100k.
And that would be at the cheapest end, provided you can land decent discounts, which you should for that cash.
OP here, I'm going to use it for storing image streams of people's lives taken from a first-person perspective.
Couldn't you just line up a bunch of Storinators?
Each one of those holds 45 drives.
You would need 12 enclosures and 500x2TB drives.
12 enclosures alone would cost $84000 give or take.
Just picking a decent NAS drive, ST2000VN000, $100 per 2TB.
500x of those is gonna cost $50000
Total budget should be moved up to $134,000 for that.
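A quick sanity check of the numbers above, as a minimal sketch. The $7,000 per-enclosure price is an assumption backed out of the "$84000 give or take" figure for 12 enclosures, and the function name is made up for illustration.

```python
import math

def storinator_build_cost(target_tb, drive_tb, drive_price,
                          slots_per_enclosure, enclosure_price):
    """Return (drives, enclosures, total_cost) for a raw-capacity target."""
    drives = math.ceil(target_tb / drive_tb)
    enclosures = math.ceil(drives / slots_per_enclosure)
    total = drives * drive_price + enclosures * enclosure_price
    return drives, enclosures, total

# 1 PB raw out of 2TB drives at $100 each, 45 bays per enclosure.
# enclosure_price=7000 is an assumption ($84000 / 12 enclosures).
drives, enclosures, total = storinator_build_cost(
    target_tb=1000, drive_tb=2, drive_price=100,
    slots_per_enclosure=45, enclosure_price=7000)

print(drives, enclosures, total)  # 500 12 134000
```

Note this is raw capacity only: no parity, no spares, no rack or power.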
If you want a real storage solution, you're probably going to need to increase your budget by 5-6 times. But I doubt you're looking for enterprise equipment (3PAR, HDS, NetApp FAS, etc).
Going to do long-term storage, plus a cache layer on the front that serves up recently / commonly accessed portions. Cache layer is unrelated, this is simply for cold, long-term storage.
No, it's for people to record their lives.
People conditioned to believe that everyone's private life should be non-existent and that other people actually care about their lives (or maybe they hope they can profit off of people spectating on their lives, somehow).
Our users will pay a monthly fee to store the prior N months of their life. We've calculated the cost of storing a picture every second of every day plus audio to be around 35GB / month.
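To put the 35GB / month figure in perspective, here is a rough sketch of what it implies per second of capture and how many users fit in 1 PB. It assumes decimal units (1 GB = 10^9 bytes), a 30-day month, and no redundancy overhead; the 12-month retention window is only an example value for N, not something stated in the thread.

```python
GB = 10**9
PB = 10**15

def bytes_per_second(gb_per_month, days_per_month=30):
    """Average storage budget per second of recording (picture plus audio)."""
    return gb_per_month * GB / (days_per_month * 24 * 60 * 60)

def users_per_petabyte(gb_per_month, months_retained):
    """Whole users that fit in 1 PB, ignoring parity and spare capacity."""
    return PB // (gb_per_month * GB * months_retained)

print(round(bytes_per_second(35)))   # 13503, i.e. about 13.5 KB per second
print(users_per_petabyte(35, 12))    # 2380 users at N = 12 months (example value)
```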
I think the best advice I can give you for that money is to start small and scale up, since 100k won't give you 1PB of storage, unless you harness old VCRs, acquire a fuckton of tapes and write some sort of data encoder. DIY tape drives, yo.
So build a system that is easy to expand, relies on common gear that will be available for a while, and uses software solutions that are active projects, like the already mentioned Hadoop.
Don't rely on "costly" hardware solutions for data duplication etc. You don't want to rely on tech that you can't easily change or swap; one example would be HW RAID solutions.
Don't get vendor locked so to speak.
This is the only way to have cheap and functional mass storage that can scale.
Look at what other so-called pioneers are doing: Google uses consumer tech to scale fast and not get held up in costly investments if it needs to scale down.
Here's what I've come up with, and I really should have used 4TB drives in my original suggestion, but that was me just being lazy.
NewEgg has solid prices, but a reseller can likely negotiate special pricing for the volume you're looking for.
This is all budgetary regardless.
7x Storinator (45 Drive enclosure, redundant boot and power) - $6108.04 ea - $42756.28
1x NetShelter SX 42U 600mm Wide x 1070mm Deep Enclosure - $1200 ea - $1200
272x Seagate 4TB NAS Drive - $180 ea - $48960
In a RAID 6 you'll have roughly 1PB usable.
I didn't bother with power requirements, but you'll need a PDU to mount inside of that cabinet and power everything in there.
8K should cover that.
Feel free to ask any questions or expand on this.
Also, I believe only 2 drives would be allowed to fail, which seems kind of risky with 272 spinning drives.
You may want to consider several extra drives to act as backup/spares in the array.
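Adding up the list above, as a sketch rather than a quote. Prices and quantities are copied from the post; reading "8K" as an $8,000 allowance for the PDU is an assumption, as is treating all 272 drives as one RAID 6 set.

```python
from decimal import Decimal

# (description, quantity, unit price in USD) taken from the list above
bom = [
    ("Storinator 45-drive enclosure", 7, Decimal("6108.04")),
    ("NetShelter SX 42U enclosure", 1, Decimal("1200")),
    ("Seagate 4TB NAS drive", 272, Decimal("180")),
]
PDU_ALLOWANCE = Decimal("8000")  # assumption: "8K" read as USD for the PDU

hardware_total = sum(qty * price for _, qty, price in bom)
grand_total = hardware_total + PDU_ALLOWANCE

drive_slots = 7 * 45                   # bays across all enclosures
raw_tb = 272 * 4                       # raw capacity in TB
raid6_usable_tb = (272 - 2) * 4        # one RAID 6 set gives up two drives to parity
free_slots = drive_slots - 272         # empty bays left over for hot spares

print(hardware_total, grand_total)     # 92916.28 100916.28
print(drive_slots, raw_tb, raid6_usable_tb, free_slots)  # 315 1088 1080 43
```

So the hardware alone lands just under the $100K budget, the PDU allowance pushes it slightly over, and there are 43 empty bays if you want to add spares.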
To be honest I'd rather the guys manufacturing and buying this be upfront about being peeping toms and creeps. That at least I can understand. It's easier to deal with than the guy in cargo shorts and sandals who thinks he's part of the cyberpunk movement.
Yea, I was also considering something like RAID (6 + 0) so that we could have two failures per RAID 6 set.
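A rough sketch of what RAID 60 costs in usable space: each RAID 6 set gives up two drives to parity, so more sets means more tolerated failures but less capacity. The set counts below are hypothetical layouts that happen to divide 272 evenly, not anything proposed in the thread, and the function name is made up.

```python
def raid60_usable_tb(total_drives, sets, drive_tb):
    """Usable TB when total_drives are split evenly into `sets` RAID 6 groups."""
    if total_drives % sets != 0:
        raise ValueError("drives must divide evenly across sets")
    per_set = total_drives // sets
    if per_set < 4:
        raise ValueError("RAID 6 needs at least 4 drives per set")
    return sets * (per_set - 2) * drive_tb

# Hypothetical layouts for 272 x 4TB drives.
for sets in (1, 8, 16, 34):
    usable = raid60_usable_tb(272, sets, 4)
    print(f"{sets:>2} sets of {272 // sets:>3}: {usable} TB usable, "
          f"{2 * sets} drives of parity")
```

With these assumptions, 8 sets of 34 still gives 1024 TB usable, while 16 sets of 17 drops to 960 TB, so staying above 1 PB usable with smaller sets would take more than 272 drives.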
Is there a reason you chose that Seagate as opposed to something like:
WD Green WD40EZRX 4TB @ $150
Seagate Desktop ST4000DM000 4TB @ $150
FB has >100 PB (facebook com/notes/facebook-engineering/under-the-hood-hadoop-distributed-filesystem-reliability-with-namenode-and-avata/10150888759153920)
I would assume Google has more. Back of the envelope on GMail alone is ~127 PB (cyber-knowledge net/blog/gmail/)
Personally, I would only avoid the Greens because they have a weird low power mode they sit in when they're not being used.
It really started to drive me nuts waiting for them to spin back up every time I wanted to use them.
That was just my experience though, others may have had better.
I went with ST4000VN000 (should have thrown that in) because of the dedicated NAS design. That was only personal preference.
If you want to save a little, you can go with the ST4000DM000 (as you mentioned) that the 45 drives guys use in their system: http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/
I don't think you would have an issue, but if I'm already spending that much I personally would bump up the drives to something of a bit higher caliber.
No problem, it's a nice change of pace from the enterprise solutions I usually deal with.
You're just gonna need to get rackspace for everything, it's a shitload of drives to store.
Here's an article that you should read, the Internet Archive's Brewster Kahle does some back of the envelope calculations:
Haha yea, w.r.t. off the shelf solutions, if you haven't read that article above, you should too. I have a few friends who work at the archive and they did something like buy a bunch of off-the-shelf external hard drives from amazon b/c seagate sells them for less when they're packaged as "USB external storage drives".
Don't forget marketing. Judging by your YouTube video with sub-2000 views, you look like a college kid who has been posting this to your friends and family.
You need to think bigger, you can't just rely on going viral.
That's assumed ;). Of course none of this even includes colocation or electricity bill, but the point of the thread was the fixed costs. Thanks for the answers on that end.
Worry not, the marketing plan includes more than just a youtube video.
wait no, fuck you.
I buy the hat, it records to micro sdhc
WHEN I WANT IT TO
I transfer the data to my pc and have full ownership of it
no wireless or gps
I also doubt your 18 hour battery life
That's the default mode that you get for free. People who don't care as much about their privacy will want to use the cloud service. I'm more like you and will probably only transfer files over USB.
Never said it would have 18 hours of battery life. Current tests are 7-10 hours depending on usage.
In the video the guy clearly states
>0:14 enough battery life to be on from the moment you wake up to the time you go to bed.
>1 million terabytes a day saved forever.