IANAL, but naming your product S2 and mentioning in the intro that AWS S3 is the tech you are enhancing is probably asking for a branding/copyright claim from Amazon. Same vertical, and it will definitely cause consumer confusion. I'm sure you've done the research on whether a trademark has been registered.
https://tsdr.uspto.gov/#caseNumber=98324800&caseSearchType=U...
Fun fact: S2 and EC2 sound exactly the same in Spanish - both are "ese dos". Add that to EC2 and S3 already being confusing to tell apart by ear
TBF, if I were building something with the goal of enhancing S3, I would call it S4.
That's short-term thinking. You need to leapfrog everybody and go S∞
That's actually a pretty cool name if you pronounce the first letter with its letter sound rather than as an initial: Sinfinity
Sounds more like a porn website...
Very responsive log porn. ;-)
Too late, name's taken for something else: https://incubator.apache.org/projects/s4.html
And don't forget the other S4: http://www.supersimplestorageservice.com/
It's like S3, except better because, by focusing on being a write-only data store, they can manage much more throughput and efficiency, plus your data is far more secure at rest than it is in S3.
why not s11?
F3 - (Fast Furious Fail-Safe)
S3++ ? T4?
My company is a Fivetran client, and they named that company after a (bad) joke, but it's worth a fortune.
Fivetran is going to zero because they don’t offer anything of actual value and their CEO isn’t a good person.
[1] https://news.ycombinator.com/item?id=42434450
At least Cloudflare's R2 has an argument for the naming (IBM vs HAL in 2001: A Space Odyssey)
Yep, letter S and a number is copyrighted, can't do that
1) we're talking about trademark law, not copyright law.
2) the problem here is that they're in the same business segment, and explicitly reference S3.
s3 (serverless stream store)
What could possibly be better than being sued by Amazon for some nitpicky naming issue?
That’s the kind of David vs. Goliath publicity one could only dream of …
98% of the time, lawsuits are just a money pit. There is zero publicity. A tiny number go viral. I don't think this is likely to be one of those times.
Most people would simply say "Amazon is right." Because Amazon is right. This is an intentional attempt to leverage their product branding to promote a new product. There is very little good here.
If this were open-source, academic, non-profit, or something like that, perhaps. A small venture trying to commercialize on some digital equivalent of Amazon's trade dress? I can't imagine anyone would care....
Even when someone is 100% right, there is usually zero publicity. Right or wrong, in most cases I've seen, the small guy settles with the big guy with the deep legal pockets and moves on, because litigating is too expensive.
In a situation like this one, your marketing spend / press coverage on the existing name is shot, links to your domain are shot, and perhaps you have egg on your face, depending on how things play out.
I'm not sure whether they consulted a bad trademark lawyer or didn't consult one at all, but it wouldn't have cost that much to do so. I say this having just recently started the process of filing a trademark - the cost is about the same as buying, e.g., 's4.dev' according to the domain registry's website.
Having to rebrand your product after launching is a lot more painful than doing it before launching.
Wow, imagine Debezium offering native compatibility with this, capturing the changes from a Postgres database and saving them as Delta or Iceberg in a pure serverless way!
This is a really good idea with a beautiful API, and something that I would like to use for my projects. However, I have zero confidence that this startup would last very long in its current form. If it's successful, AWS will build a better and cheaper in-house version. It's just as likely to fail to get traction.
If this had been released instead as a Papertrail-like end-user product with dashboards, etc. instead of a "cloud primitive" API so closely tied to AWS, it would make a lot more sense. Add the ability to bring my own S3-Compatible backend (such as Digital Ocean Spaces), and boom, you have a fantastic, durable, cloud-agnostic product.
(Founder) We do intend to be multi-cloud; we are just starting with AWS. Our internal architecture is not tied to AWS - it's built on interfaces that we can implement for other cloud systems.
It would be extra ironic if the whole thing already ran on top of AWS.
There's no end of startups that can be described as existing open-source software as a service, marketed as a cheaper alternative to AWS offerings... and which run on AWS.
They just did: https://news.ycombinator.com/item?id=42211280 (Amazon S3 now supports the ability to append data to an object, 30 days ago). Azure has had the same with append blobs for a long time. It's still a bit more raw than S2, without the concept of a record. The step for a cloud provider to offer this natively is very small. And with the concept of a record, isn't this essentially a message queue, where the competitor space is equally big? Likewise if you look into log storage solutions.
(Founder) Both S3 Express _One Zone_ appends and Azure's append blobs charge the regular PUT price for appends. It may work for you, but probably not if you want to do smaller writes.
Blob stores will also not let you do tailing reads, like you can with S2.
In AWS, S2's Express storage class takes care of writing to a quorum of 3 zonal buckets for regional durability.
I doubt object stores will go from operating at the level of blobs and byte ranges, to records and sequence numbers. But I could be wrong.
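For intuition, here is a minimal sketch of what a 2-of-3 quorum write across zonal buckets might look like. Purely illustrative - S2's internals aren't public, and `put_chunk` is a hypothetical stand-in for a zonal PUT:

    use futures::stream::{FuturesUnordered, StreamExt};

    // Hypothetical stand-in for a PUT to one zonal bucket.
    async fn put_chunk(bucket: &str, chunk: &[u8]) -> Result<(), String> {
        let _ = (bucket, chunk); // ... issue the zonal PUT here ...
        Ok(())
    }

    // Acknowledge an append once a majority (2 of 3) of zonal PUTs succeed.
    async fn quorum_write(buckets: [&str; 3], chunk: &[u8]) -> Result<(), String> {
        let mut puts: FuturesUnordered<_> =
            buckets.into_iter().map(|b| put_chunk(b, chunk)).collect();
        let mut oks = 0;
        while let Some(result) = puts.next().await {
            if result.is_ok() {
                oks += 1;
                if oks >= 2 {
                    return Ok(()); // durable in a majority of zones
                }
            }
        }
        Err("quorum not reached".into())
    }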
People keep making the same argument against Aptible (https://aptible.com) and it is still a very successful PaaS over a decade later.
I had never heard of this company, so I took a look. The main pitch was compelling, but then I went to the pricing page and saw that pricing jumps from $0 to $500 a month once you want to go to "production". I'm clearly not the target market, which explains why I've never heard of it.
If you do cloud infra stuff, AWS will try to undercut you on price but will never outdo you on D/UX. So I wouldn't let Beezus hold me back
Amazon doesn't compete for price-sensitive product offerings.
If anything, they normalise an expectation with a budget-aware base.
Help me understand - you build on top of AWS, which charges $0.09/GB for egress to the Internet, yet you're charging $0.05/GB for egress to the Internet? Sounds like you're subsidizing egress from AWS? Or do you have access to non-public egress pricing?
Looks like they changed it to $0.08/GB. AWS list egress starts at $0.09/GB for the first 10 TB/month and $0.085/GB for the next 40 TB, so at 50 TB that loses them at most 10,000 GB x $0.01 + 40,000 GB x $0.005 = $300/month; beyond that AWS's tiers drop to $0.07/GB and then $0.05/GB past 150 TB, so they make money after that.
(Founder) We are not charging in preview. At the scale where it matters, we will work it out. Definitely some assumptions in here.
For what it's worth, there's zero chance I would do business with a company whose business plan is "we'll work it out". It gives one every reason to believe that in a couple years time you guys will either be out of business (because you didn't figure out the numbers to make a profit) or will pull the rug from under customers in the form of surprise price hikes. Obviously you have to do what you think is right, but I think that this approach is going to scare off a lot of customers for you.
(Founder) We are not charging during preview. If anything, I wanted to be transparent about our planned pricing. Our mission is to make streams a cloud storage primitive, and I worked backwards from there in terms of our costs and expected costs looking ahead once we can scale a bit - based on concrete data points about what kind of discounts can be unlocked. I realized it was premature based on the comments here, so the price for internet egress has been updated. Thank you for your feedback.
[flagged]
Just FYI, that doesn't give me confidence in the longevity of your service.
Cloud services offer giant discounts sometimes and the receiving party aren't allowed to talk about it concretely so that's probably what's happening here.
(Founder) I understand the concern. However, cloud discounts at scale can be very large, and we are going to share as much of it as we reasonably can.
Discounts require a multi-year commitment to a minimum (and increasing) spend. Generally you need to be either profitable or a well-funded startup to demonstrate why a vendor would trust your ability to pay (it's literally a debt on your books). How do they know you're good for it?
Plus, multi-cloud means less scale and less marketing incentive (they can't talk about you as an X-cloud customer).
I wish you the best, but would encourage you to not set your prices below your costs.
(Founder) Thank you for the advice. I hope we can offer better when the deals come into play, but for now setting our planned internet egress price to $0.08/GiB.
Do you plan to charge differently for bandwidth depending on whether the customer is in AWS or not? Would be nice if you pass on the cost savings.
(Founder) Yes, we will charge less for private connectivity. Pricing is transparent https://s2.dev/pricing - free during preview.
Doesn't AWS charge $0.01 intra region and $0.02 between regions, even without setting up private links? Can't you pass part of those savings (compared to the $0.05-$0.09 of egress) on? Or is it too difficult to detect if the remote IP qualifies?
(Founder) Unfortunately, if you access over a public IP, it is internet egress. Even if the client is in the same AWS cloud region. PrivateLink is the only option.
List pricing is $0.05 per GB after 150TB and at high volume it’s cheaper than that
Nobody with sufficient scale will be paying retail for data transfer.
They’re probably betting on most users being in AWS and only having to pay 1¢-2¢ transfer.
They're also banking on scale to PPA with a specific amendment for egress.
The strategy is likely: just get users, then offboard AWS if the product works.
(Founder) No, we want to be in the same cloud regions as customers.
I look at the egress costs to the internet and it doesn't check out. It's a premium product dependent on DX, marketed to funded startups.
But if I care about ingress and egress costs, which many stream-heavy infrastructure providers do... this doesn't add up.
I wish them luck, but I feel they would have had a much better chance from the start by getting some funding and having a loss leader start, then organising and passing on wholesale rates from cloud providers once they’d reached critical mass.
Instead they’re going in at retail which is very spicy. I feel like someone will clone the tech and let you self host, before big players copy it natively.
It’s a commodity space and they’re starting with a moat of a very busy 2 weeks from some Staff engineers at AWS.
(Founder) Thanks for sharing your thoughts. We are early and figuring things out. I agree egress cost is going to be a big concern. We want to do the best we can for users as we unlock some scale. During preview, we are focused on getting feedback so the service is free (we will need to talk if the usage is significant though).
So is this basically WarpStream except providing a lower-level API instead of jumping straight to Kafka compatibility?
An S3-level primitive API for streaming seems really valuable in the long-term if adopted
(Founder) That somewhat summarizes yes :) We take a different approach than WarpStream architecturally too, which allows us to offer much lower latencies. No disks in our system, either.
These folks knowingly chose to spend the rest of their careers explaining that they are not, in fact, S3.
(Founder) Well, 50% of our name is different
I like it. I see it as ostensibly a product for engineers, so when I see a name like S2 it's immediately clear that it's a product led and conceived by engineers.
I also see that on your pricing page -
"We are building the S3 experience for streaming data, and that includes pricing transparency"
Love the simple and earnest copy. One can imagine what an LLM would cook up instead; I find the brevity far preferable.
(Founder) Thank you for the kind comment!
Yes, we are not trying to confuse S2 with S3; we just think S3 is the best damn serverless experience out there, and we aspire to that greatness. We borrowed the structure of that name to reflect that aspiration, as have other services inspired by S3, like Cloudflare's object store R2.
I actually thought S2 is a Cloudflare service at first.
You should have gone with S4 tbh. The suits love bigger numbers. Super Simple Stream Store.
http://www.supersimplestorageservice.com/ exists and calls itself S4. It's a decent gag and it immediately came to mind when I heard S2 vs S3.
How do you store a stream? Don’t they just spray around the internet here and there, and if you don’t catch them in the moment, they’re just gone?
(Founder) I thought you were joking but coming back it could well be serious :)
When we say stream, we really mean The Log that Jay Kreps has a famous blog about https://engineering.linkedin.com/distributed-systems/log-wha...
We say stream because we would rather not be confused with "logs" as in application logs, but rather associate with the world of streaming data where this primitive is very relevant. We don't mean stream as in a TCP stream or live stream.
You can, however, stream Star Wars on S2 ;-) https://s2.dev/docs/quickstart#get-started-with-the-cli
(Founder) I have definitely received that advice before :) - to not seem like a regression from S3. But as an abbreviation for Stream Store, it made sense.
Why not just use SS? There can’t possibly be any negative connotations there.
Reserved by GM for the Super Sport
So that's why GM has been asking itself "Are we the baddies?" lately.
SS .. as in nazi?
You could even make the s look kind of like a lightning bolt to emphasize how fast it is
Quite dangerous. Will look almost like the Schutzstaffel runic insignia. I'd better avoid this resemblance.
thatsthejoke.jpg
S3++?
Surely S3++? /s
Disagree. You have a marketing opportunity for a hipster character named "Stu" to be the spokesman.
Disco Stu don't advertise
Props to you for having a sense of humor about it. :D
If I could put in one request...a video which describes what it is and how to use it would make it easier for me to understand.
(Founder) Yes we should create a video, thanks for the feedback.
In the meantime, check out this quickstart, which will have you streaming Star Wars with the S2 CLI and give you a pretty good sense of things: https://s2.dev/docs/quickstart#get-started-with-the-cli
(You will have to apply to join the preview, but we are approving quickly)
You could say that. Or, in binary ASCII, you could say your name is 93.75% the same (only one of its 16 bits flips).
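For the pedants, the arithmetic checks out - a quick sketch, treating "S2" and "S3" as 8-bit ASCII:

    fn main() {
        let (a, b) = ("S2".as_bytes(), "S3".as_bytes());
        // 'S' ^ 'S' = 0 differing bits; '2' (0x32) ^ '3' (0x33) = 1 differing bit.
        let differing: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
        let total = (a.len() * 8) as u32; // 16 bits across the two characters
        println!("{}% the same", 100.0 * f64::from(total - differing) / f64::from(total));
        // prints: 93.75% the same
    }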
You're 66.66% (2/3) of the way there on the second character too. So I would say you're only 16.66% different across the two characters.
I would look much more into Levenshtein distance ;) if I wanted to be smart-ass funny.
You're 50% of the way closer to 1st!
Or this
https://github.com/google/s2geometry
How many of these letter-number storage services are there now? S3, B2, R2, S2...
S3 isn't the name of the service - that's "Amazon Simple Storage Service". S3 is a nickname, short for "Simple Storage Service".
Nickname implies it's unofficial, but S3 is very much the product name too:
https://aws.amazon.com/s3/faqs/
"Simple storage service" is used once. "S3" is used throughout.
While you’re technically correct, for all intents and purposes it is called S3 even by AWS themselves.
And EC2 stands for "Elastic Compute Cloud". But no one remembers that.
When I was a student we had a Facebook group to share information, and one angry guy ranted that the correct shortening of "Mathematical Analysis" is not, in fact, "anal", as we used to say.
Seems preferable to having to explain you're not a paramilitary organization responsible for unspeakable war crimes. Nothing funny about that.
Including potentially in court / to lawyers? IANAL, but isn't this just inviting Amazon to claim it's deliberately leveraging their 'S3' trademark and sowing confusion in order to lift their own brand? (Correctly, and even somewhat transparently in TFA, IMO.)
My issue is that 2 < 3, and most people will just assume it's an older/shittier S3 lol
It looks neat, but no Java SDK? Every company I've personally worked at is deeply reliant on Spring or the vanilla clients to produce/consume from Kafka 90% of the time. This kind of precludes even a casual PoC.
(S2 Team member) As we move forward, a Java/Kotlin and a Python SDK are on our list. There is a Rust SDK and a CLI available (https://s2.dev/docs/quickstart). Rust felt like a good starting point for us, as our core service is also written in it.
I do like this. The next part I'd like someone to build on top of this is applying the stream 'events' into a point-in-time queryable representation - basically the other part needed to make it a Datomic. It's probably better as a pattern or framework for making specific in-memory queryable data rather than a particular database. There are lots of ways this could work, like applying events to a local SQLite, or basing it on a MySQL binlog that can be applied to a local query instance and rewound to specific points, or more application-specific apply/undo events on local state.
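A minimal sketch of that pattern, assuming a hypothetical `Event` type (in practice decoded from stream records) and using rusqlite for the local store - replay events up to a chosen sequence number to get a point-in-time queryable view:

    use rusqlite::{params, Connection, Result};

    // Hypothetical event shape; would be decoded from a stream record.
    struct Event {
        seq: u64,
        user: String,
        balance: i64,
    }

    // Replay events in order up to `up_to_seq`, producing a point-in-time
    // queryable SQLite view of the stream.
    fn project(events: &[Event], up_to_seq: u64) -> Result<Connection> {
        let conn = Connection::open_in_memory()?;
        conn.execute(
            "CREATE TABLE accounts (user TEXT PRIMARY KEY, balance INTEGER)",
            [],
        )?;
        for e in events.iter().take_while(|e| e.seq <= up_to_seq) {
            conn.execute(
                "INSERT INTO accounts (user, balance) VALUES (?1, ?2)
                 ON CONFLICT(user) DO UPDATE SET balance = excluded.balance",
                params![e.user, e.balance],
            )?;
        }
        Ok(conn)
    }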
This is a very useful service model, but I'm confused about the value proposition given how every write is persisted to S3 before being acknowledged.
I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?
AWS has shown their willingness to implement mostly-protocol compatible services (RDS -> Aurora), and I could see them doing the same with a Kafka reimplementation.
(S2 team member here)
> I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?
This is how it works essentially, yes. Architecting the system so that chunks that are written to object storage (before we acknowledge a write) are multi-tenant, and contain records from different streams, lets us write frequently while still targeting ideal (w/r/t price and performance) blob sizes for S3 standard and express puts respectively.
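A toy sketch of that batching logic (names are illustrative, not actual S2 internals): records from many streams accumulate in one multi-tenant chunk, which is flushed as a single object PUT once it reaches a target blob size, and writes are acknowledged only after that PUT is durable.

    struct Record {
        stream_id: u64,     // records from many streams share one chunk
        payload: Vec<u8>,
    }

    struct ChunkBuffer {
        records: Vec<Record>,
        bytes: usize,
        target_bytes: usize, // tuned per storage class (standard vs express)
    }

    impl ChunkBuffer {
        // Returns a full chunk ready for a single object PUT once the target
        // blob size is reached; acks are sent only after that PUT succeeds.
        fn append(&mut self, rec: Record) -> Option<Vec<Record>> {
            self.bytes += rec.payload.len();
            self.records.push(rec);
            if self.bytes >= self.target_bytes {
                self.bytes = 0;
                Some(std::mem::take(&mut self.records))
            } else {
                None
            }
        }
    }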
Wait, data from multiple tenants is stored in the same place? Do you have per-tenant encryption keys, or how else are you ensuring no bugs allow tenants to read other tenants' data?
(Founder) We will be using authenticated encryption with per-basin (our term for bucket) or per-stream keys, but we don't have this yet. This is noted on https://s2.dev/docs/security#encryption
Seems like really cool tech. Such a bummer that it is not source-available. I might be in the minority with this opinion, but I would absolutely consider commercial services where the core tech is all released under something like a FSL with fully supported self-hosting. Otherwise, the lock-in vs something like Kafka is hard to justify.
(Founder) We are happy for the S2 API to have alternate implementations; we are considering open-sourcing an in-memory emulator ourselves. It is not a very complicated API. If you would prefer to stick with the Kafka API but benefit from features like S2's storage classes, a very large number of topics/partitions, or high throughput per partition, we are planning an open source Kafka compatibility layer that can be self-hosted, with features like client-side encryption so you can have even more peace of mind.
Having a Kafka-compatible API and S3 storage would be something I would jump to; the savings over MSK would be huge.
If you had a (paid for) API that sat on top of an S3 API for on-prem, that would be fantastic as well.
Kafka is great, but the whole Java ecosystem, the lack of control over what is in the topics, and coordinating the cluster in ZooKeeper make it a management PITA.
Check out WarpStream (recently acquired by Confluent).
First-class Kafka compatibility could go a long way to making it a justifiable tech choice. When orgs go heavy on event streaming, that code gets _everywhere_, so a vendor off-ramp is needed.
(Founder) That makes sense. We would eventually host the Kafka layer too - and will be able to avoid a hop by inlining our edge service logic in there.
I had an idea like this a few years ago: basically exposing a stream interface over a cloud-based FS to enable random-access seeking on byte streams. I envisioned it being useful for things like loading large files. It would be amazing for enabling things like cloud gaming, image processing, and CAD.
Kudos for sitting down and making it happen!
Just you wait, I am launching S1 next year!
Ok good, my startup S½ (also known as Ç) is still unique, phew
Dibs on S0
I wish more dev-tools startups would focus on clearly explaining the business use cases, targeting a slightly broader audience beyond highly technical users. I visited several pages on the site before eventually giving up.
I can sort of grasp what the S2 team is aiming to achieve, but it feels like I’m forced to perform unnecessary mental gymnastics to connect their platform with the specific problems it can solve for a business or product team.
I consider myself fairly technical and familiar with many of the underlying concepts, but I still couldn’t work out the practical utility without significant effort.
It’s worth noting that much of technology adoption is driven by technical product managers and similar stakeholders. However, I feel this critical audience is often overlooked in the messaging and positioning of developer tools like this.
(Founder) Appreciate the feedback. We will try to do a better job on the messaging. It is geared toward being a building block for data systems. The landing page has a section talking about some of the patterns it enables (Decouple / Buffer / Journal) in a serverless manner, with example use cases. It just may not be something that resonates with you though! We are interested in adoption by developers for now.
I think they're saying that you should provide some example use-cases for how someone would use your service. High-level use-cases that involve solving problems for a business.
For what it's worth, I am already familiar with this design space well enough that I don't need this kind of example in order to understand it. I've worked with Kinesis and other streaming systems before. But for people who haven't, an example might help.
What kind of business problem would someone have that causes them to turn to your service? What are the alternative solutions they might consider and how do those compare to yours? That's the kind of info they're asking for. You might benefit from pitching this such that people will understand it who have never considered streaming solutions before and don't understand the benefits. Pitch it to people who don't even realize they need this.
(Founder) Yes, I understand, and this could definitely do with work. I struggle with it personally because it is so obvious to me that I don't even know where to start. How do you pitch use cases for object storage? Stream storage feels just as universal to me.
If you ever figure it out, LMK. I don't think I've ever looked at logs more than about 24 hours old. Persistence and durability is not something I care about.
Errors, OTOH, I need a week or two of. But I consider these 2 different things. Logs are kind of a last resort when you really can't figure out what's going on in prod.
"Replace our MSK clusters and EBS storage with S3 storage costs."
This is a very interesting abstraction (and service). I can’t help but feature creep and ask for something like Athena, which runs PrestoDB (map reduce) over S3 files. It could be superior in theory because anyone using that pattern must shoehorn their data stream (almost everything is really a stream) into an S3 file system. Fragmentation and file packing become requirements that degrade transactional qualities.
(Founder) There are definitely some interesting possibilities. Pretty hyped about S3 Table (Iceberg) buckets. An S2 stream can buffer small writes so you can flush decent-sized Parquet files into the table and avoid compaction costs.
My first thought: "introducing? The S2 has been out for a while!"
https://www.sunlu.com/products/new-version-sunlu-filadryer-s...
Google had it years ago! http://s2geometry.io/devguide/s2cell_hierarchy
This is cool but I think it overlaps too much with something like Kinesis Data Streams from AWS which has been around for a long time. It’s good that AWS has some competition though
(Founder) We plan to be multi-cloud over time. Kinesis has a pretty low ordered throughput limit (i.e. at the level of a stream shard) of 1 MBps, if you need higher. S2 will be cheaper and faster than Kinesis with the Express storage class. S2 also has a more serverless pricing model - closer to S3 - than paying for stream shard hours.
Thanks. You are right about those points. One thing to consider is whether serverless provides enough cost savings for most streaming ingest use cases, which otherwise need static provisioning since ingest volumes are unpredictable. Better messaging would be that your serverless model handles bursts well. (For context: I used to sell KDA and KDS at AWS as part of AI solutions.)
How does this compare to https://github.com/deuxfleurs-org/garage ?
It seems like there are a lot of lighter-weight self-hosted S3 implementations around nowadays. Why even use S3?
In the long-term, how different do you want to be from Apache Pulsar? At the moment, many differences are obvious, e.g., Pulsar offers transactions, queues and durable timers.
(Founder) We want S2 to be focused on the stream primitive (log, if you prefer). There is a lot that can be built on top, which we mostly want to do as open source layers. For example, Kafka compatibility, or queue semantics.
In terms of a pitch, I'm not sure I understand how this differs from existing solutions. Is the core value proposition a simpler API?
(Founder) Besides the simple API:
- Unlimited streams. Current cloud systems limit you to a few thousand; with dedicated clusters, a few hundred K? If you want a stream per user, you are now dealing with multiple clusters.
- Elastic throughput per stream (i.e. a partition in Kafka) to 125 MiBps append / 500 MiBps realtime read / unlimited in aggregate for catching up. Current systems will have you at tens. And we may grow that limit yet. We are able to live migrate streams in milliseconds while keeping pipelined writes flowing, which gives us a lot of flexibility.
- Concurrency control mechanisms (https://s2.dev/docs/stream#concurrency-control)
Forgot to mention storage classes to tune your latency vs cost tradeoff. That you can even reconfigure - soon we will make that a live migration.
Seems really good for IoT, no? It's been a while since I worked in that space, but having something like this would have been nice at the time.
(Founder) so many possibilities! That's what I love about building building blocks. I think we will create an open source layer for an IoT protocol over time (unless community gets to it first), e.g. MQTT. I have to admit I don't know too much about the space.
Really interesting service and bookmarked.
I'd really love this extending more into the event sourcing space not just the log/event streaming space.
Dealing with problems like replay and log compaction etc.
Plus things like dealing with old events. Under GDPR, removing personal information/isolating it from the data/events themselves in an event sourced system are a PITA.
(Founder) An S2 stream is a durable log and can be replayed! We do want to add compaction support. Event sourcing is a great use case for S2.
So is this a "serverless" named-pipe-as-a-service cloud offering? Or am I misreading?
Yep. Just tack "serverless" onto something that already exists and charge for it
(Founder) A named pipe that operates at the level of records, is regionally durable, lets you read from any sequence number, and lets you do concurrency control for writes if you need to.
How does this compare to Kafka? Is the primary difference that this is a hosted solution?
I really liked the landing page and the service, but it took me a while to realize it wasn't an AWS service with a snazzy landing page.
Definitely a useful API but not super compelling until I could store the data in my own bucket
Is it possible to bring my own cloud account to provide the underlying S3 storage?
(Founder) Not currently! We want to explore this.
Would this be like an alternative to Delta? Am I thinking about that right?
S2 is, in my opinion, the sweet spot of PRS's lineup.
This would sell much better if it was S5 or S6, a next-level thing.
Wow man, are you still stuck on S3?
so the naming convention for 2024-25 products seems to be <letter><number>.
o1, o3, s2, M4, r2, ...
Scribe aaS? ;)
"Making the world a better place through streamable, appendable object streams"
Kafka as a service?
(Founder) Nope! We have a FAQ for this ;)
Can someone tell me what this does? And why it's better?
(Founder) There is a table on the landing page https://s2.dev/ which hopefully gives a nice overview :) It's like S3, but for streams. Cheap appends, and instead of dealing with blocks of data and byte ranges, you work with records. S2 takes care of ordering records, and letting you read from anywhere in the stream.
This is an alternative to systems like Kafka which don't do great at giving a serverless experience.
Could you clarify the Kafka difference further?
Or more generally, when is it better to choose S2 vs services like SQS or Kinesis?
S2 sounds like an ordered queue to me, but those exist?
(Founder here) Managed cloud offerings for streaming limit ordered throughput pretty low, e.g. Kinesis at 1 MiBps, Redpanda serverless at 1 MiBps, Confluent's even higher-end clusters at 10-20 MiBps IIRC. If you really need ordering, this can indeed be a limit. S2 lets you push 125 MiBps currently, and we may grow that.
Another factor is how many ordered streams you can have. Typically a few thousand at most with those systems. We take the serverless spirit of S3 here, when did you have to worry about the number of objects in a bucket?
We are also able to offer latency comparable to disk-based streaming like Confluent's Kora and Kinesis, with our Express storage class (under 50 milliseconds end-to-end latency for client in the same cloud region) - while being backed by S3 with regional durability! Not a disk in the system.
We want people to be able to build safe distributed data systems on top of S2, so we also allow concurrency control mechanisms on the stream like fencing. Kafka or Kinesis won't let you do that. This is the approach AWS takes internally (https://brooker.co.za/blog/2024/04/25/memorydb.html), but they don't have that as a service. We want to democratize the pattern.
ED: on throughputs, to clarify, I am talking about ordered throughput, i.e. per Kafka partition or Kinesis shard. WarpStream also does well here because of their architectural approach to separate ordering, but at a latency cost.
Between your site copy and your early comments on this thread, it was this rundown that made the product click in my mind.
I’m sure that in this early preview you’re trying to reach mainly devs with existing domain expertise, but the way that, in this comment, you laid out existing constraints and what possibilities might lie beyond them—it really helped me situate your S2 product in the constellation of cloud primitives.
Just wanted to offer that feedback in the hope that the spirit of your comment here doesn’t get buried down-thread!
thank you for the feedback!
Hey congrats! Looks like a really cool idea.
Looks like you're pushing the throughput angle - that could be important, but IMO it's not often you come across devs who need this level of throughput without dealing with a large-scale problem. My feedback is that the lack of per-tenant encryption is a big deal breaker here, since you're mixing up tenants' data within one object.
Plus, your security section says very little about how you prevent cross-tenant data contamination - that's probably the first thing that popped up in my mind when I read about your data model. It makes me extremely uneasy, and I can't imagine adopting this for anything serious. I would encourage you to think about how you can communicate that angle to the customer as well, besides supporting per-tenant encryption keys.
(Founder) It's a number of dimensions. I get excited about the ordered throughput angle because I have personally cared about this in the past, and yeah a lot of folks may not need that :)
Simple API, reasonable pricing, latency flexibility, unlimited streams, _and_ elastic to high throughputs. All adding up to a great serverless experience.
Re: the data colocation. This is how most multi-tenant systems - including S3 itself AFAIU - operate. I understand there is a difference in level of trust vs a cloud provider, and the best we can do here while delivering a serverless experience is encrypting every single record at the edge of S2 where it transits in or out, with a tenant-specific key. We may even allow specifying it as part of the request, if clients want to manage the key themselves.
The best data security when leveraging any multi-tenant service is going to be client-side encryption, and we also want to make this super easy. With our planned Kafka layer, we plan on client-side encryption as a value add.
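As an illustration of what per-record, per-tenant authenticated encryption can look like at the edge or client side (this sketch uses the aes-gcm crate; the actual keys and ciphers S2 will use aren't public):

    use aes_gcm::{
        aead::{Aead, AeadCore, KeyInit, OsRng},
        Aes256Gcm,
    };

    fn main() -> Result<(), aes_gcm::Error> {
        // One key per tenant (per basin or per stream).
        let tenant_key = Aes256Gcm::generate_key(OsRng);
        let cipher = Aes256Gcm::new(&tenant_key);

        // Encrypt each record as it transits the edge; store nonce + ciphertext.
        let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
        let ciphertext = cipher.encrypt(&nonce, b"record payload".as_ref())?;

        // Colocated chunks can hold many tenants' records, but a record only
        // decrypts (and authenticates) under its own tenant's key.
        let plaintext = cipher.decrypt(&nonce, ciphertext.as_ref())?;
        assert_eq!(&plaintext, b"record payload");
        Ok(())
    }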
@agallego Yes in aggregate both Confluent and Redpanda can push GiBps throughputs, and I know Redpanda has amazing perf. I was referring to Redpanda Serverless :) And per-partition i.e. ordered throughput.
ED: for some reason I wasn't seeing the reply link before on your comment, do see it now.
cool cool, right on.
Redpanda cloud doesn’t limit tput. Most ppl get a bigger discount at high volumes. We have customers in 10s of GB/s. Confluent has those volumes too.
Sort of serverless Kafka, which natively uses object storage and promises better latencies than things like warpstream.
An interesting difference is the ability to have exclusive write access to the log (the fencing token). This allows you to use the logs as write-ahead logs.
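A hypothetical sketch of the fencing pattern (the real S2 API may differ): a new leader advances the fencing token, and appends carrying a stale token are rejected, which is what makes the stream safe as a write-ahead log with a single logical writer.

    // Server-side view of a stream, reduced to the essentials.
    struct Stream {
        fencing_token: u64,
        records: Vec<Vec<u8>>,
    }

    impl Stream {
        // A new leader fences out older writers by advancing the token.
        fn fence(&mut self) -> u64 {
            self.fencing_token += 1;
            self.fencing_token
        }

        // Appends succeed only with the latest token, so a deposed writer's
        // stale appends are rejected instead of corrupting the log.
        fn append(&mut self, token: u64, record: Vec<u8>) -> Result<u64, &'static str> {
            if token != self.fencing_token {
                return Err("fenced: a newer writer holds the stream");
            }
            self.records.push(record);
            Ok((self.records.len() - 1) as u64) // assigned sequence number
        }
    }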
It's a message queue on the cloud.
https://chatgpt.com/c/676703d4-7bc8-8003-9e5d-d6a402050439
Edit: Keep downvoting, only 5.6k to go!
Thank you
[flagged]
Indeed... we sure wish we could have nabbed that crate name, but it was not to be. Our Rust SDK is here https://lib.rs/crates/streamstore
Replying to this one since you apparently can't reply to a comment that has been flagged. Why was the grandparent flagged? Google's S2 library has been around for more than a decade and is the first thing I think of when I see "S2" in a tech stack.
And the flippant response from the parent here that they don't really care that they're muddying the waters and just want the crate name is irksome.
Serverless pricing to me is exactly like ETH gas pricing!