Nearform

AI-enabled software solutions for enterprises

About Nearform

Simplify's Rating
Why Nearform is rated
C+
Rated C on Competitive Edge
Rated B on Growth Potential
Rated C on Differentiation

Industries

Data & Analytics

Consulting

Enterprise Software

AI & Machine Learning

Company Size

201-500

Company Stage

N/A

Total Funding

N/A

Headquarters

Waterford, Ireland

Founded

2011

Overview

Nearform is a team of data and AI specialists, engineers, and designers who build intelligent digital solutions at pace. They create AI-enabled products that improve digital experiences, empower developers, and deliver measurable results for enterprises. Their approach combines deep expertise in solving complex digital problems with a collaborative, people-first ethos to help organizations modernize legacy systems and develop breakthrough products by leveraging AI. Their goal is to partner with ambitious enterprises to produce enduring impact and measurable business outcomes.

Simplify's Take

What believers are saying

  • Formidable acquisition adds 100 employees, strengthening North American delivery leadership.
  • Serves Lululemon, Puma, Starbucks, Walmart, proving AI scalability in retail sectors.
  • OpenSSF grant renewal enhances Node.js security credibility for enterprise clients.

What critics are saying

  • LangChain outperforms llm-splitter's greedy splitting, eroding adoption within 12 months.
  • Formidable integration triggers talent exodus and cultural clashes by mid-2027.
  • OpenAI tiktoken integration obsoletes llm-splitter in JavaScript SDK by 2027.

What makes Nearform unique

  • Nearform released llm-splitter, lightweight JavaScript text chunker for LLM embeddings.
  • Nearform acquired Formidable on October 10, 2023, expanding US design expertise.
  • Nearform launched Initium on September 8, 2023, accelerating enterprise software bootstrapping.

Benefits

Annual Company Bonus

Remote Work Options

Paid Vacation

Remote Working Allowance

Training and Development Allowance

Healthcare

401(k) Company Match

Company News

NearForm
Oct 7th, 2025
Introducing llm-splitter - a fast, lightweight text chunker for embeddings, LLMs, and more!

Discover how Nearform's new llm-splitter tool makes text chunking for embeddings and LLM applications easier to integrate into real-world AI workflows. We develop impactful AI applications at Nearform, each underpinned by rigorous data insights and real-world utility. We regularly wrangle enormous text-based document stores, which we process, normalize, and refine into usable formats for upstream use (e.g., supporting similarity search or providing specific context to LLM-based applications). In our AI apps, we typically work with vector embeddings: numerical representations of text inputs that can be used to retrieve semantically similar data. A fundamental (if seemingly boring) task along the way is dividing large text documents into smaller, more manageable pieces. It's tricky: chunks that are too large lose specificity and precision in similarity calculations, while overly small chunks lose the contextual relationships that inform meaning. Separately, embedding models often have input limits that require sending smaller parts of a document. Thus, deciding how to slice text for maximum semantic usefulness in an upstream application is a matter of both science and art as applied to specific use cases. We'll skip over the arc of determining the optimal chunking strategy and focus on the practical end of things: once you know your chunking strategy, what is the best tool to transform your text into right-sized chunks for your application? Looking just at the JavaScript ecosystem, there are a number of reliable text chunking solutions. For a quick introduction to several libraries, take a peek at Phil Nash's "How to Chunk Text in JavaScript for Your RAG Application" and then check them out at this online demo.
To better understand this space, let's look at two popular open source libraries. The popular LangChain project provides a text splitting library with a rich set of splitters. Particularly useful ones include RecursiveCharacterTextSplitter, which preserves language structure, and TokenTextSplitter, which uses the OpenAI-style tiktoken library to split text into appropriately sized chunks. LlamaIndex is another popular project that provides a TypeScript framework with various text parsers and splitters. Useful ones here include SentenceSplitter for splitting text into sentences, MarkdownNodeParser for handling Markdown text, and CodeSplitter for, you guessed it, source code. These two libraries are powerful, flexible, and fantastic to use if you're already using the framework behind them in your application. But they're also quite heavy: @langchain/textsplitters brings 21MB of dependencies when installed, and llamaindex is even larger, with a 36MB node_modules install impact. There are many other excellent open source JavaScript text chunking libraries, and we evaluated them across library size, chunking options (size, overlap, LLM token flexibility, paragraph/sentence support), execution speed, and other criteria. But after much research and tire-kicking, we found that the collection of features we wanted most wasn't available from a single open source JavaScript library. That gap motivated us to create a new entrant in the text processing ecosystem, which we're pleased to share with the community: llm-splitter. Meet llm-splitter, a small and speedy alternative. While llm-splitter is intentionally minimal, it's also built for extensibility and storage efficiency. Need model-specific chunking? Just plug in libraries like tiktoken to apply token-based outputs tailored to specific models.
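The pluggable-tokenizer idea can be sketched in plain JavaScript. This is a minimal illustration of the concept, not llm-splitter's actual API: the function names (`charSplitter`, `wordSplitter`, `countTokens`) are assumptions invented for this sketch. It only shows how swapping the splitter function changes what counts as a "token":

```javascript
// Illustrative sketch of pluggable tokenizers -- NOT the llm-splitter API.

// Default-style behavior: every character is a token.
const charSplitter = (text) => [...text];

// Word-based alternative: split on whitespace, drop empty strings.
const wordSplitter = (text) => text.split(/\s+/).filter(Boolean);

// A model-specific tokenizer (e.g. tiktoken) would slot in the same way:
// const tiktokenSplitter = (text) => encoder.encode(text);

// Token counts depend entirely on which splitter you plug in.
function countTokens(text, splitter) {
  return splitter(text).length;
}

countTokens('Hello world', charSplitter); // 11 tokens (characters)
countTokens('Hello world', wordSplitter); // 2 tokens (words)
```

The same text yields very different token counts under each splitter, which is why chunk-size settings always have to be interpreted relative to the tokenizer in use.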
Now, in your JavaScript code, you can get to work with a simple example. With that introduction, let's dive into the various options for split. By default, each character becomes a "token" when creating chunks; so in the example above, tokens are individual characters, which are assembled into chunks of 10 characters. If we switched to something like a naive word splitter (e.g., (text) => text.split(/\s+/).filter(Boolean)), then each token would be a word ("Hello", "world") of varying character size. A common use case for OpenAI embedding models is counting tokens with tiktoken to ensure inputs fit within model context limits, preventing embedding API call failures. llm-splitter works well for this purpose; we even have a specific example for plugging tiktoken into a splitter function! The chunk size defaults to 512 tokens, but it's important to note that your token sizes change with the splitter (e.g., character- vs. word-based), so adjust accordingly. The overlap option sets the number of tokens to include from the previous chunk. The chunking strategy is either character (the default) or paragraph. With the character strategy, text is split into as many tokens as can fit in a chunk without any other considerations. By contrast, the paragraph strategy first splits text into paragraphs (at the \n\n boundary) and then, when assembling chunks, won't include tokens from a new paragraph in an existing chunk unless the entire paragraph can fit. This better preserves the semantic meaning of paragraphs by reducing partial paragraphs in output chunks. Let's put these options together in an example where we take a string, split it into word tokens, group them into paragraphs, and then aim to get up to 15 words in a chunk with 2 words of overlap from the previous chunk. Note that llm-splitter takes a greedy approach that performs a single pass on the input for speed.
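The greedy, single-pass approach described above (character strategy only, with token overlap) can be sketched as a small standalone function. This is a hypothetical re-implementation of the idea for illustration, not llm-splitter's code or API; the name `greedyChunk` and its option names are invented here:

```javascript
// Hypothetical sketch of greedy single-pass chunking -- NOT llm-splitter's code.

// Word tokenizer, as described in the text above.
const wordSplitter = (text) => text.split(/\s+/).filter(Boolean);

// Greedily fill each chunk with up to `chunkSize` tokens, carrying
// `overlap` tokens over from the previous chunk for context.
// Assumes overlap < chunkSize so each pass makes forward progress.
function greedyChunk(text, { splitter, chunkSize, overlap = 0 }) {
  const tokens = splitter(text);
  const chunks = [];
  let i = 0;
  while (i < tokens.length) {
    chunks.push(tokens.slice(i, i + chunkSize).join(' '));
    if (i + chunkSize >= tokens.length) break; // last chunk emitted
    i += chunkSize - overlap; // step back `overlap` tokens
  }
  return chunks;
}

const chunks = greedyChunk('one two three four five six seven', {
  splitter: wordSplitter,
  chunkSize: 4,
  overlap: 1,
});
// chunks[0] -> 'one two three four'
// chunks[1] -> 'four five six seven'  (note the one-token overlap)
```

A single pass like this is fast, but as noted below it can produce a less-than-optimal set of chunks compared to recursive splitters.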
This means the chunks created might not be the optimal set, unlike LangChain's RecursiveCharacterTextSplitter. But the approach is simple and fast, and that's the tradeoff we make. Examining the output here, we can see that the first chunk has fewer than 15 words: not all words from the second paragraph could fit in that chunk, so the second paragraph's word tokens start the next chunk. We can also see the two-word overlap carried into the second chunk from the previous one. The output format has two key parts. The text portion of the entry is the chunk text, similar to all other chunking libraries. The second portion, start and end, is unique to llm-splitter: these represent positional data that can recreate the exact text in the text field when given the original input. Using this approach, we can take the last chunk and recreate it from only start and end with getChunk. The start/end index data can be useful in situations where you want to store a chunk value (typically an embedding numerical array derived from the chunk) but don't want to store the actual chunk text. A typical scenario is an application that keeps full documents in a data store and separately uses chunks for relevance search and context in an LLM knowledge application. For example, the documents are stored in a source of truth like S3 and the embeddings are stored in a dedicated vector store like pgvector, Pinecone, or AstraDB. In cases where storing the actual chunk text is duplicative and unnecessary, llm-splitter lets you store much smaller data (two integer indexes) instead of variable-length text. Go try it out!
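The store-offsets-instead-of-text idea can be sketched like this. The code is illustrative only: `makeChunk` and `recreateChunk` are stand-ins invented for this sketch (the latter playing the role the text describes for getChunk), not the library's actual implementation:

```javascript
// Illustrative sketch of positional chunk storage -- NOT the llm-splitter API.

// A chunk carries its text plus the offsets that locate it in the source.
function makeChunk(text, start, end) {
  return { text: text.slice(start, end), start, end };
}

// Recreate a chunk's text from just its offsets and the original document,
// analogous to what the text above describes for getChunk.
function recreateChunk(originalText, { start, end }) {
  return originalText.slice(start, end);
}

const doc = 'Store full documents in S3; store embeddings in a vector DB.';
const chunk = makeChunk(doc, 0, 27);

// A vector store could now hold only { embedding, start: 0, end: 27 } --
// two small integers instead of duplicated variable-length text.
const restored = recreateChunk(doc, chunk);
// restored === chunk.text
```

The storage win is that the offsets are fixed-size integers regardless of chunk length, while the full text lives once in the source-of-truth store.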
llm-splitter isn't fancy or complicated, but it offers a solid set of base options with the flexibility to cover most text chunking needs, especially for creating embeddings for use in AI applications. You can find the source code on GitHub at https://github.com/nearform/llm-splitter and see it in action on the demo page at https://nearform.github.io/llm-splitter. At Nearform, we love both building AI applications and contributing back to the open source community. We hope llm-splitter provides the features and flexibility you need the next time you process large volumes of text data. We'd love to hear how it's going for you: open an issue if you're running into any challenges, or reach out if you're looking for expert help shipping your AI applications.

Arclabs Welding School
Nov 6th, 2024
PREVIOUS CLIENTS - ArcLabs Research & Innovation Centre | Startups | Innovation | Entrepreneurs

Explore the success stories of previous clients at ArcLabs. Discover a diverse array of innovative companies that have thrived in our dynamic environment, from tech startups to creative agencies. Learn how ArcLabs has supported their growth and development, and see the impact of our collaborative community.

NearForm
Dec 8th, 2023
Introducing Mercurius Dynamic Schema

NearForm is pleased to introduce Mercurius Dynamic Schema, a new package by NearForm.

PR Newswire
Oct 10th, 2023
NearForm Acquires Formidable To Expand Global Software Offering

/PRNewswire/ -- NearForm, an Irish-founded team of engineers, designers and strategists who build digital capability and software solutions for enterprises,...

Procurement Magazine
Oct 10th, 2023
NearForm Acquires Formidable To Expand Global Software Offering

DUBLIN, Oct. 10, 2023 /PRNewswire/ -- NearForm, an Irish-founded team of engineers, designers and strategists who build digital capability and software solutions for enterprises, has acquired Formidable, a global design and engineering consultancy founded in Seattle, Washington.
