Full-Time

Vision Language Model Engineer

EchoTwin AI

11-50 employees

AI vision sensors for urban analytics

No salary listed

San Francisco, CA, USA

In Person

Category
AI & Machine Learning (2)
Required Skills
Microsoft Azure
Python
TensorFlow
Neural Networks
PyTorch
AWS
OpenCV
Google Cloud Platform
Requirements
  • Bachelor’s, Master’s or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, or a related field (or equivalent experience).
  • 3+ years of experience in machine learning, with a focus on vision-language models or multimodal AI.
  • Hands-on experience with deep learning frameworks such as PyTorch or TensorFlow.
  • Proven track record of building and deploying computer vision and/or NLP models.
  • Proficiency in Python and relevant ML libraries (e.g., Hugging Face, OpenCV, Transformers).
  • Experience with large-scale model training and optimization (e.g., distributed training, quantization).
  • Strong understanding of neural network architectures (e.g., CNNs, Transformers, CLIP, or similar).
  • Experience with multimodal datasets and preprocessing techniques for images and text.
  • Familiarity with cloud platforms (e.g., AWS, GCP, Azure) and model deployment workflows.
  • Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
  • Excellent communication skills to explain complex technical concepts to diverse audiences.
Responsibilities
  • Design and implement state-of-the-art vision-language models using deep learning frameworks.
  • Develop and fine-tune models that combine computer vision and natural language processing for tasks like image captioning, visual question answering, and text-to-image generation.
  • Collaborate with data scientists and software engineers to integrate models into production systems.
  • Optimize model performance for accuracy, latency, and scalability in real-world applications.
  • Conduct experiments to evaluate model performance and iterate on architectures and training pipelines.
  • Stay up to date with the latest research in vision-language models and incorporate advancements into projects.
  • Contribute to data preprocessing, augmentation, and annotation pipelines for multimodal datasets.
  • Document model development processes and present findings to technical and non-technical stakeholders.

EchoTwin AI sells and implements AI-powered vision sensor systems for vehicles, drones, and urban platforms to collect real-time visual and environmental data in cities. Its CityView product uses computer vision and natural language understanding to help urban managers make informed decisions, automate compliance monitoring, and scale urban intelligence across streets, sidewalks, and service areas. The company combines sensors, AI analytics, and a services bundle that covers consulting, strategy, implementation, deployment, ongoing support, optimization, and training. It partners with local networks to deliver its platform globally. Revenue comes from product sales, professional services fees, and licensing of its patented technologies.

Company Size

11-50

Company Stage

Seed

Total Funding

$8M

Headquarters

Dubai, United Arab Emirates

Founded

2024

Simplify Jobs

Simplify's Take

What believers are saying

  • Global smart cities market projected to reach $4 trillion by 2030 with massive expansion potential.
  • Active pilot projects across US, Europe, and Middle East validate platform for real-world municipal deployment.
  • Secured $8M seed funding from Metis Ventures and strategic investors including Automotive Ventures and Supernova.

What critics are saying

  • NVIDIA Metropolis and OpenAI multimodal models could commoditize proprietary vision-language capabilities within 12-18 months.
  • UAE data sovereignty regulations could force 40% higher server costs or the loss of government pilot contracts.
  • Huawei CityBrain's dominance of Middle Eastern and Asian municipal contracts could erode EchoTwin's pilot pipeline.

What makes EchoTwin AI unique

  • Proprietary visual intelligence engine with full spatial reasoning for autonomous issue detection and resolution.
  • Agentic AI workflows automate compliance monitoring from detection through regulatory follow-up without human intervention.
  • Deep active learning in vision-language models achieves superior anomaly detection rates versus commodity vision platforms.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Health Savings Account/Flexible Spending Account

401(k) Retirement Plan

401(k) Company Match

Unlimited Paid Time Off

Profit Sharing

Company News

Ars Technica
Mar 6th, 2026
Ex-CEO of $464M AI startup accused of forging board signatures, selling $1.2M in unauthorised stock

Hayden AI, a San Francisco startup valued at $464 million, has sued co-founder and former CEO Chris Carson, alleging he stole 41GB of proprietary data before his September 2024 termination. The company claims Carson engaged in fraud including forged board signatures and unauthorised stock sales. According to the lawsuit, Carson secretly sold over $1.2 million in company stock without board approval to fund a multimillion-dollar Florida home and luxury purchases including a gold Bentley Continental. He allegedly downloaded his entire email file, containing proprietary information, days before launching rival firm EchoTwin AI. Hayden AI further alleges Carson fabricated his professional credentials, including a PhD from Waseda University. The complaint states that in 2007, Carson was actually operating a paintball equipment business in a Florida strip mall, not completing doctoral studies.

Webrazzi
Sep 24th, 2025
EchoTwin AI secures $8M funding round

EchoTwin AI, an Abu Dhabi-based company developing AI-driven urban infrastructure solutions, secured $8 million in seed funding led by Metis Ventures. Participants included Automotive Ventures, Supernova, Plug and Play, Higher Life Ventures, and Tesserakt Ventures. EchoTwin AI's platform helps municipalities manage cities more efficiently by transforming vehicle fleets into real-time AI-supported sensors, creating a digital twin of cities to proactively address infrastructure issues.