Novus Voices

LLM Benchmarking: Understanding the Landscape and Limitations

LLM benchmarking evaluates large language models. Novus combines benchmarks and human testing for effective, ethical AI models.

July 3, 2024

In the field of artificial intelligence, Large Language Models (LLMs) have become increasingly prevalent and powerful. As organizations and developers seek to harness the potential of these models, the need for reliable methods to evaluate and compare their performance has never been more critical. This is where LLM benchmarking comes into play.

What are LLM Benchmarks?

LLM benchmarks are standardized performance tests designed to evaluate various capabilities of AI language models. Typically, a benchmark consists of a dataset, a collection of tasks or questions, and a scoring mechanism. After evaluation, models are usually awarded a score from 0 to 100 (often the percentage of tasks completed correctly), providing a standardized, comparable indication of their performance.
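As an illustration, the three components above (dataset, tasks, scoring mechanism) can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not how any particular benchmark is implemented: it assumes a multiple-choice format, treats the model as a plain function that returns the index of its chosen answer, and scores by simple accuracy scaled to 0-100. The names `Item`, `score`, and `always_first` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    """One benchmark question: the prompt, its answer choices, and the index of the correct choice."""
    question: str
    choices: List[str]
    answer: int


def score(model: Callable[[str, List[str]], int], dataset: List[Item]) -> float:
    """Scoring mechanism: accuracy on a 0-100 scale (share of items where the model picks the correct choice)."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(1 for item in dataset if model(item.question, item.choices) == item.answer)
    return 100.0 * correct / len(dataset)


# Hypothetical toy dataset of two multiple-choice questions.
toy = [
    Item("Which gas do plants absorb for photosynthesis?", ["Carbon dioxide", "Oxygen", "Helium"], 0),
    Item("What is 2 + 3?", ["4", "5", "6"], 1),
]


def always_first(question: str, choices: List[str]) -> int:
    """A trivial stand-in 'model' that always picks the first choice."""
    return 0


print(score(always_first, toy))  # 50.0
```

Real benchmarks differ mostly in the scoring step: some compare the model's likelihood for each choice, others parse free-form answers or run unit tests.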

The Importance of Benchmarking

Benchmarks serve several crucial purposes in the AI community:

  • Objective Comparison: They provide a common ground for comparing different models, helping organizations and users select the best model for their specific needs.
  • Performance Insight: Benchmarks reveal where a model excels and where it falls short, guiding developers in making necessary improvements.
  • Advancement of the Field: The transparency fostered by well-constructed benchmarks allows researchers and developers to build upon each other's progress, accelerating the overall advancement of language models.

Popular LLM Benchmarks

Several benchmarks have emerged as standards in the field. Here's a brief overview of some key players:

1. ARC (AI2 Reasoning Challenge): Tests knowledge and reasoning skills through multiple-choice science questions.

2. HellaSwag: Evaluates commonsense reasoning and natural language inference through sentence completion exercises.

3. MMLU (Massive Multitask Language Understanding): Assesses knowledge across 57 subjects, from elementary to professional level, through multiple-choice questions.

4. TruthfulQA: Measures a model's ability to give truthful answers and avoid repeating common misconceptions and falsehoods.

5. WinoGrande: Evaluates commonsense reasoning abilities through pronoun resolution problems.

6. GSM8K: Tests multi-step mathematical reasoning with grade-school math word problems.

7. SuperGLUE: A collection of diverse tasks assessing natural language understanding capabilities.

8. HumanEval: Measures a model's ability to generate functionally correct Python code from function signatures and docstrings, checked against unit tests.

9. MT Bench: Evaluates a model's capability to engage in multi-turn dialogues effectively.
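For HumanEval, functional correctness is usually reported as pass@k: the probability that at least one of k generated samples passes a problem's unit tests. The sketch below uses the unbiased estimator described in the paper that introduced HumanEval: generate n ≥ k samples per problem, count the c samples that pass, and compute 1 - C(n-c, k) / C(n, k). The function names and the (n, c) input format are illustrative assumptions; actually executing generated code against the tests (ideally in a sandbox) is out of scope here.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples were generated and c of them passed the unit tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so every size-k draw contains a passing one.
        return 1.0
    # 1 minus the probability that a random size-k subset contains only failing samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Benchmark-level pass@k: the average over problems, each given as (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Example: 10 samples for one problem, 3 of which pass.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
```

As a sanity check, pass@1 reduces to c / n, the plain fraction of passing samples; larger k rewards models that solve a problem in at least some of their attempts.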

Limitations of Existing Benchmarks

While benchmarks provide valuable insights, they are not without their limitations. Understanding these constraints is crucial for interpreting benchmark results accurately:

1. Influence of Prompts: Performance can be sensitive to specific prompts, potentially masking a model's true capabilities.

2. Construct Validity: It is difficult to establish whether a benchmark truly measures the capability it claims to, and what counts as an acceptable answer varies across the broad spectrum of tasks and use cases involved.

3. Limited Scope: Most benchmarks evaluate specific tasks or capabilities, which may not fully represent a model's overall performance or future skills.

4. Insufficient Standardization: Lack of standardization leads to inconsistencies in benchmark results across different evaluations.

5. Human Evaluation Challenges: Tasks requiring subjective judgment often rely on human evaluations, which can be time-consuming, expensive, and potentially inconsistent.

6. Benchmark Leakage: There's a risk of models being trained on benchmark data, leading to artificially inflated scores that don't reflect true capabilities.

7. Real-World Application Gap: Benchmark performance may not accurately predict how a model will perform in unpredictable, real-world scenarios.

8. Specialization Limitations: Most benchmarks use general knowledge datasets, making it difficult to assess performance in specialized domains.
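One common way to screen for benchmark leakage is to check how many of a benchmark item's word n-grams also appear in the training data. The sketch below is a simplified illustration under stated assumptions: it tokenizes text into lowercase alphanumeric words, and the n-gram length `n` and overlap `threshold` defaults are arbitrary illustrative values, not established standards. All helper names are hypothetical. Holding every n-gram in an in-memory set only works for small corpora; at training-data scale the same idea would need hashed n-grams or a more compact index.

```python
import re
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Lowercased word n-grams of a text; punctuation is dropped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(corpus: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram that appears anywhere in the (hypothetical) training corpus."""
    index: Set[NGram] = set()
    for doc in corpus:
        index |= ngrams(doc, n)
    return index


def overlap_ratio(item: str, index: Set[NGram], n: int) -> float:
    """Share of a benchmark item's n-grams that also occur in the training corpus."""
    grams = ngrams(item, n)
    if not grams:
        return 0.0  # item is shorter than n words, so nothing can be compared
    return len(grams & index) / len(grams)


def flag_contaminated(items: List[str], corpus: Iterable[str],
                      n: int = 8, threshold: float = 0.5) -> List[int]:
    """Indices of benchmark items whose n-gram overlap with the corpus meets the threshold."""
    index = build_index(corpus, n)
    return [i for i, item in enumerate(items) if overlap_ratio(item, index, n) >= threshold]


# Toy example: the first question appears almost verbatim in the "training" text.
corpus = ["Trivia dump: which gas do plants absorb during photosynthesis? Carbon dioxide."]
items = [
    "Which gas do plants absorb during photosynthesis?",
    "What is the capital city of the country that hosts the Eiffel Tower?",
]
print(flag_contaminated(items, corpus, n=4))  # [0]
```

A flag only means an item deserves a closer look: short or formulaic questions can overlap with training text by coincidence, and paraphrased leakage will slip past an exact n-gram match.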

The Future of LLM Benchmarking

As the field of AI continues to advance, so too must our methods of evaluation. Future benchmarks will likely need to address current limitations by:

  • Developing more comprehensive and diverse datasets,
  • Creating tasks that better simulate real-world applications,
  • Incorporating ethical considerations into evaluations,
  • Improving standardization across the field,
  • Exploring ways to assess specialized domain knowledge.

LLM Benchmarks at Novus

LLM benchmarks play a crucial role in advancing our field of artificial intelligence by providing objective measures of model performance. However, at Novus, we understand the importance of approaching benchmark results with a critical eye, recognizing both their value and limitations.

We ensure that all of our models are extensively evaluated on a variety of benchmarks, including several in-house assessments. This comprehensive approach allows us to gain a nuanced understanding of our models' capabilities. Importantly, we don't stop at traditional performance metrics. We also place a strong emphasis on evaluating the safety and alignment of these models, recognizing the ethical implications of deploying powerful AI systems.

While we believe that benchmarks provide valuable insights, we know they don't tell the whole story when it comes to determining the quality of these models. That's why we complement our benchmark evaluations with extensive human testing. This hands-on approach ensures that we can assess the real-world applications and practical usefulness of our models.

As we continue to push the boundaries of what's possible with language models at Novus, we're committed to evolving our evaluation methods in tandem. 

Our goal is to develop and refine assessment techniques that allow us to accurately gauge and harness the full potential of these powerful tools, always keeping in mind their practical impact and ethical considerations.

Newsletter

Novus Newsletter: AI Highlights - June 2024

June 2024 Newsletter: AI updates, including Safe Superintelligence Inc., Google AI critiques, and Novus at global tech events.

July 1, 2024

Hey there!

Duru here from Novus, excited to bring you the highlights from our June AI newsletters. As summer unfolds, the world of artificial intelligence continues to captivate with groundbreaking developments and pivotal discussions on the ethical integration of AI in our daily lives.

In each newsletter, I find the most interesting AI news for you and of course keep you up to date with the latest insights and developments. Here, I have compiled the key stories and updates from June 2024 to keep you informed and engaged.

If you want to stay more up-to-date with what's happening in the AI field, you can subscribe to our bi-weekly newsletter. You will receive the latest updates and exclusive insights directly to your inbox.

Now, let's jump in!

AI NEWS

Launching Safe Superintelligence Inc.

Ilya Sutskever has launched Safe Superintelligence Inc., a company focused on creating AI that surpasses human intelligence while remaining safe for humans to coexist with. The company emphasizes ethical AI development to prevent potential future risks.

Key Point: Sutskever advocates for AI that not only enhances human capabilities but also prioritizes safety and ethical considerations.

Further Reading: Safe Superintelligence Inc.

Claude 3.5 Sonnet: A New Benchmark

Anthropic has introduced Claude 3.5 Sonnet, a language model surpassing previous iterations in speed and intelligence, aimed at enhancing how we interact with and utilize AI.

Key Point: Claude 3.5 Sonnet promises groundbreaking improvements in language processing, setting a new standard for AI capabilities.

Further Reading: Claude 3.5 Sonnet Release

Google AI Overviews: Comedy or Concern?

Google's AI Overviews feature was meant to simplify search results but instead became a source of humor because of its inaccurate summaries, highlighting the current limits of AI in understanding complex human queries.

Key Point: This feature's mishaps underscore the challenges in deploying AI that accurately interprets and summarizes diverse data types.

Further Reading: Google AI Overviews

Celebrity and AI: The Scarlett Johansson Controversy

Recent developments in AI voice technology have sparked discussions about ethical implications, highlighted by Scarlett Johansson's concerns over the unauthorized use of her voice likeness in AI applications.

Key Point: Johansson's case raises important questions about consent and the ethical use of celebrity likenesses in AI.

Further Reading: Scarlett Johansson AI Voice Controversy

Apple's Subtle AI Integration Strategy

Apple continues to integrate AI into its existing product lineup, focusing on enhancing functionality without overwhelming users with new technologies, aligning with practical and user-friendly AI applications.

Key Point: Apple's strategy focuses on improving user experience through subtle, yet effective AI enhancements rather than flashy new AI products.

Further Reading: Apple's AI Strategy

Novus Updates

Our CEO, Egehan, at Bridgevent

Vorga's Paris Journey

During Viva Technology in Paris, our CRO, Vorga, showcased Novus' latest AI innovations. The event provided a platform for networking with industry leaders and highlighted our commitment to pushing the boundaries of AI technology. Additionally, Vorga represented Novus at a La French Tech event, where he demonstrated our cutting-edge solutions to an enthusiastic French tech audience.

Overcoming Barriers: Fundraising Processes

At the Bridgevent organized by Inveo Ventures, our CEO, Egehan, participated in a panel discussing the intricacies of fundraising in the tech sector. Insights were shared on overcoming challenges and strategizing effectively to secure funding, highlighting Novus' proactive approach in navigating the complex investment landscape.

Artificial Intelligence, Data Science, and Sustainability

Our commitment to sustainability was underlined at a community gathering with MAP360, where our CRO, Vorga, discussed the intersection of AI, data science, and environmental sustainability. This conversation explored how AI can be leveraged to foster sustainable practices and mitigate environmental impacts, reinforcing our dedication to responsible AI development.

Our CRO, Vorga, at the MAP360 Community Gathering Event

Educational Insights from Duru’s AI Learning Journey

I have also started a new section called Duru’s AI Learning Journey, where I share my review of a piece of AI-related content that I read or watched that week.

Reflecting on AI in Marketing

In this segment, I delved into an article discussing AI's evolving role in marketing. The article emphasized the cost-saving potential of AI but missed the critical element of human connection. I argued for a balanced approach where AI enhances our ability to engage genuinely with customers, rather than replacing the human touch.

The Article: How AI will reinvent Marketing

Mind-Controlled Gaming

I also explored the fascinating world of mind-controlled gaming through a YouTube video featuring a streamer who plays games using only their thoughts. This review highlighted how AI and brain-computer interfaces can transform our interaction with digital worlds, making gaming more inclusive and futuristic by translating mental commands into in-game actions.

The YouTube Video: I Made a Mind-Controlled Game Controller

Looking Forward

We eagerly anticipate sharing more news and insights as we continue exploring the dynamic field of AI. Stay connected for more updates, and thank you for being an integral part of our journey at Novus.

Subscribe to our newsletter.

Newsroom

Novus Participates in Sustainability Discussions at MAP360's “Community Gathering” Event

At MAP360's event, Novus' CRO discussed AI, data science, and sustainability, highlighting AI's future and startups' efficiency.

June 12, 2024

We had the pleasure of attending the "Community Gathering" event organized by our sustainability partner, MAP360.

The event featured a series of enlightening panels, with our CRO, Vorga Can, having the honor of participating in the first panel titled “Artificial Intelligence, Data Science, and Sustainability,” moderated by Özgün İnceoğlu, CEO of MAP360. Alongside Ömer Kavlakoğlu, Business Development Manager at Evreka, Vorga shared insights on the future of artificial intelligence, the essential role of digitalization in sustainability, and the efficiency advantages that startups hold over larger companies.

We thoroughly enjoyed meeting sustainability leaders from various sectors, exchanging ideas, and listening to their inspiring stories throughout the event. The discussions underscored the importance of collaboration and innovation in driving sustainable practices forward.

A heartfelt thank you to the MAP360 team for organizing such an enjoyable and informative event and for inviting us. Meeting people who are also dedicated to sustainability was truly rewarding.

We are also very excited to announce our upcoming sustainability projects with MAP360 in the near future.

Newsroom

Novus at La French Tech Istanbul's First Event in Izmir

Novus attended La French Tech in Izmir, where CRO Vorga Can presented our AI solutions and met French Ambassador H.E. Ms. Dumont.

June 7, 2024

La French Tech events are always very valuable for us.

This time, we were excited to attend La French Tech Istanbul's first event in Izmir.

As one of the three startups presenting at the event, our CRO, Vorga Can, had the honor of sharing the stage with Bora Çitiloğlu from TEB. Vorga delivered an insightful presentation about Novus, highlighting our innovative AI solutions and the impact we're making in the industry.

French Ambassador H.E. Ms. Isabelle Dumont.

Vorga Can also had the unique opportunity to meet H.E. Ms. Isabelle Dumont, Ambassador of France to Turkey, who gave the opening speech of the event. Their conversation was a highlight, reflecting the importance of international collaboration in the tech sector.

The event was a great success, providing a platform to connect with other startups, industry leaders, and innovators. We thoroughly enjoyed the discussions and networking opportunities that arose throughout the day.

We would like to extend our heartfelt thanks to the La French Tech Istanbul team, including Ömer Hantal, Murat Peksavaş, and Eren Arasan, for their hard work and dedication in organizing this event. Their efforts made it a memorable and impactful experience.

Newsroom

Novus CEO Rıza Egehan Asad Speaks at "Overcoming Barriers: Fundraising Processes" Panel

Novus' CEO shares insights on fundraising at the "Overcoming Barriers: Fundraising Processes" panel, organized by Inveo Ventures.

June 6, 2024

Upon the invitation of our investor, Inveo Ventures, our CEO, Rıza Egehan Asad, had an engaging conversation yesterday at the "Overcoming Barriers: Fundraising Processes" panel. The panel was moderated by Anıl Yıldırım and also featured insights from Murat Hacioglu, CEO & CRO of B2Metric, and Emre Öget, Partner & COO of Retter.io.

During his speech, Rıza Egehan Asad shared valuable insights into the investment processes at Novus, discussing the strategies we employ as a growing startup to secure funding and overcome financial challenges. He highlighted the importance of building strong relationships with investors, maintaining transparency, and continuously innovating to attract and retain investor interest.

The panel provided a platform for exchanging ideas and experiences on fundraising, offering attendees a deeper understanding of the intricacies involved in securing investment for startups. The discussions emphasized the significance of adaptability and resilience in navigating the fundraising landscape.

We would like to extend our sincere thanks to the Inveo Ventures team, especially Haluk Nişli and Onur Topaç, for organizing this excellent event and for their invitation.

Novus Voices

The Right Question Often Matters More Than the Answer

Have you ever wondered about AI's impact on our social life and identity?

June 4, 2024

Are you familiar with the concept of entropy? It’s a concept from physics that describes the amount of disorder or randomness in a system, and the universe itself is such a system. According to the second law of thermodynamics, the entropy of an isolated system like the universe tends to increase over time: all stars eventually burn out, and the universe will face its own death, just like us.

The concept of entropy reminds me of change, as J. Cole articulates in his song "Middle Child": "Everything grows, it's destined to change." He's currently dealing with 6ix God and K-Dot, but let’s not get sidetracked.

Everything changes—culture, technology, society, even your ex. However, I believe that thanks to the revolution in AI & Robotics, along with fundamental sciences like chemistry, we are on the verge of a major change that humankind has never experienced before.

That's a bold statement.

I may not be an expert in many of the fields I've mentioned, but I studied sociology and started a successful AI startup, so I know a bit about these topics (and yes, I am biased, but I believe this to be true).

So here's the gist: according to the research discussed in a recent scholarly article titled The Blended Future of Automation and AI: Examining Some Long-Term Societal and Ethical Impact Features, the implementation of AI and robotics is poised to fundamentally alter our societal structures and ethical frameworks.

As the article points out, AI's ability to impact jobs, societal norms, and interpersonal interactions represents a form of social influence akin to the broad effects postulated by Social Impact Theory (a theoretical framework that describes how individuals can be influenced by other people and by societal forces). Shocking, right? You don’t need a PhD to suspect that AI will affect your everyday life at some point, but you might need one to jump into a question that big.

Long story short, seen through this theory, AI is a new “social actor”: not merely a tool but an agent that reshapes social norms and values.

The article also discusses at length the ethical implications of such transformative technologies, emphasizing the need for ethical AI deployment to prevent potential social repercussions such as increased inequality or the misuse of autonomous systems.

We always like to think that technology enriches all of us. We’ve greatly reduced hunger and put up a good fight against once-widespread diseases. That’s the cool part. Still, the article advocates for a cautious approach, ensuring that AI development is aligned with human values and societal well-being, something I advocate for as well.

Revisiting Social Impact Theory in the Age of AI

The classic Social Impact Theory initially described how individuals adjust their behaviors based on their social environment. Today, AI, acting as a 'social actor,' adds a new layer to Bibb Latané's theory.

For instance, AI-driven social media algorithms have the power to shape political opinions and social norms at a pace and magnitude that were once unthinkable. What criteria do these algorithms use to determine which content to promote? What are the long-term effects of these decisions? These questions are crucial as we explore the social terrain molded by AI.

As the founder of an AI startup, I too am concerned that we are advancing too quickly. We're not questioning enough; applied sciences are increasingly favored while fundamental principles are being neglected. We seldom stop to ask why we need to accelerate, yet we continue to do so regardless. This can lead to positive outcomes, but the potential for negative consequences is equally significant.

Why the rush?

Economic Shifts Driven by Automation

The integration of AI and robotics into various industries represents more than just a technological upgrade; it serves as a catalyst for profound economic transformation. Automation could lead to significant shifts in employment patterns, with certain jobs disappearing and new roles emerging.

I have always believed this shift to be fundamentally beneficial for economies, and I still do. After all, we no longer ride horses; efficiency often prevails over other values. Efficiency is undeniably important, not just in human affairs but as a principle observed in evolution itself.

However, if we label efficiency as factor 'A', we must ask: Can efficiency alone solve all our problems, or do we need factors 'B', 'C', and even 'D' alongside it? At its extreme, efficiency can even be detrimental to society. We need a deeper understanding of human nature and society at large.

When we use terms like “development” and “progress,” we need to tread carefully. Comparing data from different eras can be tricky: it may seem like we're making progress, but we should also be concerned about how AI will affect the relationships between different social and economic classes.

I'm an optimist, but I'm not naive. We need to answer big questions, but first, we need to come up with those questions.

As a line often attributed to the Greek philosopher Plato puts it, "The right question is usually more important than the right answer."
Newsroom

Novus Participates in TEB & Orta Doğu Teknik Üniversitesi Accelerator Program in Grenoble

Novus joins the TEB & ODTÜ Accelerator Program in Grenoble, presenting AI innovations and networking with industry leaders.

June 3, 2024

Novus had the incredible opportunity to participate in the TEB & Orta Doğu Teknik Üniversitesi / Middle East Technical University Accelerator Program. This exclusive program included only a select few startup co-founders, and we are honored to have been among them.

The program took us to the vibrant city of Grenoble, where our CRO, Vorga Can, had the unique chance to present Novus to a distinguished group of key individuals.

The week was packed with insightful panels, engaging presentations, and exciting events. It was a fantastic experience for Novus to connect with industry leaders and peers, share our vision, and learn from others in the startup community.

A heartfelt thank you to TEB for selecting us for this unique opportunity.

We are excited about the connections made and the future opportunities this program has created for Novus. Stay tuned as we continue to innovate and expand our horizons!

Newsletter

Novus Newsletter: AI Highlights - May 2024

May's AI milestones: OpenAI and Google updates, deepfake MET Gala, AI in dating. Novus’s key event participation and achievements.

May 31, 2024

Hey there!

Duru here from Novus, ready to share the exciting highlights from our May AI newsletters. This month, we've witnessed significant advancements in AI, hosted exciting events, and celebrated remarkable achievements within our team.

In each newsletter I find the most interesting AI news for you and of course keep you up to date with the latest insights and developments. Here, I have compiled the key stories and updates from May 2024 to keep you informed and engaged.

If you want to stay more up to date with what's happening in the AI field, you can subscribe to our bi-weekly newsletter. You will receive the latest updates and exclusive insights directly to your inbox.

Now, let's jump in!

AI NEWS

What is AI Anyway?

Mustafa Suleyman, co-founder of DeepMind and Inflection AI, shared his thoughts on AI ethics in a recent TED talk, describing AI as a “new digital species” and emphasizing the importance of understanding and controlling it.

  • Key Point: Suleyman highlights the need for ethical considerations in AI development and urges us to give AI the best of humanity.
  • Further Reading: Mustafa Suleyman's TED Talk

Drake ft. AI Tupac

The use of AI in music hit the headlines again with Drake’s song featuring AI-generated vocals of Tupac and Snoop Dogg, sparking legal and ethical debates.

  • Key Point: This incident underscores the ongoing controversy around AI's role in music and intellectual property rights.
  • Further Reading: AI in Music Controversy

New AI Gadgets: Rabbit R1 and Humane AI Pin

Recently, two new AI assistants, Rabbit R1 and Humane AI Pin, were launched, offering limited functionalities compared to existing assistants like Siri and Alexa.

  • Key Point: While these AI assistants bring novelty, their limited features highlight the need for more practical and useful AI innovations.
  • Further Reading: Rabbit R1 Review
  • Further Reading: Humane AI Pin Review

Deepfake MET Gala

This year’s MET Gala saw deepfake images of celebrities like Rihanna and Katy Perry, blurring the lines between reality and digital fabrication.

  • Key Point: The incident raises concerns about the impact of deepfakes on public perception and the authenticity of digital content.
  • Further Reading: Deepfake MET Gala

Dating in the Future: AI Concierge

Bumble’s founder envisions AI concierges handling virtual dating, potentially revolutionizing the dating experience by minimizing direct social interaction.

  • Key Point: AI could reduce dating app fatigue and enhance match compatibility, making the dating process more enjoyable and less stressful.
  • Further Reading: AI in Dating

NOVUS UPDATES

Harvard & MIT Generative AI Map

We are proud to be included in the Harvard & MIT Generative AI Map, recognizing Novus as a key player in the AI ecosystem. This is a testament to our ongoing commitment to AI innovation and excellence.

Generative AI Map

Allianz Hackzone Scale Up Accelerator Program

We successfully completed the Allianz Hackzone Scale Up Accelerator Program, showcasing our enhanced AI features and evolving capabilities at Demo Day. This program has been instrumental in refining our AI solutions and expanding our industry network.

Workshop at Social Impact Awards’24

We hosted a workshop at SIA'24, where Vorga shared Novus’s journey and I guided participants in creating personas with ChatGPT. The feedback was overwhelmingly positive, and we look forward to more such engagements.

TEAM INSIGHTS

This month, we are particularly proud of the achievements of our team members whose work was featured at ICLR 2024, one of the most prestigious academic conferences in AI.

Büşra’s Weather Forecasting Model

Together with Prof. Dr. Gözde Unal and Prof. Alper Unal from Istanbul Technical University, and Abdullah Akgül and Assoc. Prof. Melih Kandemir from Syddansk Universitet (University of Southern Denmark), Büşra developed an advanced deep learning model aimed at improving monthly and seasonal weather forecasts. The model is designed to enhance the accuracy and reliability of weather predictions, providing significant benefits for sectors such as agriculture, disaster management, and logistics. Their innovative approach is a major step forward in the field of meteorology and AI.

Taha’s Topoformers

Together with Nicholas M. Blauch from Harvard University and Greta Tuckute from the Massachusetts Institute of Technology, Taha introduced 'Topoformers,' a novel method for adding topographic organization to transformer network layers. This groundbreaking research optimizes the efficiency and performance of transformer models, which are essential for various natural language processing tasks. By enhancing how these models learn and process information, their work has the potential to drive significant advancements in AI.

We are incredibly proud of Büşra and Taha for their contributions and the recognition they have received at ICLR 2024. Their hard work and dedication continue to push the boundaries of AI research.

Interested in the latest AI advancements and more insights?

Subscribe to our newsletter to receive regular updates, exclusive content, and a glimpse into our ongoing projects.

Join the Novus community and stay informed about how we are pushing the boundaries of AI innovation.

Newsroom

Novus CEO Shares Insights at İş Bankası AI Startup Factory's Anniversary Interview

Novus CEO Rıza Egehan Asad celebrated AI Startup Factory's first anniversary and highlighted Novus' achievements.

May 30, 2024

Novus is delighted to celebrate the first anniversary of the AI Startup Factory at İş Bankası. Our CEO, Rıza Egehan Asad, marked the occasion with an insightful interview, highlighting the remarkable achievements of Novus over the past year.

One of the evening's highlights was the opportunity to connect with fellow startups within the AI Startup Factory community, fostering new relationships and collaborations in a vibrant cocktail setting.

Five months ago, we also had the privilege of participating in the Kohort-4 event, part of Türkiye İş Bankası's innovative AI Startup Factory program, where we delivered a presentation. This experience was invaluable and enriching for our team.

We extend our sincere thanks to Türkiye İş Bankası and the AI Startup Factory team for cultivating such a dynamic and supportive environment.
