The Rise of NewSQL Databases: A Paradigm Shift in Data Management

The landscape of data management is undergoing a profound transformation. As organizations deal with the exponential growth and diversification of data, the limitations of traditional relational databases (RDBMS) and non-relational databases (NoSQL) become increasingly apparent. In this evolving landscape, NewSQL databases have emerged as a potential game changer, offering a compelling blend of scalability and consistency.

What are NewSQL Databases?

NewSQL databases represent a new breed of database technologies designed to bridge the gap between the scalability and flexibility of NoSQL databases and the consistency and ACID compliance of RDBMS. They achieve this by leveraging a range of architectural advancements, including:

  • Distributed Architecture: This enables horizontal scaling across multiple servers, allowing NewSQL databases to handle massive datasets and high volumes of concurrent transactions.
  • In-memory Storage: By storing data in RAM instead of on disk, NewSQL databases offer significantly faster read and write operations, enhancing overall performance.
  • ACID Compliance: NewSQL databases uphold the principles of Atomicity, Consistency, Isolation, and Durability (ACID), ensuring data integrity and reliability even in complex transactional scenarios.

Drivers for NewSQL Adoption

Several key factors are driving the rapid adoption of NewSQL databases:

  • The Data Explosion: The volume and variety of data generated in today’s digital era necessitate scalable solutions. Traditional databases often struggle with this explosive growth, making NewSQL an attractive alternative.
  • Real-time Analytics: Businesses increasingly demand insights from their data in real-time to make informed decisions. The speed and agility of NewSQL databases make them ideal for supporting real-time analytics.
  • Data Integrity Demands: Industries like finance and healthcare require uncompromising data integrity. NewSQL databases fulfill this critical need by guaranteeing ACID compliance for accurate and reliable data handling.

Why Choose NewSQL?

Organizations looking to harness the power of NewSQL can expect numerous benefits:

  • Scalability: Seamlessly scale horizontally to accommodate growing data volumes and user bases without impacting performance.
  • Performance: Leverage in-memory storage and distributed architecture for exceptional performance, ideal for demanding workloads.
  • Consistency: Ensure data integrity and consistency even in complex transactions with ACID compliance.
  • Flexibility: Adapt to diverse data types and application requirements with flexible data models.
  • Cost-effectiveness: Open-source NewSQL options offer cost-effective solutions accessible to various organizations.

What are the challenges to consider while adopting NewSQL?

While NewSQL offers significant advantages, organizations should also be aware of some potential challenges:

  • Maturity: As a relatively new technology, NewSQL databases may lack the long-term track record and stability of established database solutions.
  • Complexity: Managing and maintaining a distributed, feature-rich NewSQL database can be more complex than traditional database solutions.
  • Limited Ecosystem: The ecosystem of tools and services supporting NewSQL is still under development, potentially impacting integration with existing infrastructure.

Examples of Leading NewSQL Databases:

  • Google Spanner
  • CockroachDB
  • TiDB
  • YugabyteDB
  • MariaDB

Here is a comparison of Google Spanner, CockroachDB, TiDB, YugabyteDB, and MariaDB capabilities:

Feature | Google Spanner | CockroachDB | TiDB | YugabyteDB | MariaDB
Scalability | Horizontal | Horizontal | Horizontal | Horizontal | Vertical
Consistency | ACID | ACID | ACID | ACID | ACID
Open Source | No | Yes | Yes | Yes | Yes
Cloud-based | Yes | Yes | Yes | Yes | No
In-memory Storage | Yes | Yes | Yes | Yes | No
SQL Compatibility | Yes | Yes | Yes | Yes | Yes
Multi-region Support | Yes | Yes | Yes | Yes | No
Real-time Analytics | Yes | Yes | Yes | Yes | Yes
Cost | High | Medium | Medium | Medium | Low

NewSQL databases offer a paradigm shift in data management, presenting a compelling solution that bridges the gap between scalability and consistency. While still evolving, NewSQL has the potential to revolutionize various data-driven applications. By carefully evaluating their specific needs and requirements, organizations can leverage the power of NewSQL to gain a competitive edge in the digital era. Remember, the choice between NewSQL and other options depends on your unique data landscape.

Posted in New Sql

Apache Kafka vs Apache Pulsar – Which one to choose?

In today’s data-driven world, the ability to process and analyze real-time data streams is crucial for businesses. Two open-source platforms, Apache Kafka and Apache Pulsar, have emerged as leaders in this space. But which one is right for you?

Market Share and Community:

  • Apache Kafka: Commands a dominant 70% market share, boasting a vast user base and an extensive ecosystem of tools and libraries, which makes it a lower-risk choice for many organizations.
  • Apache Pulsar: Though holding a smaller 30% share, Pulsar is rapidly gaining traction, especially among cloud-native companies and those valuing its unique features.

Pros and Cons:

Apache Kafka:

Pros:

  • Mature and proven: With years of development and refinement, Apache Kafka offers stability and reliability.
  • Extensive ecosystem: A vast collection of connectors, libraries, and tools ensures seamless integration with various technologies.
  • High performance: Kafka scales effortlessly to handle massive data streams, making it ideal for demanding workloads.
  • Stream processing powerhouse: Kafka’s built-in stream processing capabilities simplify real-time data analytics.

Cons:

  • Complexity: Managing Kafka, with its ZooKeeper dependency, can be challenging for smaller teams. However, with KIP-833 we can run a Kafka cluster without ZooKeeper.
  • Resource-intensive: Running Kafka can be resource-intensive, requiring high-performance infrastructure.
  • Limited multi-tenancy: Kafka primarily focuses on single-tenant deployments, limiting its use in some scenarios.
  • Lacks native support for multi-DC cluster setups with geo-replication. Kafka MirrorMaker and the stretched-cluster concept can fill the gap, but they come with some performance impact.

Apache Pulsar:

Pros:

  • Cloud-native design: Apache Pulsar was built for the cloud, offering seamless integration with cloud platforms and microservices architectures.
  • Multi-tenancy built-in: Pulsar allows for secure and efficient sharing of resources across multiple users and applications.
  • High scalability: Pulsar’s tiered storage architecture enables horizontal scaling to handle enormous data volumes.
  • Low latency: Pulsar excels at low-latency data processing, making it ideal for time-sensitive applications.
  • Native Multi-DC cluster setup with geo replication.

Cons:

  • Maturity gap: Compared to Kafka, Pulsar’s ecosystem is still under development, with fewer readily available tools and libraries.
  • Smaller community: While growing, Pulsar’s community is smaller than Kafka’s, potentially leading to limited support resources.
  • Stream processing capabilities: Though improving, Pulsar’s stream processing capabilities are not as mature as Kafka’s.

Use Cases:

Common to both:

  • Real-time analytics: Analyze data streams in real-time for immediate insights and decision-making.
  • Log aggregation: Collect and analyze log data from various sources for centralized monitoring and troubleshooting.
  • Microservices communication: Connect and communicate between microservices in a distributed system.
  • IoT data processing: Process and manage data streams generated by IoT devices for real-time monitoring and control.

Specific to Kafka:

  • Building message-driven applications: Leverage Kafka’s messaging capabilities to build highly scalable and distributed applications.
  • High-throughput data pipelines: Kafka excels at handling large volumes of data with minimal latency, making it ideal for data pipelines.
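To make this concrete, here is a minimal sketch of standing up such a pipeline with the stock Kafka CLI tools. The topic name, partition count, and localhost bootstrap address are placeholder assumptions, and the commands assume a recent Kafka release whose tools accept --bootstrap-server.

#Create a topic sized for a high-throughput pipeline
bin/kafka-topics.sh --create --topic clickstream --partitions 12 --replication-factor 3 --bootstrap-server localhost:9092
#Produce a few test records from the console
bin/kafka-console-producer.sh --topic clickstream --bootstrap-server localhost:9092
#Consume them back to verify the pipeline end to end
bin/kafka-console-consumer.sh --topic clickstream --from-beginning --bootstrap-server localhost:9092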

Specific to Pulsar:

  • Cloud-based deployments: Pulsar’s cloud-native design makes it perfect for deploying real-time data streaming applications in the cloud.
  • Multi-tenant environments: Pulsar’s multi-tenancy capabilities allow for secure and resource-efficient sharing of data pipelines across multiple organizations.
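To illustrate the multi-tenancy point, here is a minimal sketch of how tenants and namespaces isolate pipelines in Pulsar, assuming a local standalone cluster; the tenant, namespace, topic, and subscription names are illustrative placeholders.

#Create a tenant and a namespace; topics created under them inherit the tenant's policies and quotas
bin/pulsar-admin tenants create analytics
bin/pulsar-admin namespaces create analytics/clickstream
#Produce and consume on a topic scoped to that tenant and namespace
bin/pulsar-client produce persistent://analytics/clickstream/events --messages "page-view"
bin/pulsar-client consume persistent://analytics/clickstream/events -s test-subscription -n 0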

Production Deployers:

Apache Kafka:

  • Netflix: Processes billions of events daily for personalization, recommendations, and real-time analytics.
  • LinkedIn: Handles millions of messages per second for feed updates, notifications, and social graph management.
  • Uber: Uses Kafka to power its real-time tracking and matching systems for drivers and passengers.
  • More details: https://kafka.apache.org/powered-by

Apache Pulsar:

  • Yahoo: Leverages Pulsar for its real-time advertising platform, managing billions of events per day.
  • Tencent: Utilizes Pulsar to handle trillions of messages daily for its messaging services and social media platforms.
  • BMW: Uses Pulsar for its connected car platform, processing real-time data from millions of vehicles.
  • More details: https://pulsar.apache.org/case-studies/

Trends:

  • Hybrid deployments: Organizations are increasingly combining Kafka and Pulsar to benefit from each platform’s strengths.
  • Serverless integration: Both platforms are integrating with serverless functions for a more flexible and cost-effective approach.
  • Edge computing: Both Kafka and Pulsar are finding application in edge computing scenarios for decentralized data processing.
  • Multi-tenancy is key: Platforms with strong multi-tenancy features are becoming increasingly important, particularly in cloud-based environments.

Choosing between Apache Kafka and Apache Pulsar requires careful consideration of your specific needs and priorities. Benchmark your own use cases and test for availability and scalability before committing to either platform. Some benchmarking results are available for an initial glance at https://www.confluent.io/kafka-vs-pulsar/ . We will meet in the next blog post; until then, Happy Messaging!!!

Posted in Apache Kafka, Miscellaneous, Uncategorized

Managing Technical Debt: Tradeoff between Speed and Long-Term Sustainability

Technical Debt
Photo by DeepMind on Unsplash

Technical debt is a concept that has become increasingly important in software development over the past few decades. In essence, it refers to the trade-off between delivering software quickly and maintaining its long-term sustainability. Just as financial debt can accumulate over time and become a burden, so can technical debt if it is not managed effectively.

In this blog post, we will explore the concept of technical debt and provide some best practices for managing it. We will begin by defining technical debt and explaining why it is important to manage it effectively. We will then explore some common causes of technical debt and provide some strategies for reducing and avoiding it. Finally, we will discuss some best practices for balancing the need for speed with long-term sustainability in software development.

What is Technical Debt?

Technical debt refers to the cost of maintaining and supporting software that was built quickly and without regard for its long-term sustainability. Just as financial debt accumulates over time and accrues interest, technical debt accumulates as developers take shortcuts or make compromises that result in suboptimal design and code quality. The result is code that is difficult to maintain, modify, or extend. This can lead to a situation where the cost of maintaining and supporting the software is much higher than it would have been if the software had been built with sustainability in mind from the beginning.

What causes Technical Debt?

There are many factors that can contribute to the accumulation of technical debt. Here are a few common causes:

  1. Rush to Meet Deadlines

One of the most common causes of technical debt is the pressure to meet deadlines. If a development team is under pressure to deliver software quickly, they may take shortcuts or make compromises that result in suboptimal code quality. While this may allow them to meet the deadline, it can lead to a situation where the cost of maintaining and supporting the software is much higher than it would have been if the software had been built with sustainability in mind from the beginning.

  2. Lack of Planning

Another common cause of technical debt is a lack of planning. If a development team does not take the time to carefully plan out the architecture and design of their software, they may end up with design/code that is difficult to maintain or extend.

  3. Working in Silos

Poor communication between dev team members can also contribute to the accumulation of technical debt. If developers are not communicating effectively with each other, with cross-functional team members, or with stakeholders, they may end up making design decisions that are not optimal for the long-term sustainability of the product.

  4. Lack of DevSecOps Adoption and the Right SDLC Tooling

Finally, a lack of DevSecOps adoption and a failure to incorporate the right tooling to automate testing, code scanning, code reviews, and so on can also contribute to the accumulation of technical debt. If developers do not have a robust code review process and automated testing suite in place, they may introduce defects that are not caught until much later in the development cycle. This can lead to a situation where the cost of fixing defects is much higher than it would have been if the defects had been caught earlier in the development process. It also degrades developer morale, as developers spend more hours managing and fixing defects.

Why is Technical Debt Important to Manage?

Managing technical debt is important for a number of reasons. First and foremost, technical debt can be a major drain on productivity and resources. If developers are spending all their time fixing defects and maintaining legacy code, they will have less time to work on new features and improvements.

In addition, technical debt can lead to higher costs and longer development times. If code is poorly designed or difficult to modify, it may take much longer to add new features or make changes to the software. This can result in missed deadlines, higher costs, and a negative impact on the development team's morale and job satisfaction. If developers are constantly working with suboptimal code or struggling to fix defects that could have been avoided with better design decisions, they may become frustrated and demotivated. This can lead to higher turnover rates and a less productive team.

How to reduce/avoid Technical Debt?

Now that we have explored some common causes of technical debt, let’s discuss some strategies for reducing and avoiding it:

  1. Prioritize Design and Code Quality

One of the most effective ways to reduce and avoid technical debt is to prioritize product design and code quality from the beginning. This means taking the time to carefully plan out the architecture and design of your software, and ensuring that all code is thoroughly reviewed and tested before it is merged into the main source code branch. By prioritizing code quality, you can avoid many of the shortcuts and compromises that can lead to technical debt.

  2. Use Agile Methodologies

Agile methodologies, such as Scrum and Kanban, can also be effective at reducing and avoiding technical debt. By breaking down development into smaller, manageable sprints and focusing on delivering value in each sprint, you can ensure that your software is being developed with sustainability in mind from the beginning. Agile methodologies also prioritize communication and collaboration within the team and across different teams, which can help ensure that everyone is on the same page and working towards the same OKRs (Objectives and Key Results).

  3. Code Reviews and Automated Testing

Code reviews can help establish coding standards and best practices that can help prevent technical debt from accumulating in the first place. Automated testing is another effective way to reduce and avoid technical debt. By implementing a robust suite of automated tests, you can catch defects much earlier in the development process, before they have a chance to accumulate and become a burden. Automated testing can also help ensure that all code is thoroughly tested before it is merged into the main source code branch.

  4. Manage Technical Debt as Part of Sprint Planning

Managing technical debt should be a part of your sprint planning process. As you prioritize features and improvements, you should also consider the impact they will have on your codebase and the potential for accumulating technical debt. This means weighing the benefits of delivering features quickly against the long-term sustainability of your software.

  5. Prioritize Technical Debt Reduction

While it may be tempting to prioritize new features and improvements over technical debt reduction, it is important to keep technical debt reduction a priority. By taking the time to reduce and avoid technical debt, you can ensure that your software is being developed with long-term sustainability in mind.

  6. Make Technical Debt Reduction a Team Effort

Reducing and avoiding technical debt should not be the sole responsibility of one team member. Instead, it should be a team effort, with everyone working together to ensure that technical debt is kept to a minimum. This means encouraging everyone on the team to prioritize code quality, communicate effectively, and stay on top of technical debt reduction.

  7. Continuously Monitor Technical Debt

Finally, it is important to continuously monitor the technical debt associated with your software. One way to accomplish this is by integrating tools like SonarQube and Kiuwan into your CI/CD pipeline, allowing you to continuously assess technical debt and communicate it to stakeholders. This means keeping track of technical debt metrics, such as code complexity and defect counts, and regularly assessing the impact of technical debt on your software’s sustainability. By staying on top of technical debt, you can ensure that your software remains sustainable over the long term.
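As an example of what that integration can look like, here is a minimal sketch of a SonarQube scan wired into a CI job. The project key, source directories, server URL, and token variable are placeholder assumptions; adapt them to your own pipeline.

#sonar-project.properties checked into the repository root
sonar.projectKey=my-service
sonar.sources=src
sonar.host.url=https://sonarqube.example.com

#CI step: run the scanner after the build/test stage so issues and complexity are reported on every commit
#(newer scanner versions accept -Dsonar.token instead of -Dsonar.login)
sonar-scanner -Dsonar.login=$SONAR_TOKEN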

Ultimately, managing technical debt is about taking a proactive approach to software development. By prioritizing quality of design and development, communication, and technical debt reduction, you can ensure that your software is developed with both speed and sustainability in mind, and avoid the costs and risks associated with technical debt.

Posted in General

The Rise of Low-Code/No-Code Application Development

low-code/no-code
Photo by Ilya Pavlov on Unsplash

What is Low-code?

Low-code is a software development approach that allows developers to create and maintain applications with minimal hand-coding and effort. It enables users to visually design, build, and deploy software applications using a graphical user interface rather than traditional coding. Low-code platforms typically provide pre-built, reusable components and drag-and-drop functionality, making it easy for non-technical users to create and customize applications without the need for programming skills. This approach can help organizations build and deploy software faster, at lower cost, and with more flexibility.

Low-code platforms typically provide pre-built, reusable components that users can drag and drop onto a visual canvas to create their application. These components may include things like forms, buttons, and data tables. Users can then customize the components by adjusting properties, such as layout and style, or by creating simple scripts or formulas to handle data and logic.

What is No-code?

No-code is a software development approach that allows users to create, customize, and maintain software applications without writing any code. It is an even higher abstraction level than low-code: non-technical users create and deploy software applications using a drag-and-drop interface or pre-built templates. The idea behind no-code is to allow anyone, regardless of their technical background, to create software applications without the need for programming skills. The end goal is to enable faster and more efficient application development, while also reducing the need for specialized development resources.

No-code platforms, on the other hand, take this abstraction a step further, providing pre-built templates and workflows that can be used to create an application by configuring the pre-built components, without any coding required. These templates can be used to create a wide range of applications, from simple forms to more complex business process automation.

Low-code and no-code tools typically work by providing a visual, drag-and-drop interface for users to design, build, and deploy software applications. This interface allows users to create and customize software applications without the need for traditional coding. Both low-code and no-code tools also typically provide a way to connect to external data sources and services, allowing users to easily integrate their applications with other systems and tools.

What are the benefits of Low-code/No-code?

Low-code and no-code platforms offer a number of benefits for organizations looking to create and maintain software applications. Some of the main benefits include:

  1. Faster development: Low-code and no-code platforms allow developers to create and deploy software applications faster than traditional coding, by providing pre-built, reusable components and drag-and-drop functionality.
  2. Reduced costs: Low-code and no-code platforms can help organizations to reduce costs by allowing them to build and deploy software faster and with less need for specialized development resources.
  3. Increased productivity: Low-code and no-code platforms allow non-technical users, such as business analysts and domain experts, to create and customize software applications without the need for programming skills, increasing productivity and aligning IT with business needs.
  4. Greater flexibility: Low-code and no-code platforms can be used to create a wide range of software applications, from simple forms to more complex business process automation, allowing organizations to respond to changing business needs and stay competitive.
  5. Better data and business process management: Low-code and no-code platforms can be used to create and maintain databases and analytics platforms, allowing organizations to better manage and analyze their data, as well as automate business processes, such as data entry and workflow management.
  6. Cloud and IoT integration: Low-code and no-code platforms can be used to create cloud and IoT applications, allowing organizations to leverage the power of cloud computing and the Internet of Things to improve efficiency and drive innovation.
  7. Cross-platform compatibility: Many low-code and no-code platforms have built-in support for multi-platform deployment, allowing organizations to develop once and deploy on multiple platforms, such as web, mobile, and desktop.

What are the use cases of Low-code/No-code platforms?

Low-code and no-code platforms have seen growing adoption across various industries in recent years, as organizations look to improve efficiency and reduce costs while developing software applications.

  1. Healthcare: Low-code and no-code platforms are increasingly being used in healthcare to improve patient care and automate workflows. For example, hospitals and clinics can use these platforms to create and maintain electronic health records, appointment scheduling systems, and patient portals.
  2. Financial Services: Low-code and no-code platforms are being used in financial services to automate workflows and improve the customer experience. Banks and insurance companies can use these platforms to create mobile apps, customer portals, and automated loan origination systems.
  3. Retail: Low-code and no-code platforms are being used in retail to improve the customer experience and automate workflows. Retailers can use these platforms to create mobile apps, e-commerce platforms, and inventory management systems.
  4. Manufacturing: Low-code and no-code platforms are being used in manufacturing to automate workflows, improve efficiency, and reduce costs. Manufacturers can use these platforms to create and maintain manufacturing execution systems, quality control systems, and supply chain management systems.
  5. Government: Low-code and no-code platforms are being used in government to automate workflows, improve efficiency, and reduce costs. Governments can use these platforms to create and maintain systems for managing public services, such as property tax systems, driver’s license systems, and voter registration systems.
  6. IT Services: Low-code and no-code platforms are being used in IT services to improve the efficiency of IT operations. IT services companies can use these platforms to create automation scripts, custom portals, and service management systems.
  7. Supply chain: Low-code and no-code platforms can be used in the supply chain to improve efficiency, automate workflows, and gain real-time visibility into the supply chain operations. These platforms can be used to create and maintain inventory management systems, supply chain visibility systems, order management systems, supply chain automation systems, supplier management systems, and logistics management systems.

Low-code and no-code platforms have seen a significant increase in popularity in recent years, and there are a number of statistics that demonstrate this trend.

  1. According to a survey by Forrester Research, low-code development platforms are projected to be used for 65% of all application development by 2024.
  2. A report by Gartner suggests that by 2024, low-code application development will be responsible for more than 65% of application development activity.
  3. A study by IDC found that the worldwide low-code development platform market is expected to grow from $10.3 billion in 2019 to $45.8 billion by 2024, at a CAGR of 34.5% during the forecast period.
  4. According to a report by MarketsandMarkets, the no-code development platform market size is expected to grow from $3.8 billion in 2020 to $10.7 billion by 2025, at a CAGR of 22.9% during the forecast period.
  5. Another study by IDC suggests that the no-code development platform market is expected to reach $13.9 billion by 2023.

There are many low-code platforms available in the market; some examples include:

  1. Pega: A low-code platform that enables collaboration between business and IT teams, allowing them to work together to create and deploy applications that meet the needs of the business. The platform is designed to be flexible and scalable, allowing users to easily modify and update applications as business needs change. Pega provides features such as a visual drag-and-drop interface for building applications; pre-built, reusable components and features; a library of pre-built connectors to popular systems and services; a built-in rules engine for making decisions and automating processes; and a built-in case management system for handling customer inquiries and complaints.
  2. Salesforce Lightning: A low-code platform from Salesforce that allows developers to build custom applications and automate business processes on the Salesforce platform.
  3. OutSystems: A low-code platform that allows developers to create web and mobile applications, and automate business processes using drag-and-drop visual development.
  4. Mendix: A low-code platform that allows developers to create web and mobile applications using a visual development environment, and automate business processes using pre-built connectors to external systems.
  5. Appian: A low-code platform that allows developers to create web and mobile applications, automate business processes, and manage data using a visual development environment.
  6. Microsoft PowerApps: A low-code platform that allows developers to create web and mobile applications, automate business processes, and manage data using a visual development environment, and it’s integrated with Microsoft Power Platform.
  7. Zoho Creator: A low-code platform that allows developers to create custom applications, automate business processes, and manage data using a drag-and-drop interface.

There are many no-code platforms available in the market; some examples include:

  1. Bubble.io: A no-code platform that allows users to create web applications, automate workflows and manage data using a visual drag-and-drop interface.
  2. Webflow: A no-code platform that allows users to create and design responsive websites and web applications using a visual drag-and-drop interface.
  3. Adalo: A no-code platform that allows users to create and design mobile applications using a visual drag-and-drop interface.
  4. Wix: A no-code platform that allows users to create and design websites, web applications and e-commerce sites using a visual drag-and-drop interface.
  5. Airtable: A no-code platform that allows users to create custom databases and automate workflows using a visual drag-and-drop interface.
  6. Unqork: A no-code platform that allows users to create and design web applications, automate workflows and manage data using a visual drag-and-drop interface, specifically designed for enterprise use cases.

In summary, low-code and no-code platforms offer faster development, reduced costs, increased productivity, greater flexibility, better data and business process management, cloud and IoT integration and cross-platform compatibility. These benefits make it a popular choice for organizations looking to create and maintain software applications quickly, easily, and with minimal coding.

Posted in Other

Will ChatGPT replace Google Search?

ChatGPT
Photo by DeepMind on Unsplash

In this article, let’s explore ChatGPT and its use cases and see whether ChatGPT can replace Google Search.

ChatGPT (short for “Chat Generative Pre-trained Transformer”) is a large language model developed by OpenAI. It is a variant of the GPT (Generative Pre-trained Transformer) model, which is trained on a massive amount of text data to generate human-like text.

Unsupervised learning is a technique in which the model is not provided with any labeled or annotated data. Instead, the model is trained on a large dataset of text and learns patterns and relationships in the data on its own. This approach is used to train ChatGPT, allowing it to generate human-like text. You can try ChatGPT at https://chat.openai.com/chat by entering some questions. Interacting with it feels like having a conversation with a friend.

Chatbots and virtual assistants have come a long way in recent years. One of the most advanced of these technologies is ChatGPT, a language model developed by OpenAI. But, can ChatGPT replace Google search? In this blog post, we’ll explore the capabilities of ChatGPT and compare them to those of Google search.

First, let’s take a look at ChatGPT. This language model is trained on a massive amount of text data and can generate human-like text. It can answer questions, write essays, and even generate code. ChatGPT has been used to create chatbots and virtual assistants that can answer questions and perform various tasks.

Now, let’s compare ChatGPT to Google search. Google search is a powerful tool that can find information on almost any topic. It can search the web, images, videos, news, and more. Google search also uses machine learning to provide personalized results and has a wide range of advanced features like voice search, autocomplete, and the Knowledge Graph.

While ChatGPT can answer questions and generate text, it’s not designed to search the web. Google search, on the other hand, is specifically designed to search the web and has a vast array of features to help users find the information they need.

Additionally, ChatGPT is best suited for generating human-like text, but it can’t match the speed, efficiency and accuracy of a search engine that has been fine-tuned over the years and uses advanced algorithms to find and rank the most relevant information.

While Google Search is a powerful tool for finding information on the web, there are certain situations where ChatGPT may be superior. Here are a few examples:

  1. Natural Language Processing: ChatGPT is trained on a massive amount of text data and can understand and respond to natural language queries. This makes it well-suited for answering questions and having conversations in a more human-like manner. In contrast, Google Search may not always provide the most accurate or helpful results when dealing with more complex or nuanced queries.
  2. Writing and Text Generation: ChatGPT can generate human-like text. It can be used to write essays, articles, emails, and even generate code. Google Search is not designed for this purpose and may not provide the same level of text generation capabilities.
  3. Privacy and Security: ChatGPT can be used to answer questions and perform tasks without sending data to a third-party server. This can be beneficial in situations where privacy and security are a concern, such as when dealing with sensitive information.
  4. Personalization: ChatGPT can be trained on specific data and fine-tuned to a particular use case, which allows it to provide highly personalized results and responses. While Google Search can personalize results, it may not be as specific and accurate as ChatGPT in certain scenarios.
  5. Complex and specific use cases: ChatGPT can be used to answer highly specific and complex queries that might not be easily searchable through a search engine. For example, it could be used to answer technical questions in a specific industry or generate reports on specific topics, something that Google search may not be able to do as easily.

While ChatGPT is a powerful language model, there are certain situations where Google Search may be superior. Here are a few examples:

  1. Searching the web: Google Search is specifically designed for searching the web and has a vast array of features to help users find the information they need. It can search for text, images, videos, news, and more, while ChatGPT is not designed for this purpose.
  2. Speed and Efficiency: Google Search has been fine-tuned over the years and uses advanced algorithms to find and rank the most relevant information. It can provide results in a matter of seconds, while ChatGPT may take longer to process a query, especially if it requires more complex natural language understanding.
  3. Relevancy and accuracy: Google Search can provide highly relevant and accurate results by using algorithms that take into account factors such as page relevance, user location and search history, among others.
  4. Multilingual support: Google Search supports a wide range of languages and can provide results in multiple languages, while ChatGPT is primarily designed to work with English.
  5. Advanced features: Google Search provides a wide range of advanced features such as voice search, autocomplete, and the Knowledge Graph, which can be useful in certain situations. ChatGPT does not have these features and is primarily focused on generating text.
  6. Large scale search: Google Search can handle large-scale search queries, whereas ChatGPT might struggle to handle and answer queries at that scale.

It’s worth noting that Google Search and ChatGPT are different tools with different strengths and weaknesses, and both can be useful depending on the task at hand.

In conclusion, ChatGPT is a powerful language model that can generate human-like text and answer questions. However, it is not intended to replace Google search, which is specifically designed for searching the web and has a wide range of advanced features. While ChatGPT can be a useful tool for certain tasks, it can’t match the speed and efficiency of a search engine like Google.

Posted in Artificial Intelligence

Red Hat Summit 2021

Posted in Events

Reactive Summit 2021

Posted in Events

Audit Database Changes with Debezium

Debezium

In this article, we will explore Debezium to capture data changes. Debezium is a distributed open-source platform for change data capture. Point a Debezium connector at your database and start listening to change data events such as inserts, updates, and deletes, read directly from the database transaction logs as other applications commit changes to your database.

Debezium is a collection of source connectors for Apache Kafka Connect. Debezium’s log-based Change Data Capture (CDC) ingests changes directly from the database’s transaction logs. Unlike other approaches, such as polling or dual writes, the log-based approach brings the following benefits.

  • Ensures that all data changes are captured. The data changes may come from multiple applications, SQL editors, etc. Debezium captures every change event.
  • Produces change events with a very low delay while avoiding increased CPU usage required for frequent polling.
  • As the changes are captured at the database transaction log level, no changes are required to your data model, such as a “Last Updated” column.
  • It captures deletes.

Let us discuss a use case to audit the database table changes for compliance purposes. There are different approaches to audit the databases.

  1. Using database triggers to monitor the DDL/DML changes. But database triggers come with pain if you don’t use them wisely, and hence a lot of enterprise applications avoid them.
  2. Envers. The Envers module aims to provide an easy auditing/versioning solution for entity classes. It does a good job, but it has the following issues.
    1. The audit logging is synchronous.
    2. The audit logging and the actual database changes for business logic need to be wrapped in the same transaction. If the audit logging fails, the whole transaction needs to be rolled back.
    3. If we decide to push the changes to another database instance, we might end up using distributed transactions. This adds performance overhead to the application.
    4. Pushing the changes to other systems, such as analytics or search, becomes problematic.
    5. Mixing audit logging with the actual business logic creates a codebase maintenance issue.
    6. It cannot capture changes coming from other applications or a SQL shell.

  3. Writing our own audit framework to capture the data changes. This works, but it has the same issues highlighted in #2 above.

Now, let us see how Debezium solves the database audit use case. The design below depicts the components involved in auditing the database with Debezium.

Follow the steps below to set up the Debezium connector.

Step 1: Download the connector from https://debezium.io/releases/1.4/#installation . In this example I am using MySQL, so I downloaded the Debezium MySQL connector. Debezium has connectors for a variety of databases.

Step 2: Install a Kafka cluster. I used a simple Kafka cluster with one ZooKeeper node and one broker. Under the same Kafka installation, you will find the Kafka Connect related properties. Add the Debezium jar files to the Kafka Connect classpath by updating plugin.path in the connect-distributed.properties file.
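For example, a one-line sketch of that setting; the install path is a placeholder, so point it at the directory where you extracted the Debezium connector archive.

#connect-distributed.properties
plugin.path=/opt/kafka/connect-plugins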

Step 3: Enable the binary log for the MySQL database.
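For reference, here is a minimal sketch of the my.cnf settings the Debezium MySQL connector relies on; the values are illustrative and the server-id just needs to be unique in your replication topology. The connector also needs a MySQL user with SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, and REPLICATION CLIENT privileges, which is the debezium/dbz user referenced in the connector configuration in Step 5.

#my.cnf (or a file under /etc/mysql/conf.d/): enable row-based binary logging for CDC
[mysqld]
server-id        = 223344
log_bin          = mysql-bin
binlog_format    = ROW
binlog_row_image = FULL
expire_logs_days = 10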

Step 4: Launch the Kafka cluster and Kafka Connect with the commands below.

#To start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
#To start the Kafka broker
bin/kafka-server-start.sh config/server.properties
#To start Kafka Connect
bin/connect-distributed.sh config/connect-distributed.properties

Step 5: Register the MySQL source connector configuration with Kafka Connect.

curl -k -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d '{
"name": "mysql-connector-demo",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "localhost",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "1",
"database.server.name": "dbserver1",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "customers_audit",
"table.include.list": "inventory.customers",
"transforms": "Reroute",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"transforms.Reroute.topic.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.Reroute.topic.replacement": "$3"
}
}'

A few configuration details worth noting: table.include.list limits capture to the inventory.customers table, database.history.kafka.topic is the topic where the connector stores the schema history, and the Reroute transform rewrites the default topic name dbserver1.inventory.customers to just the table name (customers) via the $3 capture group.

Step 6: Now run some inserts/updates/deletes on the table we configured for auditing and watch the events arrive on the topic.
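To watch the events as they arrive, a console consumer on the rerouted topic is enough; this is a minimal sketch, and the topic is named customers because the Reroute transform above keeps only the table name.

#Consume the change events emitted for the inventory.customers table
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic customers --from-beginning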

Below are some of the events we received on the topic for the insert/update/delete DML. The actual JSON has other properties, but I am showing a trimmed version for simplicity.

"payload": {
"before": null,
"after": {
"id": 1016,
"first_name": "Smart",
"last_name": "Techie",
"email": "smarttechie@gmail.com"
},
"source": {
"version": "1.4.2.Final",
"connector": "mysql",
"name": "dbserver1",
"ts_ms": 1615928467000,
"snapshot": "false",
"db": "inventory",
"table": "customers",
"server_id": 223344,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 4015,
"row": 0,
"thread": 36,
"query": null
},
"op": "c",
"ts_ms": 1615928467236,
"transaction": null
}
"payload": {
"before": {
"id": 1016,
"first_name": "Smart",
"last_name": "Techie",
"email": "smarttechie@gmail.com"
},
"after": {
"id": 1016,
"first_name": "Smart",
"last_name": "Techie",
"email": "smarttechie_updated@gmail.com"
},
"source": {
"version": "1.4.2.Final",
"connector": "mysql",
"name": "dbserver1",
"ts_ms": 1615928667000,
"snapshot": "false",
"db": "inventory",
"table": "customers",
"server_id": 223344,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 4331,
"row": 0,
"thread": 36,
"query": null
},
"op": "u",
"ts_ms": 1615928667845,
"transaction": null
}
"payload": {
"before": {
"id": 1016,
"first_name": "Smart",
"last_name": "Techie",
"email": "smarttechie_updated@gmail.com"
},
"after": null,
"source": {
"version": "1.4.2.Final",
"connector": "mysql",
"name": "dbserver1",
"ts_ms": 1615928994000,
"snapshot": "false",
"db": "inventory",
"table": "customers",
"server_id": 223344,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 4696,
"row": 0,
"thread": 36,
"query": null
},
"op": "d",
"ts_ms": 1615928994823,
"transaction": null
}

You can find the list of organizations that use Debezium on the Debezium website. I hope you enjoyed this article. We will meet in another blog post. Till then, Happy Learning!!

Posted in Apache Kafka

Let’s think Kafka cluster without Zookeeper with KIP-500

Right now, Apache Kafka® utilizes Apache ZooKeeper™ to store its metadata. Metadata such as partitions, topic configurations, and access control lists is stored in a ZooKeeper cluster. Managing a ZooKeeper cluster creates an additional burden on the infrastructure and the admins. With KIP-500, we are going to see a Kafka cluster without a ZooKeeper cluster, where metadata management is done by Kafka itself.

Before KIP-500, our Kafka setup looked like the one depicted below. Here we have a 3-node ZooKeeper cluster and a 4-node Kafka cluster. This setup is the minimum for sustaining one Kafka broker failure. The orange Kafka node is the controller node.

Let us see what issues we have with the above setup with the involvement of Zookeeper.

  • Making the Zookeeper cluster highly available is an issue as without the Zookeeper cluster the Kafka cluster is DEAD.
  • Availability of the Kafka cluster if the controller dies. Electing another Kafka broker as the controller requires pulling the metadata from ZooKeeper, which leads to Kafka cluster unavailability. The more topics and partitions per topic there are, the longer the controller failover takes.
  • Kafka supports intra-cluster replication to support higher availability and durability. There should be multiple replicas of a partition, each stored in a different broker. One of the replicas is designated as a leader and the rest of the replicas are followers. If a broker fails, partitions on that broker with a leader temporarily become inaccessible. To continue serving the client requests, Kafka will automatically transfer the leader of those inaccessible partitions to some other replicas. This process is done by the Kafka broker who is acting as a controller. The controller broker should get metadata from the Zookeeper for each of the affected partition. The communication between the controller broker and the Zookeeper happens in a serial manner which leads to unavailability of the partition if the leader broker dies.
  • When we delete or create a topic, the Kafka cluster needs to talk to ZooKeeper to get the updated list of topics. It takes time for the impact of topic creation or deletion to become visible across the Kafka cluster.
  • The major issue we see is the SCALABILITY issue.

Let’s see what the Kafka cluster looks like post KIP-500. Below is the Kafka cluster setup.

Post KIP-500, the metadata is stored in the Kafka cluster itself. Consider that cluster a controller cluster. The controller marked in orange is the active controller and the other nodes are standby controllers. All the brokers in the cluster will be in sync, so when the active controller node fails, electing a standby node as the controller is very quick because it doesn’t require syncing the metadata. The brokers in the Kafka cluster periodically pull the metadata from the controller. This design means that when a new controller is elected, we never need to go through a lengthy metadata loading process.

Post KIP-500 will also speed up topic creation and deletion. Currently, creating or deleting a topic requires fetching the full list of topics in the cluster from the ZooKeeper metadata. Post KIP-500, only a new entry needs to be added to the metadata partition, which speeds up topic creation and deletion. Post KIP-500, metadata scalability increases, which eventually improves the SCALABILITY of Kafka.
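For readers who want to try this, here is a minimal sketch of bringing up a ZooKeeper-less node in KRaft mode. It assumes a Kafka release (2.8 or later) that ships the KRaft configuration under config/kraft; the cluster ID placeholder comes from the first command.

#Generate a cluster ID and format the metadata log directory
bin/kafka-storage.sh random-uuid
bin/kafka-storage.sh format -t <cluster-id-from-previous-command> -c config/kraft/server.properties
#Start a node in KRaft mode (here a combined broker and quorum controller, suitable for dev/test); no ZooKeeper required
bin/kafka-server-start.sh config/kraft/server.properties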

In the future, I want to see the elimination of the second Kafka cluster for controllers and eventually, we should be able to manage the metadata within the actual Kafka cluster. That reduces the burden on the infrastructure and the administrator’s job to the next level. We will meet with another topic. Until then, Happy Messaging!!

Posted in Apache Kafka

Building a 12-factor principle application with AWS and Microsoft Azure

Photo by Krisztian Tabori on Unsplash

In this article, I want to map the services available on AWS and Microsoft Azure to each of the 12-factor principles.

12-Factor Principle | Amazon Web Services | Microsoft Azure
Codebase (one codebase tracked in revision control, many deploys) | AWS CodeCommit | Azure Repos
Dependencies (explicitly declare and isolate dependencies) | AWS S3 | Azure Artifacts
Config (store config in the environment) | AWS AppConfig | App Configuration
Backing services (treat backing services as attached resources) | Amazon RDS, DynamoDB, S3, EFS and Redshift; messaging/queueing (SNS/SQS, Kinesis); SMTP services (SES); caching (ElastiCache) | Azure Cosmos DB, SQL databases, Storage accounts; messaging/queueing (Service Bus/Event Hubs); SMTP services; caching (Azure Cache for Redis)
Build, release, run (strictly separate build and run stages) | AWS CodeBuild, AWS CodePipeline | Azure Pipelines
Processes (execute the app as one or more stateless processes) | Amazon ECS services, Amazon Elastic Kubernetes Service | Container services, Azure Kubernetes Service (AKS)
Port binding (export services via port binding) | Amazon ECS services, Amazon Elastic Kubernetes Service | Container services, Azure Kubernetes Service (AKS)
Concurrency (scale out via the process model) | Amazon ECS services, Amazon Elastic Kubernetes Service, Application Auto Scaling | Container services, Azure Kubernetes Service (AKS)
Disposability (maximize robustness with fast startup and graceful shutdown) | Amazon ECS services, Amazon Elastic Kubernetes Service, Application Auto Scaling | Container services, Azure Kubernetes Service (AKS)
Dev/prod parity (keep development, staging, and production as similar as possible) | AWS CloudFormation | Azure Resource Manager
Logs (treat logs as event streams) | Amazon CloudWatch, AWS CloudTrail | Azure Monitor
Admin processes (run admin/management tasks as one-off processes) | Amazon Simple Workflow Service (SWF) | Logic Apps
Posted in Cloud Computing