What happens when a business use case requires you to build models, but the data you need sits in different places: some in on-premise systems (Hadoop, DB2, or an EDW) and some in the cloud (Azure, AWS, or GCP)? How do you work with data that has no single source?

One solution: bring the data together in one place. Migrate the required data from the on-premise systems to the cloud, then perform your manipulations on the full dataset using a platform like Databricks or Snowflake.

But this approach has a fallout. Migrating data comes at a cost to the business: there are ingress and egress charges on every byte moved, and if the data is huge, those charges add up to a huge bill. It also creates data duplication, since the same data now lives in both places.

So can we find another approach, one that avoids the duplication and even the migration itself? Yes, and this is where Denodo comes into the picture. Denodo is a data virtualization tool: you create connectivity to your various data sources and perform your manipulations inside Denodo itself. There is no need to move your data, so no duplication occurs. Denodo serves as a single platform where you can manipulate data, create models, and join data living in different sources, be it cloud or on-premise. You don't move chunks of data into Denodo; you just create a connection and you are good to go. A hedged sketch of what querying such a virtual layer can look like follows below.

#data #usecase #analytics #Denodo #costoptimization #datavirtualization
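As a rough illustration (not Denodo's official recipe), here is a minimal Python sketch of querying a Denodo virtual layer over ODBC. It assumes a Denodo VDP server is already exposed through an ODBC DSN and that two virtual views exist, one wrapping an on-prem table and one wrapping a cloud source; the DSN, credentials, and view names are all hypothetical placeholders.

```python
# A minimal sketch, assuming a Denodo VDP server exposed via an ODBC DSN
# ("denodo_vdp") and two pre-built virtual views: "dw_customers" (wrapping
# an on-prem DB2/EDW table) and "cloud_orders" (wrapping a cloud source).
# All names and credentials here are hypothetical.
import pyodbc

# Connect to the virtualization layer, not to the underlying sources.
conn = pyodbc.connect("DSN=denodo_vdp;UID=analyst;PWD=secret")

# The cross-source join runs inside the virtual layer; we never migrate
# or duplicate the underlying data, we only receive the result set.
query = """
    SELECT c.customer_id, c.segment, SUM(o.amount) AS total_spend
    FROM dw_customers c
    JOIN cloud_orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
"""

with conn:
    cursor = conn.cursor()
    for row in cursor.execute(query):
        print(row.customer_id, row.segment, row.total_spend)
```

The point of the design is that the join is pushed to the virtualization layer, so only the aggregated result crosses the network, avoiding both egress cost on the full dataset and a duplicated copy.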
More Relevant Posts
-
Check this out on LinkedIn Learning! I have made it free for you with the link below. #dataengineering #apachespark
-
The only way to grow in your career is... by helping others, and by seeking help from juniors and seniors alike. Helping a senior or a junior is one of the best ways to grow and evolve in a career. I have often seen juniors hesitate to ask for help. But if you are stuck on a task and unable to resolve it, even if it was supposed to be solely your responsibility, go ask for help and guidance. You will definitely find some generous person in the team who will help you with their guidance, technical expertise, and thoughts. I have been fortunate enough to get help whenever I needed it in my career; I seek help from juniors, seniors, everyone. There were times when I was stuck and had no clarity, and upon seeking help I got the guidance and support I needed. So one basic rule I follow: whenever someone reaches out to me for help, I try my best to help them, because I know what it feels like to be stuck. Don't hesitate to ask. Some might not help you, some might not even bother to guide you, but there are still people who will go out of their way to help, guide, and support you. The only way to grow is by supporting each other, not by pulling someone down. #growth #guidance #career #help #support #analytics #data #collaboration #evolve #clarity #need #basics #efforts #job
-
You will never know your real potential until you try. We often create barriers in our own minds; we are our own problem creators. I still have the habit of giving up on things even when I know that with a little more effort I could complete or accomplish them. On the way to my first job switch, I took a break of 3-4 months and gave up completely on looking for new opportunities, frustrated with my initial failures and stuck in thoughts like "I am not good enough, I am not capable enough". But what I missed was that all my initial failures were paving the way to newer opportunities; I just was not willing to try, not willing to accept another rejection. After a good 3-4 month break from job hunting, I came back with a mindset of giving my best and not worrying about the outcome. It is easy to say, but in reality it is tough, too tough, because we don't want to face failure again. Putting this here so that whoever reads it and is feeling like giving up will give it one more try, one more chance. You never know; the best opportunity of your career may be waiting for you. #job #switch #career #potential #failure #data #analytics #success #opportunity #thoughts #perspective #idea #accomplishments
-
The only way to risk your career is... by not taking enough risk. I was part of a data science team when I joined my first organization. Due to a resource crunch and team rebuilding, I was moved to data engineering within 6 months of starting my data science project. To be frank, I was scared, really scared, but one thing kept me going: my attitude towards new things, a simple "let's see what happens". Within months a new project came along and I was moved again, from data engineering to data analytics. I was still scared, still out of my comfort zone. I thought I wouldn't be able to take it, and I failed, failed miserably. But what kept me going was the same thought: let's see what happens, what's on the table for me. The biggest risk you can take in your career is not taking any risk. Whenever life gives you an uncomfortable situation to deal with, take it as a challenge, learn from it, and build yourself up for new challenges. You can never grow inside your comfort zone. Take risks, take your chances; you never know what your next chapter holds. #failure #risk #reward #learning #job #career #opportunity #chances #choice #challenge #mistake #data #analytics #thoughts #perspective #growth
-
Bad feedback is always better than a fake compliment. When I took on my first dashboarding project on my own, I was very naive about dashboards and storytelling, and the dashboard I was going to build was for the finance team, of utmost importance to the business at the time. I built the dashboard, and once I completed it I showed it to my manager. She was not happy with the colour combinations; according to her it was too flashy and colourful. The numbers and data points were all correct and in sync, but the colour scheme and overall look of the dashboard were a disaster. At first I felt agitated that after trying everything on my own and finishing it, I didn't get a compliment; what I got was bad feedback. I went straight to colleagues who were good at dashboarding and asked them to help me understand how to make it simple and business-ready. After office hours I spent the whole night on the dashboard making the suggested changes, and the next morning the first thing I did was publish the updated dashboard and show it to my boss. Moral: taking feedback positively is a better approach than fishing for a fake compliment. Implementation: whenever I take up a piece of work and complete it, I ask my seniors where I could improve, where I messed things up, and how I could do things better. You are not an expert; you are in the process of working towards excellence and perfection. If you think you are perfect, that's a myth. No one is perfect here; we are all learning, failing, and upgrading. #feedback #analytics #data #job #thoughts #perspective #work #upgrade #upskill #powerbi #dashboarding #storytelling #opportunity #failure
-
The smartest thing you can do in an interview to land a job is... be honest with your interviewer. I appeared for an interview back in Dec 2022 where the interviewer blocked a one-hour session and bombarded me with questions on SQL, analytics (guesstimate questions), Python, puzzles, and Databricks, and I failed miserably at all of them. Ashamed of my performance, I wanted to end the interview as early as possible, but she kept giving me chances to do well. She asked what my primary skills were. At that time I had just started learning Power BI, so I thought I would say I was good at Power BI. She then asked situation-based Power BI questions, and since I didn't have any real-time project experience, I failed completely at those too. I had assumed she would ask basic dashboarding questions that I could solve to get the job, but to my surprise, when I checked her LinkedIn, she herself had been an experienced Power BI developer for more than 5 years. Result: I got rejected. I asked for feedback, and she told me I had the potential to succeed but was underprepared in all the skills. Moral: the knowledge you get from doing courses and the knowledge you get from real-time projects are not at all comparable, and learning a tool is very different from working with it. Be honest with the interviewer; don't try to act oversmart. Implementation: later I asked my manager for a proper dashboarding project, developed it end to end with the help of fellow teammates, and learnt from it. #interview #rejection #analytics #data #dashboarding #failure #switch #job #skill #knowledge #mistakes #jobsearch #hunting #strategy #thoughts
-
How can we save cost while storing data in the cloud? To optimise cost, you need to know your data usage patterns and the different storage tiers that Azure (and other clouds) offer, which can save the business a huge amount. As businesses increasingly rely on the cloud for data storage, optimizing costs has become a top priority. One effective strategy is tiered storage, which offers varying levels of accessibility at varying costs. Here's how to optimize costs across tiers (a hedged lifecycle-policy sketch follows the list):

1. Understand your data: Analyze your data to determine how frequently it is accessed. Frequently accessed data should be stored in a tier that offers quick retrieval, even if that comes at a higher cost.

2. Use hot storage for active data: The hot tier is ideal for frequently accessed data that requires low latency. Use it for critical applications and real-time analytics where immediate access is necessary.

3. Use warm storage for infrequently accessed data: Warm storage (the "cool" tier in Azure) balances accessibility and cost. It suits data that is accessed occasionally but still needs relatively fast retrieval, such as recent backups and historical records.

4. Opt for cold storage for long-term archival: Cold or archive storage, usually the least expensive option, is designed for data that is accessed rarely and can tolerate longer retrieval times. Use it for old backups, regulatory compliance data, and archives that don't require immediate access.

5. Implement data lifecycle policies: Define policies that automatically move data between tiers based on access patterns and retention requirements, for example moving data to a cheaper tier once it is no longer actively used. This keeps every object in the most cost-effective tier at any given time.

6. Regularly review and adjust your storage strategy: As your data usage patterns evolve, revisit your strategy so that you are always optimizing against current needs.

By understanding your data access patterns and strategically leveraging tiered storage, you can optimize costs while keeping your data accessible and secure in the cloud. #cost #optimization #data #analytics #business #tiers #strategy #security #storage #analysis #azure #cloud
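As one concrete illustration of point 5, here is a minimal sketch of an Azure Blob Storage lifecycle management policy, built as a plain Python dict in the JSON shape the lifecycle management feature expects. The rule name, blob prefix, and day thresholds are hypothetical; tune them to your own access patterns.

```python
# A minimal sketch of an Azure Blob Storage lifecycle policy (point 5 above).
# The rule name, prefix, and day thresholds are hypothetical examples.
import json

lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "age-out-raw-data",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["raw/"],  # only applies to blobs under raw/
                },
                "actions": {
                    "baseBlob": {
                        # Hot -> Cool after 30 days without modification.
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        # Cool -> Archive after 90 days.
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        # Delete after a year to stop paying for storage at all.
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

# Write the policy out; it can then be applied with the Azure CLI, e.g.:
#   az storage account management-policy create \
#       --account-name <account> --resource-group <rg> --policy @policy.json
with open("policy.json", "w") as f:
    json.dump(lifecycle_policy, f, indent=2)
```

Once applied, the platform moves blobs down the tiers automatically, so cost optimization happens continuously instead of depending on someone remembering to re-tier old data.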
-
The clusters we attach to our Databricks notebooks for querying and running code add cost to the business. Managing clusters effectively not only reduces cost but also makes more efficient use of them. As data professionals, our goal should not just be getting our work done; it should also include reducing cost to the business and optimizing the use of our tools. By implementing the strategies below, you can manage Databricks clusters to minimize costs while ensuring acceptable performance for your business use cases (a hedged cluster-config sketch follows the list).

1. Right-size clusters:
- Start with smaller instance types and scale up only if necessary.
- Use Databricks autoscaling to automatically adjust the number of workers based on workload demand.

2. Optimize cluster utilization:
- Terminate clusters during off-peak hours when the workload is low.
- Use Databricks job scheduling or third-party schedulers to start and stop clusters on a schedule.
- Share clusters across multiple users where workloads allow, maximizing resource utilization.

3. Monitor and tune performance:
- Monitor cluster performance using Databricks metrics and the Spark UI to identify bottlenecks and optimize queries.
- Tune Spark configurations, such as executor memory and cores, for better performance and resource utilization.
- Use instance pools to reuse idle instances, reducing startup time and cost.

4. Apply cost-control policies:
- Set up cost controls and budget alerts to track and manage spending.
- Enforce automatic cluster termination after a specified idle period.
- Create and tear down clusters programmatically via the Databricks REST API or CLI based on workload demand.

5. Partition and cache data:
- Optimize data storage and access patterns by partitioning large datasets.
- Cache frequently accessed data to avoid recomputation, improving query performance and reducing cluster usage.

6. Use spot and preemptible capacity:
- Take advantage of spot instances (AWS) or the equivalent low-priority capacity on other clouds for cost-effective compute.
- Design fault-tolerant workflows that can handle interruptions and recover gracefully from node failures.

7. Analyze and optimize cost:
- Regularly analyze Databricks billing reports to identify areas for cost optimization.
- Estimate costs up front based on cluster configurations and expected usage patterns.

#cost #optimization #databricks #azure #spark #cluster #clustermanagement #data #analytics #business #impacts #datadrivendecisions #cloud
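To make points 1 and 4 concrete, here is a minimal sketch of creating a cost-conscious cluster through the Databricks Clusters REST API (`/api/2.0/clusters/create`), combining autoscaling with idle auto-termination. The workspace host, token, node type, and sizing numbers are hypothetical placeholders, not recommendations.

```python
# A minimal sketch (not an official recipe) of creating a cost-conscious
# Databricks cluster via the Clusters REST API. Host, token, node type,
# and sizing numbers below are hypothetical.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
TOKEN = "dapiXXXXXXXXXXXX"  # personal access token, hypothetical

cluster_spec = {
    "cluster_name": "nightly-etl",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    # Autoscaling (point 1): pay for extra workers only when load needs them.
    "autoscale": {"min_workers": 1, "max_workers": 6},
    # Idle termination (point 4): shut the cluster down after 30 idle minutes.
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("created cluster:", resp.json()["cluster_id"])
```

The two settings worth copying are `autoscale` and `autotermination_minutes`: together they cap both the size of the cluster and how long it can sit idle burning money.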
-
It's better to put your focus on the things you can control. Earlier I used to focus on and panic about things I couldn't control. There is no point spending time worrying about what is out of your hands; instead, focus on what you can control:

1. Focus on upskilling rather than worrying about why you're not getting job calls.
2. Focus on applying to more jobs rather than worrying about past rejections.
3. Focus on referrals and networking rather than on how many messages went unanswered.
4. Focus on the upcoming job opportunity rather than the lost one.

You might apply to 100 jobs, hear back from only 25, and still need just 1 good interview to land the job. So focus only on the things you can control. #job #opportunity #switch #data #analytics #career #growth #transition #luck #failure #hardwork #chance #focus
-
Dashboarding and visualization are an important part of analytics, so while preparing for interviews, be ready to face Power BI questions. Some of the Power BI questions I faced in the early phases of my job interviews are listed below:

1. Can you explain the difference between calculated columns and measures in Power BI?
2. How do you handle large datasets in Power BI to optimize performance?
3. Can you describe how to create a hierarchical structure in Power BI?
4. What is the role of DAX (Data Analysis Expressions) in Power BI, and can you provide some examples of DAX functions you frequently use?
5. How do you handle data modeling and relationships in Power BI?
6. Can you explain the process of data transformation in Power BI using Power Query Editor?
7. Have you worked with Power BI Embedded? If so, can you explain its usage and implementation?
8. What are some best practices for creating visually appealing and effective reports and dashboards in Power BI?
9. How do you schedule data refreshes in Power BI Service, and what are the considerations for maintaining data accuracy and freshness?
10. Can you discuss a challenging project you worked on using Power BI, and how you overcame any obstacles?
11. What is conditional formatting, and how can you use it to enhance a visualization? Give an example.

Preparing answers to these questions based on your own experience and knowledge will help you feel confident and ready for a Power BI developer interview. A tip: go through every chart and visual type Power BI provides and understand where to use which one and why; it will give you a much better feel for the tool. #powerbi #interview #analytics #data #switch #job #jobhunt
Sr. Data Engineer @ Daimler Truck | Jadavpur University
You can take a look at Microsoft Fabric as well. It's much more feature-rich and takes away the pain of integration altogether.