- We cannot use a SQL override when using flat files as the source.
- We need to specify the correct path in the session and mention whether the file is ‘direct’ or ‘indirect’.
- The file must be kept in the exact path mentioned in the session.
- We have to convert the date field to the Date data type, since a date in a flat file is read as a string.
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation, and loading. The Load Manager reads parameters and variables related to the session, mapping, and server, and passes the mapping parameter and variable information to the DTM. The DTM uses this information to perform the data movement from source to target.
Linking more than one session together is called a batch. Batches are of two types:
- Sequential batches (sessions run one after another).
- Concurrent batches (sessions run at the same time).
- Lookup and Joiner cannot be compared directly, since each has its own function. Sometimes the Joiner gives better performance, and sometimes the Lookup does.
- For flat files, a sorted Joiner is generally more effective than a Lookup, because the sorted Joiner uses the join condition and caches fewer rows, while a Lookup always caches the whole file. If the file is not sorted, the two are comparable.
- For databases, a Lookup can be effective if the database can return sorted data quickly and the amount of data is small, because the Lookup can build its entire cache in memory. If the database responds slowly or a large amount of data is processed, lookup cache initialization can be very slow. In that case it can be better to use a sorted Joiner, which writes rows to its output as it reads them on input.
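The trade-off above can be sketched in Python. This is an illustrative analogy only, not Informatica internals: the lookup-style function caches the entire master source before processing, while the sorted-merge function streams both pre-sorted inputs and emits rows as it reads them.

```python
# Illustrative sketch (not Informatica code): contrast a lookup-style
# full cache with a sorted-merge join that streams both inputs.

def lookup_join(master_rows, detail_rows, key=0):
    """Lookup style: cache the entire master source in memory first."""
    cache = {row[key]: row for row in master_rows}  # whole file cached
    for d in detail_rows:
        m = cache.get(d[key])
        if m is not None:
            yield d + m[1:]

def sorted_merge_join(master_rows, detail_rows, key=0):
    """Sorted-Joiner style: both inputs pre-sorted on the key (master keys
    assumed unique here); rows are emitted as they are read, so far fewer
    rows are held in memory."""
    mi, di = iter(master_rows), iter(detail_rows)
    m, d = next(mi, None), next(di, None)
    while m is not None and d is not None:
        if m[key] < d[key]:
            m = next(mi, None)
        elif m[key] > d[key]:
            d = next(di, None)
        else:
            yield d + m[1:]
            d = next(di, None)

master = [(1, "A"), (2, "B"), (3, "C")]
detail = [(1, 10), (2, 20), (2, 21), (4, 40)]
print(list(lookup_join(master, detail)))        # [(1, 10, 'A'), (2, 20, 'B'), (2, 21, 'B')]
print(list(sorted_merge_join(master, detail)))  # same result, smaller cache
```

Both functions produce the same joined rows; the difference is only how much data each must hold in memory, which is exactly the caching trade-off described above.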
There are 10 types of workflow tasks in total:
- Session: for mapping execution.
- Email: to send emails.
- Command: to execute OS commands.
- Control: to fail, stop, or abort the workflow.
- Event Wait: to wait for a pre-defined or user-defined event.
- Event Raise: to raise an active user-defined event.
- Decision: a condition to be evaluated for controlling the flow of the process.
- Timer: to halt the process for a specific time.
- Worklet: a reusable task.
- Assignment: to assign values to worklet or workflow variables.
Pipeline partitioning a mapping in Informatica means partitioning the pipeline inside the mapping.
A pipeline contains the source, transformations, and targets. The mapping is subdivided into many pipelines, called stage pipelines.
After enabling partitioning in the stage pipelines, the Integration Service runs faster and we can expect better performance, because it runs the partition threads concurrently.
To enable partitioning:
- Set the partition points in the mapping
- Set the number of partitions
- Set the partition types
Partition points mark the boundaries that divide the pipeline into stages. By default, the Integration Service keeps partition points at various transformations. Partition points mark the points in the pipeline where the Integration Service can redistribute data across partitions.
When we set partition points, we actually create the stage pipelines inside the mapping. We then need to decide how many partitions to have inside each stage pipeline.
Adding more partitions can increase the performance of Informatica. You can define up to 64 partitions at any partition point in a pipeline. Also note that if we define two partitions at any partition point, the remaining partition points will also have two partitions.
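The idea can be sketched in Python. This is an illustrative analogy only, not Informatica internals: rows are redistributed into N partitions at a partition point, and each partition is then processed by its own concurrent thread, like the partition threads the Integration Service runs per stage.

```python
# Illustrative sketch (not Informatica internals): split a pipeline's rows
# into N partitions and process the partitions concurrently, the way the
# Integration Service runs one thread per partition in each stage.
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Hash-partition rows on a key so each partition thread gets a slice."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts

def transform(rows):
    """Stand-in for the transformation stage of one partition."""
    return [(key, value * 2) for key, value in rows]

rows = [(i, i) for i in range(10)]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transform, partition(rows, 2)))

# Merge the partition outputs; the total row count is unchanged.
merged = [row for part in results for row in part]
print(len(merged))  # 10
```

Each partition thread sees only its own slice of the data, which is why adding partitions can raise throughput when CPU and I/O capacity are available.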
Deployment group is a global object which consists of objects from one or more folders. Deployment group is used to copy objects from one or more folders to another folder or repository. You can create, edit, or delete deployment groups. You can copy a deployment group and the objects in the deployment group to a target repository.
- Deployment Group Tasks:
You can do the following tasks when working with deployment groups:
- Create deployment group:
You can create a global object for deploying objects from one or more folders.
- Edit deployment group:
You can modify a deployment group. You can change the static deployment group to dynamic deployment group and vice versa.
- Configure privileges and permissions:
Configure permissions on a deployment group and the privilege to copy a deployment group.
- View the objects in a static or dynamic deployment group:
You can preview the objects that the Repository Service will deploy.
- Add or remove objects in a static deployment group:
You can specify what objects belong to a static deployment group. You can add or remove the objects from the deployment group.
- Associate a query with a dynamic deployment group:
Assign a query to a deployment group to dynamically update the objects that the group contains.
- View the history of a deployment group:
View the history of a deployment group, including the source and target repositories, deployment date, and user who ran the deployment.
- Post-deployment validation:
Validate the objects in the target repository after you copy a deployment group to verify that the objects and dependent objects are valid.
- Roll back a deployment group:
You can roll back a deployment group to purge deployed versions of objects from the target repository.
To pass alternate rows into two targets, the mapping flow is:
SQ -> Sequence Generator -> Router -> TGT1, TGT2
The Router group filter condition MOD(NEXTVAL, 2) = 0 gives you the even-numbered rows; pass them to one target (TGT1). The default group of the Router gives you the odd-numbered rows; pass them to TGT2.
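The routing logic above can be sketched in Python (illustrative only, not Informatica code): a sequence number is attached to each row, the MOD condition selects even rows for one target, and the default group catches the rest.

```python
# Illustrative sketch of the router logic: a sequence number is attached
# to each row, and MOD(seq, 2) = 0 routes even rows to TGT1 while the
# default group routes odd rows to TGT2.
def route_alternate_rows(rows):
    tgt1, tgt2 = [], []
    for seq, row in enumerate(rows, start=1):  # acts as the Sequence Generator
        if seq % 2 == 0:   # Router group condition: MOD(seq, 2) = 0
            tgt1.append(row)
        else:              # Router default group
            tgt2.append(row)
    return tgt1, tgt2

even, odd = route_alternate_rows(["a", "b", "c", "d", "e"])
print(even)  # ['b', 'd']
print(odd)   # ['a', 'c', 'e']
```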
User-defined functions are extensions to the PowerCenter transformation language. They allow you to create complex expressions using the built-in functions and operators and reuse them in PowerCenter mappings. User-defined functions are of two types:
Public: Callable from any user-defined function, transformation expression, link condition expression, or task expression.
Private: Callable from another user-defined function. Create a private function when you want the function to be part of a more complex function. The simple function may not be usable independently of the complex function.
- Creating User Defined Functions
You can create a user-defined function from the Informatica PowerCenter Designer tool. You can invoke the user-defined function dialog box from the main menu: Tools > User-Defined Functions > New. Suppose you want a user-defined function that converts a phone number into (XXX) XXX-XXXX format.
We can define the function to convert PHONE_NUMBER into the required string formatted as (XXX) XXX-XXXX. The expression editor shows the logic used within the user-defined function. User-defined functions can be used in any expression, just like any other function. You can choose the available user-defined functions from the function explorer.
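The formatting logic of such a function can be sketched in Python (illustrative only; in PowerCenter the equivalent expression would be built with SUBSTR and string concatenation):

```python
# Illustrative Python version of the phone-formatting user-defined function
# described above: format a 10-digit phone number as (XXX) XXX-XXXX.
def format_phone(phone_number):
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) != 10:
        return phone_number  # leave malformed input unchanged
    return "({}) {}-{}".format(digits[:3], digits[3:6], digits[6:])

print(format_phone("5551234567"))    # (555) 123-4567
print(format_phone("555-123-4567"))  # (555) 123-4567
```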
The types of pushdown optimization:
Source-side pushdown optimization: The Integration Service pushes as much transformation logic as possible to the source database.
Target-side pushdown optimization: The Integration Service pushes as much transformation logic as possible to the target database.
Full pushdown optimization: The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions: When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database. The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.
Running Target-Side Pushdown Optimization Sessions: When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the target database.
Running Full Pushdown Optimization Sessions: To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:
A long transaction uses more database resources.
A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
A long transaction increases the likelihood of an unexpected event.
To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.
Integration Service Behavior with Full Optimization: When you configure a session for full optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. If the Integration Service cannot push all transformation logic to the target database, it tries to push all transformation logic to the source database. If it cannot push all transformation logic to the source or target, the Integration Service pushes as much transformation logic to the source database, processes intermediate transformations that it cannot push to any database, and then pushes the remaining transformation logic to the target database. The Integration Service generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each database to which it pushes transformation logic.
For example, the Rank transformation cannot be pushed to the source or target database. If a mapping with a Source Qualifier, Aggregator, Rank, and Expression transformation is configured for full pushdown optimization, the Integration Service pushes the Source Qualifier and Aggregator transformations to the source, processes the Rank transformation itself, and pushes the Expression transformation and target to the target database. The Integration Service does not fail the session if it can push only part of the transformation logic to the database.
The pmcmd command in Informatica is used to run a workflow from the command prompt.
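A typical invocation looks like the following. The service, domain, user, folder, and workflow names are placeholders for your own environment, and the password would normally come from an environment variable rather than the command line:

```shell
# Start a workflow and wait for it to complete
pmcmd startworkflow -sv IntSvc_Dev -d Domain_Dev -u admin -p password \
    -f MyFolder -wait wf_load_customers

# Check that the Integration Service is reachable
pmcmd pingservice -sv IntSvc_Dev -d Domain_Dev
```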
A fact table is a centralized table in data modeling that holds foreign keys and measures. The fact table is surrounded by multiple dimension tables, so we have to get the primary keys of all the dimension tables, along with the measures, to load the fact.
Stop: stops the task. Abort: force-stops the task if it does not stop through the stop process.
If the lookup table is on the same database as the source table in your mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.
If you use a Lookup transformation, perform the following tasks to increase performance:
• Use the optimal database driver.
• Cache lookup tables.
• Optimize the lookup condition (place = conditions first; then <, <=, >, >=; and != last).
• Filter lookup rows.
• Index the lookup table.
• Optimize multiple lookups.
Using Optimal Database Drivers:
The Integration Service can connect to a lookup table using a native database driver or an ODBC driver. Native database drivers provide better session performance than ODBC drivers. Caching Lookup Tables: If a mapping contains Lookup transformations, you might want to enable lookup caching. When you enable caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis.
The result of the Lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase session performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB.
Complete the following tasks to further enhance performance for Lookup transformations:
• Use the appropriate cache type.
• Enable concurrent caches.
• Optimize Lookup condition matching.
• Reduce the number of cached rows.
• Override the ORDER BY statement.
• Use a machine with more memory.
Optimizing the Lookup Condition: if you include more than one lookup condition, place the conditions in the following order to optimize lookup performance: equal to (=) first; then <, <=, >, >=; and not equal to (!=) last.
Filtering Lookup Rows
Create a filter condition to reduce the number of lookup rows retrieved from the source when the lookup cache is built.
Indexing the Lookup Table
The Integration Service needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition.
You can improve performance for the following types of lookups:
Cached lookups: To improve performance, index the columns in the lookup ORDER BY statement. The session log contains the ORDER BY statement.
Uncached lookups: To improve performance, index the columns in the lookup condition. The Integration Service issues a SELECT statement for each row that passes into the Lookup transformation.
Optimizing Multiple Lookups
If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can slow performance. Tune the Lookup transformations that query the largest amounts of data to improve overall performance.
To determine which Lookup transformations process the most data, examine the Lookup rows in lookup cache counters for each Lookup transformation. The Lookup transformations that have a large number in this counter might benefit from tuning their lookup expressions. If those expressions can be optimized, session performance improves.
You can alter or override the default query in the mapping by entering a SQL override in the Properties settings of the Source Qualifier. You can enter any SQL statement supported by the source database. Overriding the SQL Query: you can override the SQL query for a relational source.
To override the default query for a relational source:
- In the Workflow Manager, open the session properties.
- Click the Mapping tab and open the Transformations view.
- Click the Sources node and open the Properties settings.
- Click the Open button in the SQL Query field to open the SQL Editor.
- Enter the SQL override.
In the workflow manager, you can specify constraint-based loading for a session. When you select this option, the integration service orders the target load on a row-by-row basis. For every row generated by an active source, the integration service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert: use this option when you insert into the target. You cannot use updates with constraint-based loading.
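The load ordering can be sketched in Python (illustrative only, not Informatica internals): rows for the primary-key (parent) table are loaded before rows for any foreign-key (child) tables, so child rows never arrive before the parent keys they reference.

```python
# Illustrative sketch of constraint-based load ordering: the primary-key
# (parent) table is loaded before any foreign-key (child) tables.
def constraint_based_load(row_sets, load):
    """row_sets: list of (table, rows) with parent tables listed before
    children, as the Integration Service orders targets by key relationships."""
    for table, rows in row_sets:
        load(table, rows)  # parent rows are loaded before child rows arrive

loaded = []
constraint_based_load(
    [("CUSTOMERS", [(1, "Ann")]),           # primary-key table first
     ("ORDERS",    [(100, 1), (101, 1)])],  # then the foreign-key table
    lambda table, rows: loaded.append(table),
)
print(loaded)  # ['CUSTOMERS', 'ORDERS']
```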
The first time you run an incremental aggregation session, the Integration Service processes the source. At the end of the session, the Integration Service stores the aggregated data in two cache files, the index, and data cache files. The Integration Service saves the cache files in the cache file directory. The next time you run the session, the Integration Service aggregates the new rows with the cached aggregated values in the cache files.
When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run. The Integration Service cannot restore the backup cache file if the session aborts.
When you create multiple partitions in a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.
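The mechanism can be sketched in Python (illustrative only, not Informatica internals): a persisted cache of aggregate values survives between runs, and each new run merges only the new rows into it instead of re-aggregating all historical data.

```python
# Illustrative sketch of incremental aggregation: a persisted cache of
# aggregate values is merged with only the new source rows on each run.
def incremental_aggregate(cache, new_rows):
    """cache maps group key -> running total (the role of the data cache
    file the Integration Service saves in the cache directory)."""
    for key, amount in new_rows:
        cache[key] = cache.get(key, 0) + amount
    return cache

cache = {}  # stands in for the cache files persisted between session runs
incremental_aggregate(cache, [("east", 10), ("west", 5)])  # first run
incremental_aggregate(cache, [("east", 3)])                # next run: new rows only
print(cache)  # {'east': 13, 'west': 5}
```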
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica server. Increase session performance as follows:
- The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize network hops.
- Flat files: if your flat files are stored on a machine other than the Informatica server, move those files to the machine that hosts the Informatica server.
- Relational data sources: minimize the connections to sources, targets, and the Informatica server to improve session performance. Moving the target database onto the server system may improve session performance.
- Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
- You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
- Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.
- If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
- We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure, then Database Connections.
- If your target contains key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
- Running parallel sessions by using concurrent batches also reduces data-loading time, so concurrent batches may increase session performance.
- Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
- In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
- Avoid transformation errors to improve session performance.
- If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
- If your session contains a Filter transformation, create it as close to the sources as possible, or use a filter condition in the Source Qualifier.
- Aggregator, Rank, and Joiner transformations may often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted input option.