How to read index from hdfs in Lucene



By : Jesper Petersen
Date : November 19 2020, 01:01 AM
I think the issue is the following: Lucene does not support HDFS out of the box.
You should use an HdfsDirectory or a similar HDFS-aware Directory implementation; a standard DirectoryReader opened on a local-filesystem directory simply won't work.
Can you read from a lucene index while updating the index



By : user1404729
Date : March 29 2020, 07:55 AM
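It's been a while since I used Lucene. However, assuming you are talking about the Java version, the FAQ covers this under "Does Lucene allow searching and indexing simultaneously?"
The answer there is yes, with the caveat that a reader only sees the index as of the point in time it was opened, so it must be reopened to pick up later changes.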
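
Searching and indexing can run at the same time because a Lucene reader works on a point-in-time view of the index. The sketch below is a plain-Java analogy of that behaviour, not Lucene code: `SnapshotIndex`, `add`, and `openReader` are hypothetical names, and the "index" is just an immutable list that the writer swaps atomically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of point-in-time readers; this is NOT the Lucene API. */
public class SnapshotIndex {
    // The current published state; writers replace it, readers never mutate it.
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(Collections.<String>emptyList());

    /** Writer side: publish a new immutable copy that contains the extra document. */
    public void add(String doc) {
        current.updateAndGet(old -> {
            List<String> next = new ArrayList<>(old);
            next.add(doc);
            return Collections.unmodifiableList(next);
        });
    }

    /** Reader side: capture the state as of now; later add() calls stay invisible. */
    public List<String> openReader() {
        return current.get();
    }

    public static void main(String[] args) {
        SnapshotIndex index = new SnapshotIndex();
        index.add("doc-1");
        List<String> reader = index.openReader();      // this view contains doc-1 only
        index.add("doc-2");                            // indexing continues meanwhile
        System.out.println(reader.size());             // prints 1: the old view is unchanged
        System.out.println(index.openReader().size()); // prints 2: a reopened view sees both
    }
}
```

The old view stays valid for as long as someone holds it, which is the same trade-off a real reader makes: consistent results at the cost of having to reopen to see new documents.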
How to read a Lucene index?



By : user1794172
Date : March 29 2020, 07:55 AM
What you need to look at is how to use the IndexReader class; its terms() method gives you back all the terms in the index.
opening lucene index stored in hdfs



By : tyguy609
Date : March 29 2020, 07:55 AM
If you want to open a Lucene index that's stored in HDFS for the purpose of searching, you're out of luck. AFAIK, there is no implementation of Directory for HDFS that allows for search operations. One reason is that HDFS is optimized for sequential reads of large blocks, not the small, random reads that Lucene performs.
In the Nutch project, there is an implementation of HDFSDirectory which you can use to create an IndexReader, but only delete operations work. Nutch only uses HDFSDirectory to perform document deduplication.
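
One workaround that follows from the sequential-read point, though it is not part of the answer above, is to copy the index files to local disk and open them there. The sketch below is a minimal illustration under stated assumptions: `LocalIndexStager` and `stage` are hypothetical names, and the source directory is assumed to be reachable through a `java.nio.file.Path` (for example a mounted filesystem). With the Hadoop client you would fetch the files through its own FileSystem API instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: stage index files on local disk before opening them. */
public class LocalIndexStager {

    /**
     * Copies every regular file in sourceDir into localDir (created if missing)
     * and returns how many files were copied. Sub-directories are skipped,
     * since a Lucene index directory is flat.
     */
    public static int stage(Path sourceDir, Path localDir) throws IOException {
        Files.createDirectories(localDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // Each file is read front to back, the access pattern HDFS favours.
                    Files.copy(file, localDir.resolve(file.getFileName().toString()),
                               StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Demo: temp directories stand in for the remote and local locations.
        Path source = Files.createTempDirectory("remote-index");
        Files.write(source.resolve("segments_1"), new byte[] {1, 2, 3});
        Path local = Files.createTempDirectory("local-index").resolve("copy");
        System.out.println(stage(source, local) + " file(s) staged in " + local);
    }
}
```

The staged copy is a snapshot, so it has to be refreshed whenever the index in HDFS changes; this suits indexes that are built in batch and searched read-only.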
How do you read the index in Lucene to do a search?



By : Neal Wallace
Date : March 29 2020, 07:55 AM
I usually use this code: a class that encapsulates all the operations on the Lucene index (v4).
It uses near-real-time access to the index, so nearly all updates are visible to the index reader:
code :
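// Lucene 4.0-4.3 API: NRTManager and NRTManagerReopenThread live in org.apache.lucene.search.
// LuceneConstants, LucenePageResults and CollectionUtils appear to be the author's own
// helper classes; they are not shown here.
import java.io.IOException;
import java.util.Set;

import lombok.extern.slf4j.Slf4j;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NRTManager;
import org.apache.lucene.search.NRTManager.TrackingIndexWriter;
import org.apache.lucene.search.NRTManagerReopenThread;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

@Slf4j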
public class LuceneIndex {
/////////////////////////////////////////////////////////////////////////////////////////
//  STATUS (see http://blog.mikemccandless.com/2011/11/near-real-time-readers-with-lucenes.html)
/////////////////////////////////////////////////////////////////////////////////////////
    private final IndexWriter _indexWriter;
    private final TrackingIndexWriter _trackingIndexWriter;
    private final NRTManager _searchManager;

    LuceneNRTReopenThread _reopenThread = null;
    private long _reopenToken;  // index update/delete methods returned token

/////////////////////////////////////////////////////////////////////////////////////////
//  CONSTRUCTOR
/////////////////////////////////////////////////////////////////////////////////////////
    /**
     * Constructor based on an instance of the type responsible for persisting the Lucene index
     */
    public LuceneIndex(final Directory luceneDirectory,
                       final Analyzer analyzer) {
        try {
            // Create the indexWriter
            _indexWriter = new IndexWriter(luceneDirectory,
                                           new IndexWriterConfig(LuceneConstants.VERSION,
                                                                 analyzer));
            _trackingIndexWriter = new NRTManager.TrackingIndexWriter(_indexWriter);
            // Create the SearchManager to exec the search
            _searchManager = new NRTManager(_trackingIndexWriter,
                                            new SearcherFactory(),
                                            true);

            // Start the thread in charge of re-opening the index so that it sees real-time changes
            //      The index is refreshed every 60 s when nobody is waiting
            //      and every 100 ms whenever someone is waiting (see search method)
            // (see http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/NRTManagerReopenThread.html)
            _reopenThread = new LuceneNRTReopenThread(_searchManager,
                                                      60.0,     // when there is nobody waiting
                                                      0.1);     // when there is someone waiting
            _reopenThread.startReopening();

        } catch (IOException ioEx) {
//          if (luceneDirectory instanceof JdbcDirectory) {
//              throw new IllegalStateException("The DB table for the lucene index could not be created: " + ioEx.getMessage(),ioEx); 
//          } else {
                // keep the original exception as the cause so its stack trace is not lost
                throw new IllegalStateException("Lucene index could not be created: " + ioEx.getMessage(),ioEx);
//          }
        }
    }
/////////////////////////////////////////////////////////////////////////////////////////
//  FINALIZER
/////////////////////////////////////////////////////////////////////////////////////////
    @Override
    protected void finalize() throws Throwable {
        this.close();
        super.finalize();
    }
    /**
     * Closes every index
     */
    public void close() {
        try {
            // stop the index reader re-open thread
            _reopenThread.stopReopening();
            _reopenThread.interrupt();

            // Close the search manager
            _searchManager.close();

            // Close the indexWriter, committing everything that's pending
            _indexWriter.commit();
            _indexWriter.close();

        } catch(IOException ioEx) {
            log.error("Error while closing lucene index: {}",ioEx.getMessage(),
                                                             ioEx);
        }
    }
/////////////////////////////////////////////////////////////////////////////////////////
//  REOPEN-THREAD: Thread in charge of re-open the IndexReader to have access to the 
//                 latest IndexWriter changes
/////////////////////////////////////////////////////////////////////////////////////////
    private class LuceneNRTReopenThread
          extends NRTManagerReopenThread {

        volatile boolean _finished = false;

        public LuceneNRTReopenThread(final NRTManager manager,
                                     final double targetMaxStaleSec,final double targetMinStaleSec) {
            super(manager, targetMaxStaleSec, targetMinStaleSec);
            this.setName("NRT Reopen Thread");
            this.setPriority(Math.min(Thread.currentThread().getPriority()+2, 
                                      Thread.MAX_PRIORITY));
            this.setDaemon(true);
        }
        public synchronized void startReopening() {
            _finished = false;
            this.start();
        }
        public synchronized void stopReopening() {
            _finished = true;
        }
        @Override
        public void run() {
            while (!_finished) {
                super.run();
            }
        }
    }
/////////////////////////////////////////////////////////////////////////////////////////
//  INDEX / UPDATE / DELETE OPERATIONS
/////////////////////////////////////////////////////////////////////////////////////////
    /**
     * Index a Lucene document
     * @param doc the document to be indexed
     */
    public void index(final Document doc) { 
        // Index in Lucene
        try {
            _reopenToken = _trackingIndexWriter.addDocument(doc);
            log.debug("document indexed in lucene");
        } catch(IOException ioEx) {
            log.error("Error while in Lucene index operation: {}",ioEx.getMessage(),
                                                                  ioEx);
        } finally {
            try {
                _indexWriter.commit();
            } catch (IOException ioEx) {
                log.error("Error while committing changes to Lucene index: {}",ioEx.getMessage(),
                                                                               ioEx);
            }
        }
    }
    /**
     * Updates the index info for a lucene document
     * @param recordIdTerm term that identifies the document to be replaced
     * @param doc the document to be indexed
     */
    public void reIndex(final Term recordIdTerm,
                        final Document doc) {   
        // Re-index in Lucene
        try {
            _reopenToken = _trackingIndexWriter.updateDocument(recordIdTerm, 
                                                               doc);
            log.debug("{} document re-indexed in lucene",recordIdTerm.text());
        } catch(IOException ioEx) {
            log.error("Error in lucene re-indexing operation: {}",ioEx.getMessage(),
                                                                  ioEx);
        } finally {
            try {
                _indexWriter.commit();
            } catch (IOException ioEx) {
                log.error("Error while committing changes to Lucene index: {}",ioEx.getMessage(),
                                                                               ioEx);
            }
        }
    }
    /**
     * Unindex a lucene document
     * @param idTerm term used to locate the document to be unindexed
     *               IMPORTANT! the term must filter only the document and only the document
     *                          otherwise all matching docs will be unindexed
     */
    public void unIndex(final Term idTerm) {
        try {
            _reopenToken = _trackingIndexWriter.deleteDocuments(idTerm);
            log.debug("{}={} term matching records un-indexed from lucene",idTerm.field(),
                                                                           idTerm.text());
        } catch(IOException ioEx) {
            log.error("Error in un-index lucene operation: {}",ioEx.getMessage(),
                                                               ioEx);           
        } finally {
            try {
                _indexWriter.commit(); 
            } catch (IOException ioEx) {
                log.error("Error while committing changes to Lucene index: {}",ioEx.getMessage(),
                                                                               ioEx);
            }
        }
    }
    /**
     * Delete all lucene index docs
     */
    public void truncate() {
        try {
            _reopenToken = _trackingIndexWriter.deleteAll();
            log.warn("lucene index truncated");
        } catch(IOException ioEx) {
            log.error("Error truncating lucene index: {}",ioEx.getMessage(),
                                                          ioEx);            
        } finally {
            try {
                _indexWriter.commit(); 
            } catch (IOException ioEx) {
                log.error("Error truncating lucene index: {}",ioEx.getMessage(),
                                                              ioEx);
            }
        }
    }
/////////////////////////////////////////////////////////////////////////////////////////
//  COUNT-SEARCH
/////////////////////////////////////////////////////////////////////////////////////////
    /**
     * Count the number of results returned by a search against the lucene index
     * @param qry the query
     * @return the total number of documents matching the query
     */
    public long count(final Query qry) {
        long outCount = 0;
        try {
            _searchManager.waitForGeneration(_reopenToken);     // wait until the index is re-opened
            IndexSearcher searcher = _searchManager.acquire();
            try {
                // request 1 hit: totalHits still counts every match, and a 0-hit request may be rejected
                TopDocs docs = searcher.search(qry,1);
                if (docs != null) outCount = docs.totalHits;
                log.debug("count-search executed against lucene index returning {}",outCount);
            } finally {
                _searchManager.release(searcher);
            }
        } catch (IOException ioEx) {
            log.error("Error re-opening the index {}",ioEx.getMessage(),
                                                      ioEx);
        }
        return outCount;
    }
    /**
     * Executes a search query
     * @param qry the query to be executed
     * @param sortFields the sort criteria
     * @param firstResultItemOrder the order number of the first element to be returned
     * @param numberOfResults number of results to be returned
     * @return a page of search results
     */
    public LucenePageResults search(final Query qry,Set<SortField> sortFields,
                                    final int firstResultItemOrder,final int numberOfResults) {
        LucenePageResults outDocs = null;
        try {
            _searchManager.waitForGeneration(_reopenToken); // wait until the index is re-opened for the last update
            IndexSearcher searcher = _searchManager.acquire();
            try {
                // sort criteria
                SortField[] theSortFields = null;
                if (CollectionUtils.hasData(sortFields)) theSortFields = CollectionUtils.toArray(sortFields,SortField.class);
                Sort theSort = CollectionUtils.hasData(theSortFields) ? new Sort(theSortFields)
                                                                      : null;
                // number of results to be returned
                int theNumberOfResults = firstResultItemOrder + numberOfResults;

                // Exec the search (if the sort criteria is null, they're not used)
                TopDocs scoredDocs = theSort != null ? searcher.search(qry,
                                                                       theNumberOfResults,
                                                                       theSort)
                                                     : searcher.search(qry,
                                                                       theNumberOfResults);
                log.debug("query {} {} executed against lucene index: returned {} total items, {} in this page",qry.toString(),
                                                                                                                (theSort != null ? theSort.toString() : ""),
                                                                                                                scoredDocs != null ? scoredDocs.totalHits : 0,
                                                                                                                scoredDocs != null ? scoredDocs.scoreDocs.length : 0);
                outDocs = LucenePageResults.create(searcher,
                                                   scoredDocs,
                                                   firstResultItemOrder,numberOfResults);
            } finally {
                _searchManager.release(searcher);
            }
        } catch (IOException ioEx) {
            log.error("Error executing the search {}",ioEx.getMessage(),
                                                      ioEx);
        }
        return outDocs;
    }
/////////////////////////////////////////////////////////////////////////////////////////
//  INDEX MAINTENANCE
/////////////////////////////////////////////////////////////////////////////////////////
    /**
     * Merges the lucene index segments into one
     * (this should NOT be used routinely, only rarely for index maintenance)
     */
    public void optimize() {
        try {
            _indexWriter.forceMerge(1);
            log.debug("Lucene index merged into one segment");
        } catch (IOException ioEx) {
            log.error("Error optimizing lucene index {}",ioEx.getMessage(),
                                                         ioEx);
        }
    }
}
How to read lucene 4.0 index with java?



By : MuteString
Date : March 29 2020, 07:55 AM
My bet is that you need to use AtomicReader (http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/AtomicReader.html); it has all the methods you need.
Also take a look at IndexReader (http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/IndexReader.html), which explains how to create the readers you need.